Neural Network Based Gene Regulatory Network
Reconstruction
Sudip Mandal1, Goutam Saha2 and Rajat Kumar Pal3
1ECE Department, GIMT, Krishna Nagar, India, email: sudip.mandal007@gmail.com
2IT Department, NEHU, Shillong, India, email: dr_goutamsaha@yahoo.com
3CSE Department, University of Calcutta, Kolkata, India, email: pal.rajatk@gmail.com
Abstract—Gene Regulatory Networks (GRNs) are used to model
the regulatory interactions in living organisms. Inferring a genetic
network from high-throughput experimental biological data (such
as microarray data) is a challenging task. In this paper, an
Artificial Neural Network, a very effective soft computing tool
for learning and modeling the dynamics of, and dependencies
between, genes, is used to reconstruct a small-scale GRN from a
reduced microarray dataset of Lung Adenocarcinoma. The
significance of the regulation of each gene by the other genes of
the system is expressed by a weight matrix, which is computed
with a biologically significant, Perceptron-based weight updating
method that minimizes the error during learning. Based on the
values of the elements of the filtered weight matrix, a directed
weighted graph representing the gene regulatory network is drawn.
Keywords—Gene Regulatory Network; Microarray data;
Neural Network.
I. INTRODUCTION
Cellular biology is a rapidly growing area of research.
Nowadays, a large amount of biological data produced by
advanced experimental technologies such as microarrays is
accessible from online databases such as the National Center for
Biotechnology Information (NCBI) [1]. Efficient computational
techniques are required to analyze these data and extract
informative knowledge from them. Through suitable
mathematical models, such methods help us understand the
interactions that take place in organisms at the genomic level.
These gene-to-gene interactions are represented by a special
network known as a Gene Regulatory Network (GRN). A GRN
is a graph in which nodes represent genes or proteins and edges
represent the regulatory relationships between them. It is
constructed by observing the behavior of genes and their impact
on other genes under particular experimental conditions. This
behavior is analyzed by measuring gene expression values with
microarray technology. Depending on the expression levels of
genes and their interactions, the state of a human organ can be
either normal or cancerous.
When Transcription Factors (specialized proteins) bind to
the promoter region of DNA, they can modify the rate of protein
synthesis, which can result in sudden changes in gene expression
values. If the rate of protein synthesis decreases, this is called
inhibition, down-regulation, or negative regulation; if it
increases, it is called activation, up-regulation, or positive
regulation. If gene g1 positively regulates gene g2, the
expression level of g2 increases because of g1; under negative
regulation, g1 decreases the expression level of g2. GRNs are
very helpful for analyzing the effect of drugs on genes and for
finding interactions between genes and the genetic pathways of
disease development. They also help in studying the dynamics of
specific genes under particular disease or experimental
conditions, and hence in studying diseases caused by genes (in
this case, Lung Adenocarcinoma).
Many types of models have been proposed for
reconstructing gene regulatory networks in biological systems,
including Boolean networks [2-3], linear weighting networks
[4], differential equations [5], Bayesian networks [6-8], S-
systems [9-10], fuzzy sets [11-12], artificial and recurrent
neural networks [13], and evolutionary methods [14-18].
Boolean networks examine binary state transition matrices to
search for patterns in gene expression. Every element of the
network is either on or off, depending on whether a signal
exceeds a predetermined threshold. Generalized Logical
Networks allow the variables of a Boolean network to take more
than two values and use generalized Boolean functions to define
their relationships. Probabilistic Boolean Networks combine
several promising Boolean functions, so that each one
contributes to the prediction of a target gene; the probabilistic
model then randomly chooses one of these promising predictors.
A linear weighting network has the benefit of simplicity, since
it uses simple weight matrices to additively combine the
contributions of different regulatory elements. A Bayesian
network models probabilistic transitions between network states
under the assumption that the network contains no cycles.
However, cycles or feedback loops are an important mechanism
for ensuring stability. Dynamic Bayesian Networks incorporate
features of Hidden Markov Models in order to include feedback.
When GRNs are modeled with the S-system method, the
expression rates are defined by the difference of two products of
power-law functions, where the first denotes the activation term
and the second the degradation term of a gene product. Fuzzy
approaches model the interactions between genes in regulatory
pathways using fuzzy weights and clustering. In neural network
based models, genes are represented by the nodes of the input
and output layers of the neural network, and the weight matrix
between the nodes represents the regulatory relationships
between genes. Moreover, different evolutionary and hybrid
methods, such as Recurrent Neural Networks, Genetic
Algorithms, Particle Swarm Optimization, Ant Colony
Optimization, and Bee Colony Optimization [19, 20], have also
been proposed to infer gene regulatory networks. Despite all the
advantages of GRNs, identifying a GRN is still a complicated
and fascinating task for researchers.
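For reference, the S-system description above is usually written in the power-law form below. This is the standard formulation shown only as a sketch: $X_i$ (expression of gene $i$), $\alpha_i, \beta_i$ (non-negative rate constants) and $g_{ij}, h_{ij}$ (kinetic orders) are conventional S-system symbols assumed here, not quantities used elsewhere in this paper, and $\alpha_i$ is unrelated to the learning rate $\alpha$ of Section II.

```latex
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{m} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{m} X_j^{h_{ij}}, \qquad i = 1, \ldots, m
```

The first product is the activation (production) term and the second the degradation term, matching the two power-law products mentioned above.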
In this paper, a Neural Network method is applied to
reconstruct the genetic network. The model calculates the change
in the expression of each gene as the weighted sum of the
expression levels of multiple regulatory genes, which are known
prior to this experiment. A more biologically significant,
improved weight updating formula based on the Perceptron
learning rule is used to obtain a more biologically plausible
regulatory network. It is assumed that the change in the
expression value of each gene depends on both the expression
values of the other genes at the previous (normal) stage and the
changes in those values. The resulting weight matrix denotes the
level of regulation between genes, and from it a directed
weighted graph representing the gene regulatory network can be
drawn. The proposed model is discussed in the next section. The
experimental GRN and the results for the microarray dataset of
Lung Adenocarcinoma are shown in Section III. Section IV
concludes the paper, followed by the references.
978-1-4799-4445-3/15/$31.00 © 2015 IEEE
II. NEURAL NETWORK MODEL FOR GRN
A Neural Network (NN) is a very effective soft computing
tool for learning patterns from raw input data, in a manner
similar to the working principle of neurons in the human nervous
system. The model is biologically plausible and noise-resistant.
It is continuous in time and uses a transfer function to transform
the inputs into a shape similar to that observed in natural
processes. In addition, its nonlinear characteristics provide
information about the principles of control, as well as about the
natural interactions of the elements of the modeled system.
Let us assume that $G_1, G_2, \ldots, G_{m-1}, G_m$ are the $m$
genes specifically responsible for the cancer, and that
$x_{1N}, x_{2N}, \ldots, x_{(m-1)N}, x_{mN}$ are their expression
values in normal cells (here we consider Lung Adenocarcinoma as
the test case). When the lungs are affected by cancer (Lung
Adenocarcinoma), the expression levels of these genes change to
$x_{1C}, x_{2C}, \ldots, x_{(m-1)C}, x_{mC}$. We now define two
$1\times m$ matrices (row vectors) holding the expression values of
the genes in the normal and cancerous states, which can be
obtained from the microarray dataset of Lung Adenocarcinoma:
$$[X_N]_{1\times m} = [x_{1N}, x_{2N}, \ldots, x_{(m-1)N}, x_{mN}] \tag{1}$$
$$[X_C]_{1\times m} = [x_{1C}, x_{2C}, \ldots, x_{(m-1)C}, x_{mC}] \tag{2}$$
Let $[dX_C]_{1\times m}$ be the matrix that denotes the change in
the expression values due to cancer:
$$[dX_C]_{1\times m} = [(x_{1C}-x_{1N}), (x_{2C}-x_{2N}), \ldots, (x_{(m-1)C}-x_{(m-1)N}), (x_{mC}-x_{mN})] \tag{3}$$
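Eq. (3) is an element-wise difference of the two row vectors. The minimal Python sketch below computes it for the 15 genes using the averaged values that are listed later in Table I; the function name `expression_change` is an illustrative choice, not part of the original MATLAB implementation.

```python
# Average expression values from Table I, Gene1 ... Gene15.
X_N = [10.00, 7.18, 10.00, 11.45, 11.40, 7.98, 8.93, 10.60,
       7.20, 6.88, 6.45, 10.00, 8.08, 6.05, 7.10]   # normal lungs
X_C = [9.59, 8.59, 10.3, 10.43, 8.98, 8.59, 8.26, 8.30,
       5.38, 6.00, 6.24, 6.12, 7.38, 6.24, 6.07]    # cancerous lungs


def expression_change(x_normal, x_cancer):
    """Eq. (3): return [dX_C] as the element-wise difference x_C - x_N."""
    if len(x_normal) != len(x_cancer):
        raise ValueError("both vectors must have one entry per gene")
    return [xc - xn for xn, xc in zip(x_normal, x_cancer)]


dX_C = expression_change(X_N, X_C)
print([round(d, 2) for d in dX_C])
```

Eleven of the fifteen differences are negative, which agrees with the remark in Section III that the expression of most of these genes decreases due to cancer.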
To reconstruct the Gene Regulatory Network, we implemented
a Neural Network model based on the Perceptron learning
algorithm, with one input layer, one output layer, and no hidden
layer. In this context, we assume that the change in the
expression value of each gene due to cancer is a function of the
normal-stage expression values of all genes except itself. The
overall network-finding problem is thus divided into smaller
subproblems: instead of finding the overall connectivity at once,
we discover the interactions of each gene separately, and the
structure of the whole network is then formed by combining
them. The change of the $i$-th gene can therefore be written as
$$dx_{i,C} = f(x_{1N}, \ldots, x_{(i-1)N}, x_{(i+1)N}, \ldots, x_{mN}), \quad i = 1, \ldots, m \tag{4}$$
In the GRN problem, we have to infer the optimum level of
regulation between genes in terms of the edge connectivity of the
nodes (genes) in the network. The strength of regulation between
genes can be measured by a constant value called the 'weight'. In
our genetic network reconstruction problem, we have to find the
optimum weight between each pair of genes such that the
learning error is minimized; these weights form the optimum
weight matrix. For $m$ genes, the weight matrix $[W]_{m\times m}$ is
defined as
$$[W]_{m\times m} = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,m} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{(m-1),1} & w_{(m-1),2} & \cdots & w_{(m-1),m} \\ w_{m,1} & w_{m,2} & \cdots & w_{m,m} \end{bmatrix} \tag{5}$$
where $w_{i,j}$ denotes the weight corresponding to the regulation
of the $i$-th gene on the $j$-th gene.
To obtain this optimum weight matrix, we use the Perceptron
model of the Neural Network and calculate the weights of the
edges between the input and output layers. Here we take
$[X_N]_{1\times m}$ as the input layer, $[dX_C]_{1\times m}$ as the output layer,
and $[W]_{m\times m}$ as the weight matrix between them. According to
the fundamentals of Artificial Neural Networks (taking the bias
input as zero and using no activation function at the output), the
relationship between the input and output layers is
$$[dX_C] = [X_N]\,[W] \tag{6}$$
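Eq. (6) is an ordinary row-vector by matrix product: the $i$-th output is $\sum_j x_{jN} w_{j,i}$, so column $i$ of $[W]$ collects the weights of all regulators of gene $i$. The sketch below evaluates it with plain Python lists and skips the diagonal so that no gene regulates itself, following the assumption behind Eq. (4); the function name and the 3-gene values are hypothetical, chosen only for illustration.

```python
def predict_changes(x_normal, W):
    """Eq. (6): [dX_C] = [X_N][W] for a 1 x m row vector and an m x m matrix.

    W[j][i] holds w_{j,i}, the weight from regulator gene j to target gene i.
    Diagonal entries are ignored so that no gene regulates itself (Eq. (4)).
    """
    m = len(x_normal)
    if len(W) != m or any(len(row) != m for row in W):
        raise ValueError("W must be an m x m matrix")
    return [
        sum(x_normal[j] * W[j][i] for j in range(m) if j != i)
        for i in range(m)
    ]


# Hypothetical 3-gene example (values made up for illustration).
x_normal = [2.0, 1.0, 3.0]
W = [
    [0.0, 0.5, -1.0],
    [2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
]
print(predict_changes(x_normal, W))  # [5.0, 4.0, -2.0]
```

Computing each output separately as a column sum is exactly the per-gene decomposition described for Eq. (4): every target gene can be learned independently of the others.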
Here $w_{j,i}$, the weight from the $j$-th gene to the $i$-th gene, is a
function of the normal-stage expression level of the $j$-th gene
and of the change in the $j$-th gene due to the disease, plus an
initial constant $K$:
$$w_{j,i} = f(x_{jN}, dx_{j,C}) + K \tag{7}$$
During the learning process, i.e., the determination of the
optimal weight matrix of the NN, the network is initialized with a
weight matrix whose elements are all 1. The learned change in
the expression value of the $i$-th cancerous gene is then
calculated using
$$dx_{i,L} = \sum_{j=1,\; j\neq i}^{m} x_{jN}\, w_{j,i} \tag{8}$$
Next, the error in the learning process for the $i$-th gene is
calculated as
$$e_i = dx_{i,L} - dx_{i,C} \tag{9}$$
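As a worked check of Eqs. (8) and (9), the sketch below starts from the all-ones weight matrix and evaluates the learned output and the error for Gene1 with the Table I averages. It assumes the error is taken as learned minus observed change, $e_i = dx_{i,L} - dx_{i,C}$; the helper names `learned_change` and `learning_error` are hypothetical.

```python
# Table I averages for Gene1 ... Gene15 and their change due to cancer (Eq. (3)).
X_N = [10.00, 7.18, 10.00, 11.45, 11.40, 7.98, 8.93, 10.60,
       7.20, 6.88, 6.45, 10.00, 8.08, 6.05, 7.10]
X_C = [9.59, 8.59, 10.3, 10.43, 8.98, 8.59, 8.26, 8.30,
       5.38, 6.00, 6.24, 6.12, 7.38, 6.24, 6.07]
dX_C = [xc - xn for xn, xc in zip(X_N, X_C)]

m = len(X_N)
W = [[1.0] * m for _ in range(m)]  # initial weights: every element is 1


def learned_change(i, x_normal, W):
    """Eq. (8): dx_{i,L} = sum over j != i of x_{jN} * w_{j,i}."""
    return sum(x_normal[j] * W[j][i] for j in range(len(x_normal)) if j != i)


def learning_error(i, x_normal, dx_observed, W):
    """Eq. (9), assuming e_i = dx_{i,L} - dx_{i,C} (learned minus observed)."""
    return learned_change(i, x_normal, W) - dx_observed[i]


gene1 = 0  # zero-based index of Gene1
print(round(learned_change(gene1, X_N, W), 2))        # 119.3
print(round(learning_error(gene1, X_N, dX_C, W), 2))  # 119.71
```

With all weights equal to 1, the learned output is simply the sum of the other 14 normal-stage values (119.3), far from the observed change of -0.41, which is why many weight updates are needed before the error becomes small.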
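Based on the Perceptron learning rule, the weight between the
$j$-th and $i$-th genes is updated according to the following
formulas:
$$w_{j,i} = w_{j,i} + \alpha\, x_{jN}\, dx_{j,C}, \quad \text{if } e_i > 0 \tag{10}$$
$$w_{j,i} = w_{j,i} - \alpha\, x_{jN}\, dx_{j,C}, \quad \text{if } e_i < 0 \tag{11}$$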
where α is the learning rate of the Neural Network. This is an
iterative process that continues until the errors for all genes are
minimized (zero or a very small value) or a stopping criterion
(the maximum number of iterations) is reached.
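The iteration described above can be sketched for a single target gene as follows. This is a hedged reconstruction, not the authors' MATLAB code: it assumes $e_i = dx_{i,L} - dx_{i,C}$, adds $\alpha\, x_{jN}\, dx_{j,C}$ to every incoming weight while $e_i$ is above the minimum error, subtracts it while $e_i$ is below the negative minimum error, and stops otherwise or after the maximum number of iterations. The names `train_gene` and `_error`, and the default arguments (taken from the parameters given in Section III), are illustrative.

```python
# Table I averages for Gene1 ... Gene15 and their change due to cancer (Eq. (3)).
X_N = [10.00, 7.18, 10.00, 11.45, 11.40, 7.98, 8.93, 10.60,
       7.20, 6.88, 6.45, 10.00, 8.08, 6.05, 7.10]
X_C = [9.59, 8.59, 10.3, 10.43, 8.98, 8.59, 8.26, 8.30,
       5.38, 6.00, 6.24, 6.12, 7.38, 6.24, 6.07]
dX_C = [xc - xn for xn, xc in zip(X_N, X_C)]


def _error(i, w, x_normal, dx_observed):
    """Eqs. (8)-(9): learned change of gene i minus its observed change."""
    learned = sum(x_normal[j] * w[j] for j in range(len(w)) if j != i)
    return learned - dx_observed[i]


def train_gene(i, x_normal, dx_observed, alpha=0.0001, max_iter=1000,
               min_error=0.001):
    """Learn the incoming weights w_{j,i} of gene i (column i of W).

    Assumed sign-based update (Eqs. (10)-(11)): +alpha * x_{jN} * dx_{j,C}
    while the error exceeds +min_error, -alpha * x_{jN} * dx_{j,C} while it
    is below -min_error; stop otherwise or after max_iter iterations.
    """
    m = len(x_normal)
    # All ones as in the paper; the self-weight never enters Eq. (8), so keep it 0.
    w = [0.0 if j == i else 1.0 for j in range(m)]
    e = _error(i, w, x_normal, dx_observed)
    for _ in range(max_iter):
        if e > min_error:
            sign = 1.0
        elif e < -min_error:
            sign = -1.0
        else:
            break  # error is within the tolerance
        for j in range(m):
            if j != i:
                w[j] += sign * alpha * x_normal[j] * dx_observed[j]
        e = _error(i, w, x_normal, dx_observed)
    return w, e


w_gene1, e_gene1 = train_gene(0, X_N, dX_C)
print(f"final error for Gene1: {e_gene1:.3f}")
print([round(v, 2) for v in w_gene1])
```

Each update shifts the learned output by $\pm\alpha \sum_{j\neq i} x_{jN}^2\, dx_{j,C}$, so this sign-based rule moves toward the target only when that sum is negative. That holds for every gene in Table I, where most genes are down-regulated, but it is not guaranteed for other data. Because the step size is fixed, the error typically cannot fall below 0.001, and in this sketch the loop usually ends at the iteration limit.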
In the above model, we include $x_{jN}$ and $dx_{j,C}$ in the
Perceptron-based weight update formula, because the regulation
of each gene depends on the normal-stage expression values, as
well as the changes, of all genes other than itself. These
assumptions have a particular biological significance: a gene
with a higher expression value in the normal stage can be
considered more active than the others, so the strength of
regulation, i.e., the weight, should be proportional to $x_{jN}$.
Moreover, a gene whose expression level does not change
significantly during cancer can be treated as a less active gene,
with less influence on the expression values of other genes, even
though it may have a high expression value in the normal stage.
Therefore, the weight should be updated in proportion to both
the change and the initial value.
The proposed model is therefore more biologically plausible:
the change in the expression value of a gene depends on both the
changes and the normal-stage values of the other genes.
III. EXPERIMENTAL RESULT
In this work, we applied the proposed method to the
microarray dataset of Lung Adenocarcinoma (GEO accession no.
GSE10072, obtained from the NCBI website), which contains
22284 genes. Finding the regulatory network among such a large
number of genes is practically infeasible, and the resulting
regulations would be hard to interpret. Therefore, instead of
building a huge, complex network, we reconstruct the most
biologically significant network for a small number of genes
that are most responsible for Adenocarcinoma; this network
represents the major regulations during cancer. Dimensionality
reduction of the dataset without losing important information
can be done using Rough Set Theory. In earlier studies [21, 22],
a Rough Set based rule reduction process was successfully
applied to identify only 15 responsible genes from this large
dataset, so prior knowledge of the expression values of the
responsible genes is available. We take the average expression
value of each gene over all samples of each stage (normal and
cancerous). We implemented the proposed model in MATLAB 7.6
and applied it to the reduced dataset below. Interestingly, the
expression values of most of these genes decrease due to cancer.
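The per-stage averaging mentioned above amounts to a mean over the arrays of each stage. The sketch below uses Python's standard `statistics.mean`; the function name `stage_average`, the probe names, and the per-sample values are hypothetical placeholders, not values from GSE10072.

```python
from statistics import mean


def stage_average(samples_by_probe):
    """Return the mean expression of each probe over the samples of one stage.

    samples_by_probe maps a probe ID to the list of its expression values in
    every array of that stage (for example, all normal-lung arrays).
    """
    return {probe: mean(values) for probe, values in samples_by_probe.items()}


# Hypothetical values for two probes in three normal-lung arrays.
normal_arrays = {
    "probe_A": [9.8, 10.1, 10.1],
    "probe_B": [7.0, 7.3, 7.24],
}
print(stage_average(normal_arrays))
```

Running the same function on the cancerous-lung arrays would give the second column of averages; the two resulting vectors play the roles of $[X_N]$ and $[X_C]$.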
TABLE I. DATASET USED FOR LEARNING OF NEURAL NETWORK
Sl. No.   Gene ID   Average Value of Normal Lungs   Average Value of Cancerous Lungs
Gene1 201591_s_at 10.00 9.59
Gene2 201772_at 7.18 8.59
Gene3 201938_at 10.00 10.3
Gene4 202295_s_a 11.45 10.43
Gene5 203065_s_at 11.40 8.98
Gene6 203091_at 7.98 8.59
Gene7 203249_at 8.93 8.26
Gene8 205261_at 10.60 8.30
Gene9 206068_s_at 7.20 5.38
Gene10 208056_s_at 6.88 6.00
Gene11 209072_a 6.45 6.24
Gene12 209613_s_at 10.00 6.12
Gene13 218918_at 8.08 7.38
Gene14 222313_at 6.05 6.24
Gene15 49452_a 7.10 6.07
Hence, $[X_N]_{1\times 15}$ = [10, 7.18, 10, 11.45, 11.40, 7.98, 8.93, 10.6,
7.2, 6.88, 6.45, 10, 8.08, 6.05, 7.10] and
$[X_C]_{1\times 15}$ = [9.59, 8.59, 10.3, 10.43, 8.98, 8.59, 8.26, 8.30,
5.38, 6, 6.24, 6.12, 7.38, 6.24, 6.07].
We use the following parameters for the NN: α = 0.0001;
maximum number of iterations for weight update = 1000;
minimum error = 0.001. The program is executed for each of the
15 genes separately until a stopping criterion (minimum error or
maximum number of iterations) is reached. Each run yields the
weight vector of one gene, which denotes the amount of
regulation between that gene and every other gene. Together,
these vectors form the following weight matrix, whose elements
denote the weight between the corresponding pair of genes.
TABLE II. WEIGHT MATRIX (ROUND OFF VALUE) FOR GENES(1-8)
Gene i \ j   1   2   3   4   5   6   7   8
1 0 2 1 0 -2 1 0 -1
2 3 0 0 7 15 -1 3 13
3 3 -4 0 7 15 -1 4 13
4 1 2 1 0 -2 1 0 -2
5 0 2 1 0 0 2 0 -2
6 3 -4 0 7 15 0 3 13
7 1 2 1 0 -2 1 0 -1
8 1 2 1 0 -2 2 0 0
9 1 2 1 0 -2 1 0 -2
10 1 2 1 0 -2 1 0 -1
11 1 2 1 0 -2 1 0 -1
12 0 2 1 1 -2 2 0 -2
13 1 2 1 0 -2 1 0 -1
14 3 -4 0 7 15 -1 3 13
15 1 2 1 0 -2 1 0 -1
TABLE III. WEIGHT MATRIX (ROUND OFF VALUE) FOR GENES(9-15)
Gene i \ j   9   10   11   12   13   14   15
1 0 0 1 -2 0 1 0
2 8 4 2 20 3 0 5
3 8 4 2 20 3 0 5
4 0 0 1 -3 0 1 0
5 0 0 1 -3 0 1 0
6 8 4 2 20 3 0 5
7 0 0 1 -2 0 1 0
8 0 0 1 -3 0 1 0
9 0 0 1 -3 0 1 0
10 0 0 1 -3 0 1 0
11 0 0 0 -3 0 1 0
12 0 0 1 0 0 1 0
13 0 0 1 -3 0 1 0
14 8 4 2 20 3 0 5
15 0 0 1 -2 0 1 0
A positive value denotes activation, a negative value denotes
inhibition, and zero denotes no regulation between genes. The
tables show that a large number of regulations exist between the
genes, so the full network would be very complex. Moreover, we
are interested in the regulatory effects with the largest
magnitude. Therefore, the weight matrix is filtered at a threshold
weight value (here, 4): if the absolute value of an element of the
weight matrix is greater than 4, it remains in the matrix;
otherwise, it is replaced with zero.
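The thresholding step keeps only weights whose magnitude exceeds 4. A minimal sketch, with the hypothetical helper name `filter_weights`, applied for illustration to the rounded weights in Gene2's row of Tables II and III; note that with the strict "greater than" comparison stated above, a weight of exactly 4 is discarded.

```python
def filter_weights(W, threshold=4.0):
    """Keep w only if |w| > threshold; replace every other element with 0."""
    return [[w if abs(w) > threshold else 0 for w in row] for row in W]


# Row 2 of Tables II and III (rounded weights), used only as an illustration.
row_gene2 = [3, 0, 0, 7, 15, -1, 3, 13, 8, 4, 2, 20, 3, 0, 5]
print(filter_weights([row_gene2])[0])
# [0, 0, 0, 7, 15, 0, 0, 13, 8, 0, 0, 20, 0, 0, 5]
```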
Based on this filtered weight matrix, a directed weighted
graph can easily be drawn, in which genes are denoted by nodes
and regulations by edges. If there is a nonzero weight between
two genes in the filtered matrix, there is an edge between them;
otherwise, there is no edge (regulation). Note that lowering the
threshold makes the network more complex and harder to
interpret.
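Turning the filtered matrix into the directed graph amounts to listing its nonzero entries as edges. The sketch below does this with plain Python, reading $w_{i,j}$ as the regulation of gene $i$ on gene $j$ as defined for Eq. (5); the function names, gene names, and the 4-gene matrix are hypothetical. It also lists the genes without any edge, which is how observations such as the unregulated genes in Fig. 1 can be checked.

```python
def regulatory_edges(W, names):
    """Return (regulator, target, weight, kind) for every nonzero off-diagonal w_{i,j}.

    W[i][j] is read as the regulation of gene i on gene j, as in Eq. (5).
    """
    edges = []
    for i, row in enumerate(W):
        for j, w in enumerate(row):
            if i != j and w != 0:
                kind = "activation" if w > 0 else "inhibition"
                edges.append((names[i], names[j], w, kind))
    return edges


def isolated_genes(W, names):
    """Genes with no incoming or outgoing edge in the filtered matrix."""
    linked = set()
    for regulator, target, _, _ in regulatory_edges(W, names):
        linked.update((regulator, target))
    return [name for name in names if name not in linked]


# Hypothetical filtered 4-gene matrix, for illustration only.
names = ["GeneA", "GeneB", "GeneC", "GeneD"]
W = [
    [0, 7, 0, 0],
    [0, 0, -5, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
for edge in regulatory_edges(W, names):
    print(edge)
print(isolated_genes(W, names))  # ['GeneD']
```

The resulting edge list can be passed to any graph-drawing tool to produce a figure like Fig. 1.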
Fig. 1. GRN using proposed Neural Network
Fig. 1 shows the resulting small-scale gene regulatory network
for Lung Adenocarcinoma. The directed edges represent the
regulations among the genes during cancer. Gene1, Gene7,
Gene11, and Gene13 are not involved in any regulation. The other
genes regulate (activate or inhibit) each other according to the
weights between them.
The pseudocode of the proposed algorithm is as follows:
1. Input [XN]1×15 and [XC]1×15
2. Calculate [dXC]1×15 = [XC] - [XN]
3. Initialize [W]15×15 with all ones
4. Set α = 0.0001
5. Loop i = 1 to 15 do
6.   Calculate the initial learned output $dx_{i,L} = \sum_{j\neq i} x_{jN}\, w_{j,i}$
7.   Calculate the error $e_i = dx_{i,L} - dx_{i,C}$
8.   Loop k = 1 to 1000 do
9.     if $e_i > 0.001$ then
10.      $w_{j,i} = w_{j,i} + \alpha\, x_{jN}\, dx_{j,C}$ for all $j \neq i$
11.    else if $e_i < -0.001$ then
12.      $w_{j,i} = w_{j,i} - \alpha\, x_{jN}\, dx_{j,C}$ for all $j \neq i$
13.    else
14.      break
15.    end if
16.    Calculate the learned output $dx_{i,L}$ using Eq. (8)
17.    Calculate the error $e_i$ using Eq. (9)
18.  end loop
19.  Loop j = 1 to 15 do
20.    if $|w_{j,i}| > 4$ then
21.      $w_{j,i} = w_{j,i}$
22.    else
23.      $w_{j,i} = 0$
24.    end if
25.  end loop
26. end loop
27. Show the filtered weight matrix
28. Draw the directed graph
IV. CONCLUSION
Thus, Neural Networks can be applied to many aspects of
gene regulatory network analysis, from classification to
assessing network credibility. Here, the input and output layers
of the network are the normal-lung expression values and the
changes in expression of the genes responsible for Lung
Adenocarcinoma. We have used a biologically relevant, modified
weight updating formula in the Perceptron learning algorithm to
obtain a more biologically plausible genetic network from a
reduced microarray dataset of the disease. Based on the weight
matrix, whose elements directly express the influence of genes
on one another, a directed graph, i.e., the GRN, can be drawn and
interpreted. During drug design, attention should therefore focus
on these regulating genes. The results could be improved by
applying optimization algorithms. Although the results have not
been validated in the wet lab, we hope they will have an impact
on Computational Biology and Biomedical Science.
REFERENCES
[1] National Center for Biotechnology Information (NCBI), "Microarrays:
Chipping Away at the Mysteries of Science and Medicine," A Science
primer [online], 2007.
[2] S. Liang, S. Fuhrman, and R. Somogyi, "REVEAL, a general reverse
engineering algorithm for inference of genetic network
architectures," Pacific Symposium on Biocomputing 3, pp. 18-29, 1998.
[3] T. Akutsu, S. Miyano, and S. Kuhara, "Identification of Genetic
Networks from a Small Number of Gene Expression Patterns under the
Boolean Network Model," Pacific Symposium on Biocomputing 4, pp.
17-28, 1999.
[4] D. C. Weaver, C. T. Workman, and G. D. Stormo, "Modeling
Regulatory Networks with Weight Matrices," Pacific Symposium on
Biocomputing 4, pp.123, 1999.
[5] T. Akutsu, S. Miyano, and S. Kuhara, "Algorithms for Inferring
Qualitative Models of Biological Networks," Pacific Symposium on
Biocomputing 5, pp. 293-304, 2000.
[6] K. Murphy and S. Mian, "Modelling Gene Expression Data using
Dynamic Bayesian Networks," Computer Science Division, University
of California, Berkeley 1999.
[7] K. Murphy, "Dynamic Bayesian Networks: Representation, Inference
and Learning," in Computer Science: University of California, pp. 255,
2002.
[8] B. E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F.
D'Alche-Buc, "Gene networks inference using dynamic Bayesian
networks," Bioinformatics, vol. 19, Suppl 2, pp. II138-II148, 2003.
[9] H. Wang, L. Qian and E. Dougherty, "Inference of Gene Regulatory
Network using S-System: A Unified Approach," Proceedings of the 2007
IEEE Symposium CIBCB, pp. 82-89, 2007.
[10] T. Nakayama, S. Seno, Y. Takenaka and H. Matsuda, "Inference of
Gene Regulatory Networks using Immune Algorithm," Journal of
Bioinformatics and Computational Biology, Vol. 9, pp. 75-86, 2011.
[11] P. Du, J. Gong, E. S. Wurtele, and J. A. Dickerson, "Modeling gene
expression networks using fuzzy logic," IEEE Transactions on Systems,
Man and Cybernetics, vol. 35, pp. 1351-1359, 2005.
[12] J. A. Dickerson, Z. Cox, E. S. Wurtele, and A. W. Fulmer, "Creating
Metabolic and Regulatory Network Models using Fuzzy Cognitive
Maps," North American Fuzzy Information Processing Conference
(NAFIPS), Vol.4, pp. 2171-2176, 2001.
[13] J. Vohradsky, “Neural network model of gene expression”, The FASEB
Journal, Vol. 15, pp.846-854, 2001.
[14] I. A. Maraziotis, A. Dragomir and D. Thanos, "Gene regulatory networks
modeling using a dynamic evolutionary hybrid," BMC Bioinformatics,
Vol. 11:140, pp. 1-17, 2010.
[15] E. Keedwell and A. Narayanan, "Discovering Gene Networks with a
Neural-Genetic Hybrid," IEEE/ACM Transactions on Computational
Biology and Bioinformatics, Vol. 2, No. 3, pp. 231-242, 2005.
[16] K. Kentzoglanakis and M. Poole, "A Swarm Intelligence Framework
for Reconstructing Gene Networks: Searching for Biologically Plausible
Architecture," IEEE/ACM Transactions on Computational Biology and
Bioinformatics, Vol. 9, No. 2, pp. 355-371, 2012.
[17] R. Xu, D. C. Wunsch II and R. L. Frank, "Inference of Genetic
Regulatory Networks with Recurrent Neural Network Models using
Particle Swarm Optimization," IEEE/ACM Transactions on Computational
Biology and Bioinformatics, Vol. 4, No. 4, pp. 681-692, 2007.
[18] P. Rakshit, P. Das, A. Konar, M. Nasipuri and R. Janarthanan, "A Recurrent
Fuzzy Neural Network Model of a Gene Regulatory Network for Knowledge
Extraction Using Invasive Weed and Artificial Bee Colony Optimization
Algorithm," Proceedings of the 1st International Conference on
Recent Advances in Information Technology (RAIT-2012), 2012.
[19] N. Vijesh, S. K. Chakrabarty and J. Sreekumar, “Modelling of gene
regulatory network: A review,” Journal of Biomedical Science and
Engineering, Vol. 6, pp. 223-231, 2013.
[20] S. Mandal, G. Saha and R. K. Pal, “Comparative study on Disease
Classification using Different Soft Computing Techniques,” The SIJ
Transactions on Computer Science Engineering & its Applications
(CSEA), Vol. 2, No. 3, 2014.
[21] S. Mandal and G. Saha, “Rough Set Theory Based Automated Disease
Diagnosis using Lung Adenocarcinoma as Test Case”, The SIJ
Transactions on Computer Science Engineering & its Applications
(CSEA), Vol. 1, No. 3, 2013.
[22] S. Mandal, G. Saha and R. K. Pal, "Reconstruction of Dominant Gene
Regulatory Network from Microarray Data Using Rough Set and
Bayesian Approach," Journal of Computer Science & Systems Biology,
Vol. 6, No. 5, pp. 262-270, 2013.