Neural Network Based Gene Regulatory Network Reconstruction

Sudip Mandal1, Goutam Saha2 and Rajat Kumar Pal3
1ECE Department, GIMT, Krishna Nagar, India, email: sudip.mandal007@gmail.com
2IT Department, NEHU, Shillong, India, email: dr_goutamsaha@yahoo.com
3CSE Department, University of Calcutta, Kolkata, India, email: pal.rajatk@gmail.com
Abstract: A Gene Regulatory Network (GRN) models the regulatory interactions in living organisms. Inferring a genetic network from experimental high-throughput biological data (such as microarray data) is a challenging task. In this paper, an Artificial Neural Network, an effective soft computing tool for learning and modeling the dynamics or dependencies between genes, is used to reconstruct a small-scale GRN from a reduced microarray dataset of Lung Adenocarcinoma. The significance of the regulation of one gene on the other genes of the system is expressed by a weight matrix, which is computed using a Perceptron-based, biologically motivated weight updating method that minimizes the error during learning. Based on the values of the elements of the filtered weight matrix, a directed weighted graph is drawn that denotes the gene regulatory network.
Keywords: Gene Regulatory Network; Microarray data; Neural Network.
I. INTRODUCTION
Cellular biology is a growing area of research. Nowadays, a large amount of biological data produced by advanced experimental technologies such as microarrays is accessible from online databases such as the National Center for Biotechnology Information (NCBI) [1]. Efficient computational techniques are required to analyze these data and recover informative knowledge from them. These methods help us understand, through a mathematical model, the interactions that take place in organisms at the genomic level. Such gene-to-gene interactions are represented by a special network known as a Gene Regulatory Network (GRN). It is a graphical representation in which nodes denote genes or proteins and the edges connecting them denote regulatory relationships. It is constructed by observing the behavior of genes and their impact on other genes under a particular experimental condition. This behavior is analyzed by measuring gene expression values with microarray technology. Depending on the expression levels of genes and their interactions, the status of a human organ can be either normal or cancerous.
When Transcription Factors (specialized proteins) bind to the promoter region of DNA, they can modify the rate of protein synthesis, which can result in sudden changes in gene expression values. If the rate of protein synthesis decreases, the effect is called inhibition, down-regulation or negative regulation; if it increases, it is called activation, up-regulation or positive regulation. If gene g1 positively regulates gene g2, the expression level of g2 increases because of g1; under negative regulation, the expression level of g2 decreases. A GRN is very helpful for analyzing the effect of drugs on genes and for finding interactions between genes and genetic pathways for disease development. It also helps in studying the dynamics of a specific gene under particular diseased or experimental circumstances, and in studying diseases caused by genes (in this case Lung Adenocarcinoma).
Many types of models have been proposed for reconstructing gene regulatory networks in biological systems, including Boolean networks [2-3], linear weighting networks [4], differential equations [5], Bayesian Networks [6-8], S-system [9-10], Fuzzy Set [11-12], Artificial and Recurrent Neural Networks [13] and Evolutionary Methods [14-18]. Boolean networks examine binary state transition matrices to search for patterns in gene expression. Every part of the network is either on or off depending on whether a signal exceeds a pre-determined threshold level. Generalized Logical Networks permit the variables of Boolean networks to take more than two values and use generalized Boolean functions to define their relationships. Probabilistic Boolean Networks combine several promising Boolean functions so that each one contributes to the prediction of a target gene; the probabilistic model randomly chooses one of these promising predictors. Linear weighting networks have the benefit of simplicity, since they use simple weight matrices to additively combine the contributions of different regulatory elements. A Bayesian network models probabilistic transitions between network states, assuming that there are no cycles in the network. However, cycles or loops are the most important mechanism for ensuring stability. Dynamic Bayesian Networks incorporate features of Hidden Markov Models to include feedback. When GRNs are modeled with the S-system method, the expression rates are defined by the difference of two products of power-law functions, where the first denotes the activation term and the second the degradation term of a gene product. Fuzzy models describe the interactions between genes in gene regulatory pathways using fuzzy weights and clustering. In a neural network based model, genes are represented by the nodes of the input and output layers of the neural network, and the weight matrix between the nodes represents the regulatory relationships between genes. Moreover, different evolutionary hybrid methods, such as Recurrent Neural Networks combined with Genetic Algorithms, Particle Swarm Optimization, Ant Colony Optimization and Bee Colony Optimization [19, 20], have already been proposed to infer gene regulatory networks. Despite all the advantages of a GRN, identifying one remains a complicated and fascinating task for researchers.
In this paper, a Neural Network method is applied to reconstruct a genetic network. The model calculates the change in expression rates, which depends on the weighted sum of the expression levels of multiple regulatory genes that are available beforehand for this experiment. A more biologically significant weight updating formula, based on the Perceptron learning rule, is used for modeling the Gene Regulatory Network in order to find a more biologically plausible regulatory network. It is assumed that the change in the expression value of each gene depends on the expression values of the other genes at the previous stage and on the changes in those values. The corresponding weight matrix denotes the level of regulation between genes. Based on this weight matrix, a directed weighted graph is drawn which denotes the gene regulatory network. The proposed model is discussed in the next section. The experimental GRN and results for the microarray dataset of Lung Adenocarcinoma are shown in Section III. The conclusion is given in Section IV, followed by the references.
978-1-4799-4445-3/15/$31.00 © 2015 IEEE
II. NEURAL NETWORK MODEL FOR GRN
A Neural Network (NN) is an effective soft computing tool that learns patterns from raw input data in a way similar to the working principle of neurons in the human nervous system. This model is biologically plausible and noise-resistant. It is continuous in time and uses a transfer function to transform the inputs into a shape similar to that observed in natural processes. In addition, its nonlinear characteristics provide information about the principles of control, as well as about the natural interactions of the elements of the modeled system.
Let us assume that $G_1, G_2, \ldots, G_{m-1}, G_m$ are $m$ specific genes responsible for the cancer and that $x_{1N}, x_{2N}, \ldots, x_{(m-1)N}, x_{mN}$ are the corresponding gene expression values in a normal cell (here we consider Lung Adenocarcinoma as the test case). When the lungs are affected by cancer, the expression levels of those genes change to $x_{1C}, x_{2C}, \ldots, x_{(m-1)C}, x_{mC}$. We define two $1 \times m$ matrices that hold the expression values of the genes in the normal and cancerous states, which can be obtained from the microarray dataset of Lung Adenocarcinoma.
\[ [X_N]_{1 \times m} = [x_{1N}, x_{2N}, \ldots, x_{(m-1)N}, x_{mN}] \tag{1} \]
\[ [X_C]_{1 \times m} = [x_{1C}, x_{2C}, \ldots, x_{(m-1)C}, x_{mC}] \tag{2} \]
Let $[dX_C]_{1 \times m}$ be a matrix that denotes the change in the expression values due to cancer, relative to the normal state.
\[ [dX_C]_{1 \times m} = [(x_{1C} - x_{1N}), (x_{2C} - x_{2N}), \ldots, (x_{(m-1)C} - x_{(m-1)N}), (x_{mC} - x_{mN})] \tag{3} \]
For the reconstruction of the Gene Regulatory Network, we have implemented a Neural Network model based on the Perceptron learning algorithm, with one input layer, one output layer and no hidden layer. We assume that the change in the expression value of each gene due to cancer is a function of the normal-stage expression values of all genes except its own. The overall network-finding problem is thus subdivided into smaller ones: instead of finding the overall connectivity all at once, we discover the interactions of each individual gene separately. The structure of the full network is then formed by combining these results. So we can write the change of the i-th gene in the following way
\[ dx_{i,C} = f(x_{1N}, \ldots, x_{(i-1)N}, x_{(i+1)N}, \ldots, x_{mN}), \quad i = 1, \ldots, m \tag{4} \]
In the GRN problem, we have to infer the optimum level of regulation between genes in terms of the edge connectivity of the nodes (genes) in the network. The strength of regulation between genes can be measured by a constant value called the 'weight'. In our problem of genetic network reconstruction, we have to find the optimum values of the weights between each pair of genes such that maximum accuracy is achieved; these values form the optimum weight matrix. For $m$ genes, the weight matrix $[W]_{m \times m}$ can be defined in the following way
\[ [W]_{m \times m} = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,m} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{(m-1),1} & w_{(m-1),2} & \cdots & w_{(m-1),m} \\ w_{m,1} & w_{m,2} & \cdots & w_{m,m} \end{bmatrix} \tag{5} \]
where $w_{i,j}$ denotes the weight corresponding to the regulation of the j-th gene by the i-th gene.
To obtain this optimum weight matrix, we have used the Perceptron model of the Neural Network and calculated the weights of the edges between the input and output layers. Here we consider $[X_N]_{1 \times m}$ as the input layer, $[dX_C]_{1 \times m}$ as the output layer and $[W]_{m \times m}$ as the weight matrix between them. According to the fundamentals of Artificial Neural Networks (taking the bias input as zero and using no activation function at the output), we can express the relationship between the input and output layers by the following equation
\[ [dX_C] = [X_N] \cdot [W] \tag{6} \]
Now, $w_{j,i}$ is the element of the weight matrix, which is a function of the normal-stage expression level of the j-th gene and the change of the j-th gene due to disease, with some initial constant $K$.
\[ w_{j,i} = f(x_{jN}, dx_{j,C}) + K \tag{7} \]
During the learning process, i.e. the determination of the optimal weight matrix of the NN, the network is initialized with a weight matrix whose elements are all 1. The learned value of the change in expression of the i-th cancerous gene is then calculated using the following formula
\[ dx_{i,C}^{learned} = \sum_{j=1}^{m} x_{jN} \, w_{j,i}, \quad \text{where } j \neq i \tag{8} \]
Next, the error in the learning process for the i-th gene is calculated with the help of
\[ e_i = dx_{i,C} - dx_{i,C}^{learned} \tag{9} \]
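The learned output and the learning error can be illustrated with a minimal Python sketch. It assumes the learned change is the weighted sum of the other genes' normal values, as the matrix relation in equation (6) suggests. The function names `learned_change` and `learning_error` and the three-gene values are hypothetical and chosen only for illustration.

```python
import numpy as np


def learned_change(x_n, w, i):
    """Weighted sum of the other genes' normal values for target gene i (0-based)."""
    others = np.arange(len(x_n)) != i      # exclude the gene's own expression value
    return float(np.dot(x_n[others], w[others, i]))


def learning_error(x_n, dx_c, w, i):
    """Observed change minus learned change for target gene i."""
    return float(dx_c[i] - learned_change(x_n, w, i))


# Hypothetical three-gene example (illustrative values, not taken from the dataset).
x_n = np.array([2.0, 3.0, 5.0])      # normal-stage expression values
dx_c = np.array([1.0, -1.0, 0.5])    # observed changes due to disease
w = np.ones((3, 3))                  # initial weight matrix, all ones

print(learned_change(x_n, w, 0))          # 3.0 + 5.0 = 8.0
print(learning_error(x_n, dx_c, w, 0))    # 1.0 - 8.0 = -7.0
```

With all weights initialized to 1, the learned output is simply the sum of the other genes' normal values, so the initial error is typically large and negative.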
Based on the Perceptron learning rule, the weight between the i-th and j-th genes is updated according to the following formulas
\[ \text{if } e_i > 0: \quad w_{j,i}^{new} = w_{j,i}^{old} + \alpha \, x_{jN} \, dx_{j,C} \tag{10} \]
\[ \text{if } e_i < 0: \quad w_{j,i}^{new} = w_{j,i}^{old} - \alpha \, x_{jN} \, dx_{j,C} \tag{11} \]
where $\alpha$ is the learning rate of the neural network. This is an iterative process that continues until the errors for all genes are minimized (zero or a very small value) or a stopping criterion is reached (maximum number of iterations).
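A single weight update can be worked through numerically with the Table I averages and the learning rate α = 0.0001 used in Section III. This is a sketch under one assumption: that the update step is proportional to the product of the regulating gene's normal value and its change, $\alpha \, x_{jN} \, dx_{j,C}$, which is how the text describes the dependence. The example updates the weight from Gene2 to Gene1 starting from the all-ones initialization.

```latex
% Worked single update for w_{2,1}, using the Table I averages.
% Assumption: the update step has the form \alpha x_{jN} dx_{j,C}.
\begin{align*}
dx_{1,C}^{learned} &= \sum_{j \neq 1} x_{jN} \cdot 1 = 129.30 - 10.00 = 119.30 \\
e_1 &= dx_{1,C} - dx_{1,C}^{learned} = -0.41 - 119.30 = -119.71 < 0 \\
dx_{2,C} &= 8.59 - 7.18 = 1.41 \\
w_{2,1}^{new} &= 1 - 0.0001 \times 7.18 \times 1.41 = 1 - 0.00101238 \approx 0.99899
\end{align*}
```

Because the error is negative, the subtractive case applies, and a single step changes the weight only in the third decimal place at this learning rate.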
In the above model, we include $x_{jN}$ and $dx_{j,C}$ in the Perceptron-based weight update formula because the regulation of each gene depends on the normal-stage expression values of the other genes as well as on the changes in those values. These assumptions have a particular biological significance: a gene with a higher expression value in the normal stage can be considered more active than the others. Therefore, the strength of regulation, or weight, should be proportional to $x_{jN}$. Moreover, a gene whose expression level does not change significantly during cancer can be treated as a less active gene, which has less importance in changing the expression values of other genes even though it may have a high expression value in the initial normal stage. Therefore, the weight should be updated in proportion to the change $dx_{j,C}$ as well as the initial value $x_{jN}$. The proposed model is thus more biologically plausible, since it states that the change in a gene's expression value depends on the changes as well as the initial normal values of the other genes.
III. EXPERIMENTAL RESULT
In this research work, we have applied our proposed method to the microarray dataset of Lung Adenocarcinoma (GEO Accession No.: GSE 10072, obtained from the NCBI website), which has 22284 genes. Finding the regulatory network among such a large number of genes is almost impossible, and the regulations are hard to interpret. Therefore, instead of developing a huge, complex network, we try to reconstruct the most biologically significant network for a small number of the genes most responsible for Adenocarcinoma, which will denote the major regulations during cancer. The dimensionality of the dataset can be reduced without losing important information by using Rough Set Theory. In earlier studies [21,22], a Rule Reduction process using Rough Sets was successfully applied to identify only 15 responsible genes from the huge database. So, we have prior knowledge about the dataset related to the expression values of the responsible genes. We take the average expression value of each gene for each individual stage. We applied the proposed model, implemented in Matlab 7.6, to the following reduced dataset. It is interesting to find that the expression values of most genes decrease due to cancer.
TABLE I. DATASET USED FOR LEARNING OF NEURAL NETWORK
Sl. No.   Gene ID        Average Value of Normal Lungs   Average Value of Cancerous Lungs
Gene1     201591_s_at    10.00                           9.59
Gene2     201772_at      7.18                            8.59
Gene3     201938_at      10.00                           10.30
Gene4     202295_s_a     11.45                           10.43
Gene5     203065_s_at    11.40                           8.98
Gene6     203091_at      7.98                            8.59
Gene7     203249_at      8.93                            8.26
Gene8     205261_at      10.60                           8.30
Gene9     206068_s_at    7.20                            5.38
Gene10    208056_s_at    6.88                            6.00
Gene11    209072_a       6.45                            6.24
Gene12    209613_s_at    10.00                           6.12
Gene13    218918_at      8.08                            7.38
Gene14    222313_at      6.05                            6.24
Gene15    49452_a        7.10                            6.07
So, here
[XN]1×15 = [10, 7.18, 10, 11.45, 11.40, 7.98, 8.93, 10.6, 7.2, 6.88, 6.45, 10, 8.08, 6.05, 7.10]
[XC]1×15 = [9.59, 8.59, 10.3, 10.43, 8.98, 8.59, 8.26, 8.30, 5.38, 6, 6.24, 6.12, 7.38, 6.24, 6.07]
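As a quick check of these vectors, the change matrix of equation (3) can be computed directly. The sketch below uses NumPy and the Table I averages; the variable names are illustrative only.

```python
import numpy as np

# Average expression values copied from Table I (normal and cancerous lungs).
x_n = np.array([10.00, 7.18, 10.00, 11.45, 11.40, 7.98, 8.93, 10.60,
                7.20, 6.88, 6.45, 10.00, 8.08, 6.05, 7.10])
x_c = np.array([9.59, 8.59, 10.30, 10.43, 8.98, 8.59, 8.26, 8.30,
                5.38, 6.00, 6.24, 6.12, 7.38, 6.24, 6.07])

# Equation (3): change in expression due to cancer, one entry per gene.
dx_c = x_c - x_n

# 1-based gene numbers whose average expression rises or falls.
up = [g + 1 for g in range(len(dx_c)) if dx_c[g] > 0]
down = [g + 1 for g in range(len(dx_c)) if dx_c[g] < 0]

print("dX_C =", np.round(dx_c, 2))
print("genes with increased expression:", up)
print("genes with decreased expression:", down)
```

Only Gene2, Gene3, Gene6 and Gene14 show an increase; the remaining eleven genes decrease, which agrees with the observation that most expression values fall due to cancer.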
We use the following parameters for the NN: α = 0.0001; maximum number of iterations for the weight update = 1000; minimum error = 0.001. The program is executed for all 15 genes separately until a stopping criterion (minimum error or maximum iterations) is reached. Each run for a gene yields a weight vector that denotes the amount of regulation between that gene and each of the others. Together these give the following weight matrix, each element of which denotes the weight between the corresponding two genes.
TABLE II. WEIGHT MATRIX (ROUNDED VALUES) FOR GENES 1-8
Gene i \ j     1    2    3    4    5    6    7    8
1              0    2    1    0   -2    1    0   -1
2              3    0    0    7   15   -1    3   13
3              3   -4    0    7   15   -1    4   13
4              1    2    1    0   -2    1    0   -2
5              0    2    1    0    0    2    0   -2
6              3   -4    0    7   15    0    3   13
7              1    2    1    0   -2    1    0   -1
8              1    2    1    0   -2    2    0    0
9              1    2    1    0   -2    1    0   -2
10             1    2    1    0   -2    1    0   -1
11             1    2    1    0   -2    1    0   -1
12             0    2    1    1   -2    2    0   -2
13             1    2    1    0   -2    1    0   -1
14             3   -4    0    7   15   -1    3   13
15             1    2    1    0   -2    1    0   -1
TABLE III. WEIGHT MATRIX (ROUNDED VALUES) FOR GENES 9-15
Gene i \ j     9   10   11   12   13   14   15
1              0    0    1   -2    0    1    0
2              8    4    2   20    3    0    5
3              8    4    2   20    3    0    5
4              0    0    1   -3    0    1    0
5              0    0    1   -3    0    1    0
6              8    4    2   20    3    0    5
7              0    0    1   -2    0    1    0
8              0    0    1   -3    0    1    0
9              0    0    1   -3    0    1    0
10             0    0    1   -3    0    1    0
11             0    0    0   -3    0    1    0
12             0    0    1    0    0    1    0
13             0    0    1   -3    0    1    0
14             8    4    2   20    3    0    5
15             0    0    1   -2    0    1    0
A positive value denotes activation, a negative value denotes inhibition, and a zero value denotes no regulation between genes. The tables above show that a large number of regulations exist between the genes, so the network is very complex. Moreover, we are interested in finding the large regulatory effects of genes. Therefore, the weight matrix is filtered with a weight threshold (here the threshold is 4): if the absolute value of an element of the weight matrix is greater than 4, it remains in the matrix; otherwise it is replaced with zero.
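The filtering step can be sketched in a few lines of Python. The function name `filter_weights` is hypothetical, and the sample rows are the rounded Gene1 and Gene2 rows of Table II; because the published values are rounded, results for entries close to the threshold may differ from those obtained with the unrounded weights.

```python
import numpy as np


def filter_weights(w, threshold=4.0):
    """Keep entries whose absolute value exceeds the threshold; zero the rest."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) > threshold, w, 0.0)


# Rounded rows for Gene1 and Gene2 (columns 1-8) from Table II.
rows = np.array([[0, 2, 1, 0, -2, 1, 0, -1],
                 [3, 0, 0, 7, 15, -1, 3, 13]])

filtered = filter_weights(rows)
print(filtered)
```

After filtering, the Gene1 row is entirely zero, while the Gene2 row keeps only the weights 7, 15 and 13 towards Gene4, Gene5 and Gene8.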
Based on this filtered weight matrix, a directed weighted graph can be drawn in which genes are denoted by nodes and regulations by edges. If there is a non-zero weight between two genes in the matrix, there is an edge between them; otherwise there is no edge, i.e. no regulation. Note that decreasing the threshold makes the network more complex and harder to interpret.
Fig. 1. GRN using proposed Neural Network
The figure above shows the resulting small-scale gene regulatory network for Lung Adenocarcinoma. The directed edges indicate the regulations among the genes during cancer. It can be clearly observed that Gene1, Gene7, Gene11 and Gene13 do not take part in any kind of regulation. The other genes regulate (activate or inhibit) each other based on the value of the weight between them.
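The translation from a filtered weight matrix to a directed graph can be sketched as an edge list. This is a minimal illustration in plain Python, assuming row i regulates column j as defined for equation (5). The helper names `weight_matrix_to_edges` and `isolated_genes` and the 4×4 matrix are hypothetical and not taken from the experiment.

```python
def weight_matrix_to_edges(w):
    """Return (regulator, target, weight, kind) for every non-zero off-diagonal entry.

    Gene numbers are 1-based; row i is the regulator and column j the target.
    """
    edges = []
    for i, row in enumerate(w):
        for j, value in enumerate(row):
            if i != j and value != 0:
                kind = "activation" if value > 0 else "inhibition"
                edges.append((i + 1, j + 1, value, kind))
    return edges


def isolated_genes(w):
    """Genes that appear in no edge, neither as regulator nor as target."""
    connected = set()
    for source, target, _, _ in weight_matrix_to_edges(w):
        connected.update((source, target))
    return [g for g in range(1, len(w) + 1) if g not in connected]


# Hypothetical filtered 4x4 weight matrix (illustrative values only).
filtered = [
    [0, 0, 0, 0],
    [7, 0, -5, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

edges = weight_matrix_to_edges(filtered)
print(edges)
print(isolated_genes(filtered))
```

In this toy matrix, gene 2 activates gene 1 and inhibits gene 3, while gene 4 takes part in no regulation. The resulting edge list can be passed to any graph drawing tool.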
The pseudocode of the proposed algorithm is as follows:
1. Input [XN]1×15 and [XC]1×15
2. Calculate [dXC]1×15 = [XC] - [XN]
3. Initialize [W]15×15 with all ones
4. Set α = 0.0001
5. Loop i = 1 to 15 do
6. Calculate initial learned output $dx_{i,C}^{learned} = \sum_{j \neq i} x_{jN} \, w_{j,i}$
7. Calculate error $e_i = dx_{i,C} - dx_{i,C}^{learned}$
8. Loop k = 1 to 1000 do
9. if $e_i > 0.001$ then
10. for all j ≠ i: $w_{j,i} = w_{j,i} + \alpha \, x_{jN} \, dx_{j,C}$
11. else if $e_i < 0$ then
12. for all j ≠ i: $w_{j,i} = w_{j,i} - \alpha \, x_{jN} \, dx_{j,C}$
13. else
14. break
15. end if
16. Calculate learned output $dx_{i,C}^{learned} = \sum_{j \neq i} x_{jN} \, w_{j,i}$
17. Calculate error using $e_i = dx_{i,C} - dx_{i,C}^{learned}$
18. end loop
19. end loop
20. Loop over each element $w_{i,j}$ of [W] do
21. if $|w_{i,j}| > 4$ then
22. $w_{i,j} = w_{i,j}$
23. else
24. $w_{i,j} = 0$
25. end if
26. end loop
27. Show the filtered weight matrix
28. Draw the directed graph
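The training part of the pseudocode can be sketched in Python as follows. This is not the authors' Matlab implementation. It assumes the update step is `alpha * x_n[j] * dx_c[j]` (the product of the regulating gene's normal value and its change), and it fixes each gene's self-weight at zero to express "all genes except its own". The function names `train_gene` and `train_all` and the three-gene values are hypothetical.

```python
import numpy as np


def train_gene(x_n, dx_c, i, alpha=1e-4, max_iter=1000, min_error=1e-3):
    """Learn the incoming weights w[:, i] for target gene i (0-based).

    Assumption: the update step is alpha * x_n[j] * dx_c[j] for every j != i.
    """
    x_n = np.asarray(x_n, dtype=float)
    dx_c = np.asarray(dx_c, dtype=float)
    w = np.ones(len(x_n))        # initial weights are all one
    w[i] = 0.0                   # a gene does not regulate itself
    step = alpha * x_n * dx_c
    step[i] = 0.0                # keep the self-weight at zero
    for _ in range(max_iter):
        error = dx_c[i] - np.dot(x_n, w)   # observed minus learned change
        if error > min_error:
            w += step
        elif error < 0:
            w -= step
        else:
            break                # error lies within [0, min_error]
    return w


def train_all(x_n, dx_c, **kwargs):
    """Stack the per-gene weight vectors as columns, so W[j, i] is the weight of j on i."""
    m = len(x_n)
    return np.column_stack([train_gene(x_n, dx_c, i, **kwargs) for i in range(m)])


# Hypothetical three-gene example (illustrative values only).
x_n = [1.0, 2.0, 3.0]
dx_c = [5.0, 0.5, -0.5]

print(train_gene(x_n, dx_c, 0))                         # already exact: no update
print(train_gene(x_n, dx_c, 1, alpha=0.1, max_iter=1))  # one subtractive update
print(train_all(x_n, dx_c).shape)
```

The loop stops either when the error falls inside the tolerance or when the iteration limit is reached, so the returned weights are not guaranteed to reproduce the target change. The thresholding and graph drawing steps of the pseudocode would then be applied to the matrix returned by `train_all`.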
IV. CONCLUSION
Thus, a Neural Network can be applied to gene regulatory network analysis, from classification to assessing network credibility. Here, the input and output layers of the network are, respectively, the expression values of the responsible genes in normal lungs and the changes in those values in Lung Adenocarcinoma. We have used a biologically relevant, modified weight updating formula for the Perceptron learning algorithm of the Neural Network to obtain a more biologically plausible genetic network from the reduced microarray dataset of the disease. Based on the weight matrix, which has a direct meaning in terms of the influence of genes on one another, a directed graph, i.e. the GRN, can be drawn and inferred. These regulating genes are therefore candidates to focus on during drug design. The results can be improved by applying optimization algorithms. Although the results have not been validated in a wet lab, we hope this work will have an impact in Computational Biology and Biomedical Science.
REFERENCES
[1] National Center for Biotechnology Information (NCBI), "Microarrays:
Chipping Away at the Mysteries of Science and Medicine," A Science
primer [online], 2007.
[2] S. Liang, S. Fuhrman, and R. Somogyi, "REVEAL, A general reverse
engineering algorithm for inference of genetic network
architectures," Pacific Symposium on Biocomputing 3, pp. 18-29, 1998.
[3] T. Akutsu, S. Miyano, and S. Kuhara, "Identification of Genetic
Networks from a Small Number of Gene Expression Patterns under the
Boolean Network Model," Pacific Symposium on Biocomputing 4, pp.
17-28, 1999.
[4] D. C. Weaver, C. T. Workman, and G. D. Stormo, "Modeling
Regulatory Networks with Weight Matrices," Pacific Symposium on
Biocomputing 4, pp.123, 1999.
[5] T. Akutsu, S. Miyano, and S. Kuhara, "Algorithms for Inferring
Qualitative Models of Biological Networks," Pacific Symposium on
Biocomputing 5, pp. 293-304, 2000.
[6] K. Murphy and S. Mian, "Modelling Gene Expression Data using
Dynamic Bayesian Networks," Computer Science Division, University
of California, Berkeley 1999.
[7] K. Murphy, "Dynamic Bayesian Networks: Representation, Inference
and Learning," in Computer Science: University of California, pp. 255,
2002.
[8] B. E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F.
D'Alche-Buc, "Gene networks inference using dynamic Bayesian
networks," Bioinformatics, vol. 19, Suppl 2, pp. II138-II148, 2003.
[9] H. Wang, L. Qian and E. Dougherty, “Inference of Gene Regulatory
Network using S-System: A unified Approach”, Proceeding of 2007
IEEE Symposium CIBCB, pp. 82-89, 2007.
[10] T. Nakayama, S. Seno, Y. Takenaka and H. Matsuda, “Inference of
Gene Regulatory Networks using Immune Algorithm”, Journal of
Bioinformatics and Computational Biology, Vol. 9, pp. 75-86, 2011.
[11] P. Du, J. Gong, E.S. Wurtele, and J.A. Dickerson, “Modeling gene
expression networks using fuzzy logic,” IEEE Transactions on Systems,
Man and Cybernetics, vol. 35, pp. 1351-1359. 2005.
[12] J. A. Dickerson, Z. Cox, E. S. Wurtele, and A. W. Fulmer, "Creating
Metabolic and Regulatory Network Models using Fuzzy Cognitive
Maps," North American Fuzzy Information Processing Conference
(NAFIPS), Vol.4, pp. 2171-2176, 2001.
[13] J. Vohradsky, “Neural network model of gene expression”, The FASEB
Journal, Vol. 15, pp.846-854, 2001.
[14] A.M. Ioannis, D. Andrei and T. Dimitris, “Gene regulatory networks
modeling using a dynamic evolutionary hybrid,” BMC Bioinformatics,
Vol. 11:140, pp.1-17, 2010.
[15] E. Keedwell and A. Narayanan, “Discovering Gene Networks with a
Neural-Genetic Hybrid”, IEEE/ACM Transaction in Computational
Biology and Bioinformatics, Vol. 2, No. 3, pp. 231-242, 2005.
[16] K. Kentzoglanakis and M. Poole, “A Swarm Intelligence Framework
for reconstructing Gene Networks: Searching for Biologically Plausible
Architecture”, IEEE/ACM Transaction in Computational Biology and
Bioinformatics, Vol. 9, No. 2, pp. 355-371, 2012.
[17] R. Xu, D. C. Wunsch II and R.L. Frank, “Inference of Genetic
Regulatory Networks with Recurrent Neural Network Models using
particle Swarm Optimization,” IEEE/ACM Transaction in Computational
Biology and Bioinformatics, Vol. 4, No. 4, pp. 681-692, 2007.
[18] P. Rakshit, P. Das, A. Konar, M. Nasipuri and R Janarthan, “A recurrent
Fuzzy Neural Network model of a Gene Regulatory for Knowledge
Extraction Using Invasive Weed and Artificial Bee Colony Optimization
Algorithm”, Conference Proceeding of 1st International Conference on
Recent Advances in Information Technology (RAIT-2012), 2012.
[19] N. Vijesh, S. K. Chakrabarty and J. Sreekumar, “Modelling of gene
regulatory network: A review,” Journal of Biomedical Science and
Engineering, Vol. 6, pp. 223-231, 2013.
[20] S. Mandal, G. Saha and R. K. Pal, “Comparative study on Disease
Classification using Different Soft Computing Techniques,” The SIJ
Transactions on Computer Science Engineering & its Applications
(CSEA), Vol. 2, No. 3, 2014.
[21] S. Mandal and G. Saha, “Rough Set Theory Based Automated Disease
Diagnosis using Lung Adenocarcinoma as Test Case”, The SIJ
Transactions on Computer Science Engineering & its Applications
(CSEA), Vol. 1, No. 3, 2013.
[22] S. Mandal, G. Saha and R.K. Pal, “Reconstruction of Dominant Gene
Regulatory Network from Microarray Data Using Rough Set and
Bayesian Approach,” Journal of Computer Science & Systems Biology,
Volume 6, Issue 5, pp. 262-270, 2013.
... Again, simulated time-series data were used in these studies with reasonable results. A more recent study published in 2015 [26] used a linear classifier with one input "layer" and one output "layer" (which they called a neural network) to infer regulatory relationships (as represented by the weights of the weight matrix in the linear classifier) among genes in lung adenocarcinoma gene expression data. However, they did not evaluate their learned regulatory network, did not use DNA mutation data (as we do in this work), and did not use neural networks. ...
Article
Full-text available
Simple Summary Cancer results from aberrant cellular signaling caused by somatic genomic alterations (SGAs). However, inferring how SGAs cause aberrations in cellular signaling and lead to cancer remains challenging. We designed an interpretable deep learning model to encode the impact of SGAs on cellular signaling systems (represented by hidden nodes in the model) and eventually on tumor gene expression. The transparent deep learning architecture enabled the model to discover drivers affecting common signaling pathways and partially resolve the causal structure of signaling proteins. This is an early attempt to use transparent deep learning model, in contrast to conventional "black box" approach, to learn interpretable insights into cancer cell signaling systems. A better representation of signaling system of a cancer cell sheds light on the disease mechanisms of the cancer and can guide precision medicine. Abstract Cancer is a disease of aberrant cellular signaling resulting from somatic genomic alterations (SGAs). Heterogeneous SGA events in tumors lead to tumor-specific signaling system aberrations. We interpret the cancer signaling system as a causal graphical model, where SGAs affect signaling proteins, propagate their effects through signal transduction, and ultimately change gene expression. To represent such a system, we developed a deep learning model called redundant-input neural network (RINN) with a transparent redundant-input architecture. Our findings demonstrate that by utilizing SGAs as inputs, the RINN can encode their impact on the signaling system and predict gene expression accurately when measured as the area under ROC curves. Moreover, the RINN can discover the shared functional impact (similar embeddings) of SGAs that perturb a common signaling pathway (e.g., PI3K, Nrf2, and TGF). Furthermore, the RINN exhibits the ability to discover known relationships in cellular signaling systems.
... Again, time series simulated data was used in these studies with reasonable results. A more recent study from 2015 [24], used a linear classifier with one input "layer" and one output "layer" (which they called a neural network) to infer regulatory relationships (as represented by the weights of the weight matrix in the linear classifier) between genes in lung adenocarcinoma gene expression data. However, they did not evaluate their learned regulatory network, did not use DNA mutation data (as we do in this work), and did not use neural networks. ...
Preprint
Cancer is a disease of aberrant cellular signaling and tumor-specific aberrations in signaling systems determine the aggressiveness of a cancer and response to therapy. Identifying such abnormal signaling pathways causing a patient’s cancer would enable more patient-specific and effective treatments. We interpret the cellular signaling system as a causal graphical model, where it is known that genomic alterations cause changes in the functions of signaling proteins, and the propagation of signals among proteins eventually leads to changed gene expression. To represent such a system, we developed a deep learning model, referred to as a redundant input neural network (RINN), with a redundant input architecture and an L 1 regularized objective function to find causal relationships between input, latent, and output variables—when it is known a priori that input variables cause output variables. We hypothesize that training RINN on cancer omics data will enable us to map the functional impacts of genomic alterations to latent variables in a deep learning model, allowing us to discover the hierarchical causal relationships between variables perturbed by different genomic alterations. Importantly, the direct connections between all input and all latent variables in RINN make the latent variables partially interpretable, as they can be easily mapped to input space. We show that gene expression can be predicted from genomic alterations with reasonable accuracy when measured as the area under ROC curves (AUROCs). We also show that RINN is able to discover the shared functional impact of genomic alterations that perturb a common cancer signaling pathway, especially relationships in the PI3K, Nrf2, and TGFβ pathways, including some causal relationships. However, despite high regularization, the learned causal relationships were somewhat too dense to be easily and directly interpretable as causal graphs. 
We suggest promising future directions for RINN, including differential regularization, autoencoder pretrained representations, and constrained evolutionary strategies. Author summary A modified deep learning model (RINN with L 1 regularization) can be used to capture cancer signaling pathway relationships within its hidden variables and weights. We found that genomic alterations impacting the same known cancer pathway had interactions with a similar set of RINN latent variables. Having genomic alterations (input variables) directly connected to all latent variables in the RINN model allowed us to label the latent variables with a set of genomic alterations, making the latent variables partially interpretable. With this labeling, we were able to visualize RINNs as causal graphs and capture at least some of the causal relationships in known cancer signaling pathways. However, the graphs learned by RINN were somewhat too dense (despite large amounts of regularization) to compare directly to known cancer signaling pathways. We also found that differential expression can be predicted from genomic alterations by a RINN with reasonably high AUROCs, especially considering the very high dimensionality of the prediction task relative to the number of input variables and instances in the dataset. These are encouraging results for the future of deep learning models trained on cancer genomic data.
... Several methods have been proposed in the literature for the reverse engineering of GRNs like Boolean Networks (Xiao, 2009;Ruz et al., 2015), Probabilistic Boolean Networks (PBN) (Shmulevich et al., 2002), Bayesian Networks Cooper and Herskovits, 1992;Bansal et al., 2015), Dynamic Bayesian Networks (DBN) Murphy, 2002;Shermin and Orgun, 2009), Artificial Neural Network based models (Ressom, 2006;Mandal et al., 2015;Liu et al.,2014) and ...
Article
Full-text available
Genes of an organism play a very crucial role in the working of various cellular activities. Genes and other biological molecules like DNA, RNA do not operate alone but they all are correlated. Their relationships are shown with the help of networks commonly known as Gene Regulatory Networks. Gene Regulatory Networks are complex control networks that show the map of interactions among the genes. They provide very useful contribution to the genomic science and increase the understanding of various biological processes. In this paper, fuzzy logic based method is proposed for the reverse engineering of gene regulatory network from microarray gene expression datasets. Pre-processing steps have been introduced to increase the efficiency of the method. Clustering technique is also employed to divide the problem into sub problems to reduce the computational complexity at some extent. Finally, the proposed method is tested on two different time course gene expression datasets of yeast having GEO accession number GDS37 and GDS3030. The results are validated by using Specificity, Sensitivity and F-score as parameters. Results of the proposed method are further compared with other existing method which was proposed by Al-Shobaili in 2014.
Conference Paper
Current progress in cellular biology and bioinformatics (namely, DNA microarrays) allows researchers to get a distinct picture of the complex biochemical processes that occur within a cell of the human body. This technology has therefore opened a new door to researchers in computer science as well as to biologists. The data generated by these microarray experiments are very high dimensional and noisy in nature. One of the greatest challenges of the post-genomic era is to investigate and infer the regulatory interactions or dependencies between genes, or between genes and proteins, from microarray data. Here, a new methodology has been developed for investigating genetic interactions among genes from temporal gene expression data by combining the features of a Neural Network and the Cuckoo Search optimization technique. This hybrid technique has been applied to the real-world microarray dataset of Lung Adenocarcinoma. The NN-CS algorithm has been used to model the genetic network by searching for the best combination of regulatory genes that most affect a particular gene. The derived results confirmed that the proposed approach is able to infer small-scale genetic networks that best fit the training data. It is believed that the proposed algorithm for finding gene regulatory networks has great potential in medical science.
Article
The number of patients with cancer, heart disease and diabetes is increasing day by day because of excessive alcohol consumption, inhalation of harmful gases, intake of contaminated food, drugs, smoking, etc. Researchers have already provided a range of therapies. Early diagnosis is of considerable significance and depends on physicians' skills, knowledge and experience, yet errors may still occur. The use of various Artificial Intelligence methods for the medical diagnosis of diseases has recently become widespread. These intelligent systems help physicians as diagnosis assistants. Artificial Neural Networks, Rough Sets, Decision Trees and Bayesian Networks are now very popular for this purpose. This paper reviews different soft computing methods for the diagnosis and detection of the acuteness of the above-mentioned disorders. The survey is carried out on three different types of data for different diseases, with cross validation and percentage split used for testing new data sets of each. The results indicate that Rough Set Theory gives the maximum accuracy and coverage area but with the maximum computational time complexity. On the other hand, Neural and Bayesian Networks give quite satisfactory results. Moreover, the obtained results suggest that accuracy depends on the quality of data normalization.
Article
Gene regulatory networks play an important role in the molecular mechanisms underlying biological processes. Modeling these networks is an important challenge to be addressed in the post-genomic era. Several methods have been proposed for estimating gene networks from gene expression data. Computational methods for developing network models and analyzing their functionality have proved to be valuable tools in bioinformatics applications. In this paper, we review the different methods for reconstructing gene regulatory networks.
Article
Biological databases containing genetic information of patients are undergoing tremendous growth beyond our analysing capability. However, such analysis can reveal new findings about the cause and subsequent treatment of a disease. Interactions between genes and the proteins they synthesize shape Genetic Regulatory Networks (GRN). In this context, a model capable of representing small dominant GRNs has been developed, combining characteristics of the Rough Set and Bayesian Network approaches. The investigation has been carried out on the publicly available microarray dataset for Lung Adenocarcinoma, obtained from the National Center for Biotechnology Information (NCBI) website. The analysis revealed that Rough Set Theory (RST) is able to extract, in terms of reducts, the various dominant genes that play an important role in causing the disease, and is also able to provide a unique simplified rule set for building expert systems in medical sciences with high accuracy and coverage factor. The next part of this work is based on the reconstruction of the GRN using a Bayesian network, which is a mathematical tool for modelling conditional independences between stochastic variables such as the expression levels of different genes. The proposed Bayesian approach, which uses scaled mutual information for scoring, is applied to the dataset corresponding to the most dominant genes responsible for Adenocarcinoma to uncover gene/protein interactions and key biological features of the cellular system. Finally, the different interacting regulatory paths between dominating genes, which form the gene signature of a particular disease, are inferred from the probability distribution table and the Bayesian graph. Such reconstructed regulatory networks are attractive for their ability to describe complex stochastic processes such as gene transcription, to support classification of biological sequences, and to provide an intuitive model of causal influence.
This may serve as a signature pattern of the disease Adenocarcinoma, which has been extracted from a huge microarray dataset. Extraction of this signature pattern is very useful for diagnosis of this disease.
Article
The S-system model is one of the nonlinear differential equation models of gene regulatory networks, and it can describe various dynamics of the relationships among genes. If we successfully infer rigorous S-system model parameters that describe a target gene regulatory network, we can simulate gene expression mathematically. However, the problem of finding optimal S-system model parameters is too complex to be solved analytically. Thus, heuristic search methods that offer approximate solutions are needed to reduce the computational time. In previous studies, several heuristic search methods such as Genetic Algorithms (GAs) have been applied to the parameter search of the S-system model. However, they have not achieved sufficient estimation accuracy. One conceivable reason is that they lack mechanisms to escape local optima. We applied an Immune Algorithm (IA) to search for the S-system parameters. IA is also a heuristic search method, inspired by the biological mechanism of acquired immunity. Compared to GA, IA is able to search a large solution space, thereby avoiding local optima, and to maintain multiple candidate solutions. These features work well for searching the S-system model. Indeed, our algorithm showed higher performance than GA in both simulation and real data analyses.
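For readers unfamiliar with the model, the standard textbook form of the S-system is reproduced here as a reference; the symbols are the conventional ones and are not taken from the abstract above. For a network of $N$ genes with expression levels $X_1, \dots, X_N$:

```latex
\frac{dX_i}{dt}
  = \alpha_i \prod_{j=1}^{N} X_j^{\,g_{ij}}
  - \beta_i \prod_{j=1}^{N} X_j^{\,h_{ij}},
  \qquad i = 1, \dots, N
```

Here $\alpha_i, \beta_i \ge 0$ are the rate constants of the production and degradation terms, and $g_{ij}$, $h_{ij}$ are the kinetic orders describing how gene $j$ influences the production and degradation of gene $i$. Each gene therefore carries $2N + 2$ parameters, for $2N(N+1)$ in total, which is why the parameter search grows quickly with network size and is usually handled by heuristic methods.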
Conference Paper
In this paper, a unified approach to infer gene regulatory networks using the S-system model is proposed. In order to discover the structure of large-scale gene regulatory networks, a simplified S-system model is proposed that enables fast parameter estimation to determine the major gene interactions. If a detailed S-system model is desirable for a subset of genes, a two-step method is proposed where the range of the parameters is determined first using genetic programming and recursive least square estimation. Then the exact values of the parameters are calculated using a multi-dimensional optimization algorithm. Both the downhill simplex algorithm and the modified Powell algorithm are tested for multi-dimensional optimization. Simulation results using both synthetic data and real microarray measurements demonstrate the effectiveness of the proposed methods.
Article
Generating inferences from a gene regulatory network is important for understanding fundamental cellular processes involving gene functions and their relations. The availability of time-series gene expression data makes it possible to investigate gene activities across whole genomes. Under this framework, gene interaction is explained through a connection weight matrix. Based on the fact that the measured time points are limited and the assumption that genetic networks are usually sparsely connected, we present an IWO-ABC-based search algorithm to unveil potential genetic network constructions that fit well with the time-series data and to explore possible gene interactions.
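The idea of explaining gene interaction through a connection weight matrix can be sketched as follows. The abstract does not state the exact model, so this sketch assumes a simple discrete-time recurrent form in which the next expression level of gene `i` is a sigmoid of the weighted sum of current levels; `W`, `b`, `step` and `fit_error` are hypothetical names, and the search algorithm itself (IWO-ABC) is not implemented here.

```python
import math


def step(x, W, b):
    """One time step: x_i(t+1) = sigmoid(sum_j W[i][j] * x_j(t) + b[i]).

    W[i][j] > 0 is read as gene j activating gene i, W[i][j] < 0 as
    repression, and W[i][j] == 0 as no interaction (assumed convention).
    """
    n = len(x)
    return [
        1.0 / (1.0 + math.exp(-(sum(W[i][j] * x[j] for j in range(n)) + b[i])))
        for i in range(n)
    ]


def fit_error(W, b, series):
    """Sum of squared one-step prediction errors over a time series.

    `series` is a list of expression vectors, one per measured time point.
    A search algorithm would try to find a sparse W that minimises this.
    """
    error = 0.0
    for current, observed_next in zip(series, series[1:]):
        predicted = step(current, W, b)
        error += sum((p - o) ** 2 for p, o in zip(predicted, observed_next))
    return error


# Toy check with two genes and no interactions (all weights zero).
W0 = [[0.0, 0.0], [0.0, 0.0]]
b0 = [0.0, 0.0]
flat_series = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
baseline = fit_error(W0, b0, flat_series)
```

Under the sparsity assumption mentioned in the abstract, most entries of `W` would be zero, so a candidate network can be read directly from the nonzero weights.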
Article
With the increased availability of DNA microarray time-series data, it is possible to discover dynamic gene regulatory networks (GRNs). The S-system is a promising model for capturing the rich dynamics of GRNs. However, owing to the complexity of the inference problem and the limited amount of available data compared with the number of unknown kinetic parameters, the S-system can only be applied to a very small GRN with few parameters. This significantly limits its applications. A unified approach to infer GRNs using the S-system model is proposed. In order to discover the structure of large-scale GRNs, a simplified S-system model is proposed that enables fast parameter estimation to determine the major gene interactions. If a detailed S-system model is desirable for a subset of genes, a two-step method is proposed where the range of the parameters is determined first using genetic programming and recursive least square estimation. Then the mean values of the parameters are estimated using a multi-dimensional optimisation algorithm. Both the downhill simplex algorithm and the modified Powell algorithm are tested for multi-dimensional optimisation. A 50-dimensional synthetic model with 51 parameters for each gene is tested for the applicability of the simplified S-system model. In addition, real measurement data pertaining to yeast protein synthesis are used to demonstrate the effectiveness of the proposed two-step method in identifying the detailed interactions among genes in small GRNs.