
Neural Network Based Gene Regulatory Network Reconstruction

February 2015

DOI:10.1109/C3IT.2015.7060112

Conference: Third International Conference on Computer, Communication, Control and Information Technology (C3IT), 2015
At: AOT, Hooghly

Authors:

Sudip Mandal

Jalpaiguri Government Engineering College

Goutam Saha

Bidhan Chandra Krishi Viswavidyalaya

Raj Kumar Pal

Punjab Agricultural University


Neural Network Based Gene Regulatory Network Reconstruction

Sudip Mandal1, Goutam Saha2 and Rajat Kumar Pal3

1ECE Department, GIMT, Krishna Nagar, India, email: sudip.mandal007@gmail.com

2IT Department, NEHU, Shillong, India, email: dr_goutamsaha@yahoo.com

3CSE Department, University of Calcutta, Kolkata, India, email: pal.rajatk@gmail.com

Abstract: Gene Regulatory Networks (GRNs) are used to model regulation in living organisms. Inferring a genetic network from high-throughput experimental biological data (such as microarray data) is a challenging task. In this paper, an Artificial Neural Network, an effective soft computing tool for learning and modeling the dynamics or dependencies between genes, is used to reconstruct a small-scale GRN from the reduced microarray dataset of Lung Adenocarcinoma. The significance of the regulation exerted by one gene on the other genes of the system is expressed by a weight matrix, which is computed using a Perceptron-based, biologically significant weight updating method that minimizes the error during learning. Based on the values of the elements of the filtered weight matrix, a directed weighted graph can be drawn that denotes the gene regulatory network.

Keywords: Gene Regulatory Network; Microarray data; Neural Network.

I. INTRODUCTION

Cellular biology is a growing area of research. Nowadays, a large amount of biological data produced by advanced experimental technologies such as microarrays is accessible from online databases such as the National Center for Biotechnology Information (NCBI) [1]. Efficient computational techniques are required to analyze these data and recover informative knowledge from them. These methods can help us understand the interactions that take place in organisms at the genomic level through a mathematical model. Such gene-to-gene interactions are represented by a special network known as a Gene Regulatory Network (GRN). It is a graphical representation in which the nodes are genes or proteins and the edges connecting them show the regulatory relationships between them. It is constructed by observing the behavior of genes and their impact on other genes under a particular experimental condition. This behavior is analyzed by measuring gene expression values with the help of microarray technology. Depending on the expression levels of genes and their interactions, the status of a human organ can be either normal or cancerous.

When Transcription Factors (specialized proteins) bind to the promoter region of DNA, they can modify the rate of protein synthesis, which can result in sudden changes in gene expression values. If the rate of protein synthesis decreases, this is called inhibition, down-regulation or negative regulation; if it increases, it is called activation, up-regulation or positive regulation. If gene g1 positively regulates gene g2, the expression level of g2 increases because of g1; under negative regulation, g1 decreases the expression level of g2. A GRN is very helpful for analyzing the effect of drugs on genes and for finding interactions between genes and the genetic pathways of disease development. It also helps in studying the dynamics of a specific gene under particular diseased or experimental circumstances, and in studying diseases caused by genes (in this case Lung Adenocarcinoma).

Many types of models have been proposed for reconstructing gene regulatory networks in biological systems, including Boolean networks [2-3], linear weighting networks [4], differential equations [5], Bayesian Networks [6-8], S-systems [9-10], Fuzzy Sets [11-12], Artificial and Recurrent Neural Networks [13] and Evolutionary Methods [14-18]. Boolean networks examine binary state transition matrices to search for patterns in gene expression. Every part of the network is either on or off depending on whether a signal exceeds a predetermined threshold level. Generalized Logical Networks permit the variables of Boolean networks to have more than two values and use generalized Boolean functions to define their relationships. Probabilistic Boolean Networks merge several promising Boolean functions together, so that each one makes a contribution to the prediction of a target gene. A related probabilistic model, on the other hand, randomly chooses one of these promising predictors. A linear weighting network has the benefit of simplicity, since it uses a simple weight matrix to additively combine the contributions of different regulatory elements. Bayesian networks model probabilistic transitions between network states, assuming that there are no cycles in the network. However, cycles or loops are the most important mechanism for ensuring stability. Dynamic Bayesian Networks incorporate features of Hidden Markov Models to include feedback. When modeling GRNs with the S-system method, the expression rates are defined by the difference of two products of power-law functions, where the first denotes the activation term and the second the degradation term of a gene product. Fuzzy models describe the interactions between genes in gene regulatory pathways using fuzzy weights and clustering. In neural network based models, genes are represented by the nodes of the input and output layers of the network, and the weight matrix between the nodes represents the regulatory relationships between genes. Moreover, different evolutionary hybrid methods such as Recurrent Neural Networks, Genetic Algorithms, Particle Swarm Optimization, Ant Colony Optimization and Bee Colony Optimization [19, 20] have also been proposed to infer gene regulatory networks. Despite all the advantages of GRNs, identifying a GRN remains a complicated and fascinating task for researchers.

In this paper, a Neural Network method is applied to reconstruct a genetic network. The model is used to calculate the change of expression rates, which depends on the weighted sum of the expression levels of multiple regulatory genes that are available beforehand for this experiment. A more biologically significant, improved weight updating formula based on the Perceptron learning rule is used for modeling the Gene Regulatory Network, in order to find a more biologically plausible regulatory network. It is assumed that the change in the expression value of each gene depends on the expression values of the previous stage and also on their changes. The corresponding weight matrix denotes the level of regulation between genes. Based on this weight matrix, a directed weighted graph can be drawn that denotes the gene regulatory network. The proposed model is discussed in the next section. The experimental GRN and the results for the microarray dataset of Lung Adenocarcinoma are shown in Section III. The conclusion is given in Section IV, followed by the references.

II. NEURAL NETWORK MODEL FOR GRN

A Neural Network (NN) is a very effective soft computing tool for learning patterns from raw input data, working on a principle similar to that of the neurons in the human nervous system. This model is biologically plausible and noise-resistant. It is continuous in time and uses a transfer function to transform the inputs into a shape similar to that observed in natural processes. In addition, its nonlinear characteristics provide information about the principles of control, as well as about the natural interactions of the elements of the modeled system.

Let us assume that $G_1, G_2, \ldots, G_{(m-1)}$ and $G_m$ are the m specific genes responsible for the cancer, and that $x_{1N}, x_{2N}, \ldots, x_{(m-1)N}, x_{mN}$ are the corresponding gene expression values of those genes in a normal cell (here we consider Lung Adenocarcinoma as the test case). When the lungs are affected by cancer (Lung Adenocarcinoma), the gene expression levels of those genes change to $x_{1C}, x_{2C}, \ldots, x_{(m-1)C}, x_{mC}$. We define two 1×m matrices holding the expression values of the genes in the normal and the cancerous state, which can be obtained from the microarray dataset of Lung Adenocarcinoma.

$$[X_N]_{1 \times m} = [\, x_{1N},\ x_{2N},\ \ldots,\ x_{(m-1)N},\ x_{mN} \,] \tag{1}$$

$$[X_C]_{1 \times m} = [\, x_{1C},\ x_{2C},\ \ldots,\ x_{(m-1)C},\ x_{mC} \,] \tag{2}$$

Let $[dX_C]_{1 \times m}$ be a matrix that denotes the relative change in the expression values due to cancer, i.e. the difference from the normal values.

$$[dX_C]_{1 \times m} = [\, (x_{1C} - x_{1N}),\ (x_{2C} - x_{2N}),\ \ldots,\ (x_{(m-1)C} - x_{(m-1)N}),\ (x_{mC} - x_{mN}) \,] \tag{3}$$

To reconstruct the Gene Regulatory Network, we have implemented a Neural Network model based on the Perceptron learning algorithm, which has one input layer and one output layer and no hidden layer. In this context, we assume that the relative change in the expression value of each gene due to cancer is a function of the expression values of all genes, except its own, at the previous normal stage. The overall problem of finding the network is thus subdivided into smaller ones: instead of finding the overall connectivity at once, we discover the interactions of each individual gene separately. After that, the actual structure of the network is formed by combining these results. So we can write the change of the i-th gene in the following way

$$dx_{iC} = f(x_{1N},\ x_{2N},\ \ldots,\ x_{(m-1)N},\ x_{mN}), \quad i = 1, \ldots, m \tag{4}$$

where the gene's own value $x_{iN}$ is excluded from the arguments of $f$.

In the GRN problem, we have to infer the optimum level of regulation between genes in terms of the edge connectivity of the nodes (genes) in the network. The strength of regulation between genes can be measured by a constant value called a 'weight'. In our problem of genetic network reconstruction, we have to find the optimum values of the weights between each pair of genes such that maximum classification accuracy is achieved; these values form the optimum weight matrix. For m genes, the weight matrix $[W]_{m \times m}$ can be defined in the following way

$$[W]_{m \times m} = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,m} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,m} \\ \vdots & \vdots & & \vdots \\ w_{(m-1),1} & w_{(m-1),2} & \cdots & w_{(m-1),m} \\ w_{m,1} & w_{m,2} & \cdots & w_{m,m} \end{bmatrix} \tag{5}$$

where $w_{i,j}$ denotes the weight corresponding to the regulation exerted by the i-th gene on the j-th gene.

To obtain this optimum weight matrix, we use the Perceptron model of a Neural Network and calculate the weights of the edges between the input and output layers. Here we consider $[X_N]_{1 \times m}$ as the input layer, $[dX_C]_{1 \times m}$ as the output layer and $[W]_{m \times m}$ as the weight matrix between them. According to the fundamentals of Artificial Neural Networks (taking the bias input as zero and using no activation function at the output), we can express the relationship between the input and output layers by the following equation

$$[dX_C] = [X_N] \cdot [W] \tag{6}$$

Now, $w_{j,i}$ is the element of the weight matrix that is a function of the expression level of the j-th gene at the normal stage and of the change of the j-th gene due to disease, with some initial constant K.

$$w_{j,i} = f(x_{jN},\ dx_{jC}) + K \tag{7}$$

During the learning process, i.e. the determination of the optimal weight matrix of the NN, the network is initialized with a weight matrix whose elements are all 1. The learned value of the change in expression of the i-th gene under cancer is then calculated using the following formula

$$dx_{iC}^{\text{learned}} = \sum_{j} w_{j,i}\, x_{jN}, \quad \text{where } j \neq i \tag{8}$$

Next, the error in the learning process for the i-th gene is calculated with the help of

$$e_i = dx_{iC} - dx_{iC}^{\text{learned}} \tag{9}$$

Based on the Perceptron learning rule, the weight between the i-th and j-th genes is updated according to the following formulas

$$\text{if } e_i > 0: \quad w_{j,i} = w_{j,i} + \alpha\, x_{jN}\, dx_{jC} \tag{10}$$

$$\text{if } e_i < 0: \quad w_{j,i} = w_{j,i} - \alpha\, x_{jN}\, dx_{jC} \tag{11}$$

where α is the learning rate of the neural network. This is an iterative process that continues until the errors for all genes are minimized (zero or a very small value) or a stopping criterion is reached (a maximum number of iterations).
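One learning step of the procedure described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code. It assumes that the learned output for gene i is the weighted sum of the other genes' normal values, that the error is the actual change minus the learned change, and that each weight moves by α times the product of the regulator's normal value and its change, with the direction given by the sign of the error. The function names and the 3-gene toy values are hypothetical.

```python
def learned_change(weights, x_normal, i):
    """Weighted sum of the other genes' normal expression values, skipping gene i."""
    return sum(weights[j][i] * x_normal[j]
               for j in range(len(x_normal)) if j != i)


def update_step(weights, x_normal, dx_cancer, i, alpha):
    """Apply one sign-based update to every weight w[j][i] feeding gene i.

    Returns the error measured before the update.
    """
    error = dx_cancer[i] - learned_change(weights, x_normal, i)
    if error == 0:
        return error
    sign = 1.0 if error > 0 else -1.0
    for j in range(len(x_normal)):
        if j != i:
            # Assumed step size: alpha * (normal value of gene j) * (change of gene j).
            weights[j][i] += sign * alpha * x_normal[j] * dx_cancer[j]
    return error


# Hypothetical 3-gene toy data (not taken from the paper).
x_normal = [2.0, 4.0, 6.0]
dx_cancer = [1.0, -1.0, 0.5]
weights = [[1.0] * 3 for _ in range(3)]  # all-ones initialization

first_error = update_step(weights, x_normal, dx_cancer, i=0, alpha=0.01)
print(first_error, weights[1][0], weights[2][0])
```

With all-ones weights the learned output starts far above the true change, so the first error is negative and the step is taken in the subtracting direction.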

In the above model, we include $x_{jN}$ and $dx_{jC}$ in the Perceptron-based weight update formula because the regulation of each gene depends on the previous normal-stage expression values of the other genes as well as on their changes. These assumptions have particular biological significance, because a gene with a higher expression value in the normal stage can be considered more active than the others. Therefore, the strength of regulation, or weight, should depend proportionally on $x_{jN}$. Moreover, a gene whose expression level does not change significantly during cancer can be treated as a less active gene, which has less importance in changing the expression values of other genes even though it may have a high expression value in the initial normal stage. Therefore, the weight should be updated proportionally to the change $dx_{jC}$ as well as to the initial value $x_{jN}$. The proposed model is thus more biologically plausible: the change in the expression value of a gene depends on the changes as well as on the initial normal values of the other genes.

III. EXPERIMENTAL RESULT

In this research work, we have applied the proposed method to the microarray dataset of Lung Adenocarcinoma (GEO Accession No. GSE10072, obtained from the NCBI website), which has 22284 genes. Finding the regulatory network among such a large number of genes is almost impossible, and the regulations are hard to interpret. Therefore, instead of developing a huge, complex network, we try to reconstruct the most biologically significant network for a small number of the genes most responsible for Adenocarcinoma, which will denote the major regulations during cancer. Dimensionality reduction of the dataset without losing its important information can be done using Rough Set Theory. In earlier studies [21, 22], a rule reduction process using Rough Sets was successfully applied to identify only 15 responsible genes from the huge database. So, we have prior knowledge of the expression values of the responsible genes in the dataset. We take the average expression value of each gene for each individual stage. We implemented the proposed model in MATLAB 7.6 and applied it to the following reduced dataset. It is interesting to find that the expression values of most of the genes decrease due to cancer.

TABLE I. DATASET USED FOR LEARNING OF NEURAL NETWORK

Sl. No.   Gene ID        Average Value of Normal Lungs   Average Value of Cancerous Lungs
Gene1     201591_s_at    10.00                           9.59
Gene2     201772_at      7.18                            8.59
Gene3     201938_at      10.00                           10.3
Gene4     202295_s_a     11.45                           10.43
Gene5     203065_s_at    11.40                           8.98
Gene6     203091_at      7.98                            8.59
Gene7     203249_at      8.93                            8.26
Gene8     205261_at      10.60                           8.30
Gene9     206068_s_at    7.20                            5.38
Gene10    208056_s_at    6.88                            6.00
Gene11    209072_a       6.45                            6.24
Gene12    209613_s_at    10.00                           6.12
Gene13    218918_at      8.08                            7.38
Gene14    222313_at      6.05                            6.24
Gene15    49452_a        7.10                            6.07

So, here $[X_N]_{1 \times 15}$ = [10, 7.18, 10, 11.45, 11.40, 7.98, 8.93, 10.6, 7.2, 6.88, 6.45, 10, 8.08, 6.05, 7.10]

$[X_C]_{1 \times 15}$ = [9.59, 8.59, 10.3, 10.43, 8.98, 8.59, 8.26, 8.30, 5.38, 6, 6.24, 6.12, 7.38, 6.24, 6.07]
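As a quick check of these inputs, the change vector of equation (3) can be computed directly from the two rows above. The short Python sketch below is only an illustration (the original work used MATLAB), and the variable names are hypothetical.

```python
# Average expression values copied from Table I (normal and cancerous lungs).
x_normal = [10.00, 7.18, 10.00, 11.45, 11.40, 7.98, 8.93, 10.60,
            7.20, 6.88, 6.45, 10.00, 8.08, 6.05, 7.10]
x_cancer = [9.59, 8.59, 10.30, 10.43, 8.98, 8.59, 8.26, 8.30,
            5.38, 6.00, 6.24, 6.12, 7.38, 6.24, 6.07]

# Element-wise change due to cancer: cancerous value minus normal value.
dx_cancer = [c - n for n, c in zip(x_normal, x_cancer)]

# 1-based gene numbers grouped by the direction of change.
decreasing = [g for g, d in enumerate(dx_cancer, start=1) if d < 0]
increasing = [g for g, d in enumerate(dx_cancer, start=1) if d > 0]

print([round(d, 2) for d in dx_cancer])
print(len(decreasing), "genes decrease;", len(increasing), "increase:", increasing)
```

Eleven of the fifteen genes show a lower average expression in cancerous lungs, and only Gene2, Gene3, Gene6 and Gene14 increase, which agrees with the observation that most expression values decrease.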

We consider the following parameters for the NN: α = 0.0001; maximum number of iterations for weight update = 1000; minimum error = 0.001. The program is executed for each of the 15 genes separately until a stopping criterion (minimum error or maximum iterations) is reached. Each run for a given gene yields a weight vector that denotes the amount of regulation between that gene and each of the others. Together these give the following weight matrix, each element of which denotes the weight between the corresponding two genes.

TABLE II. WEIGHT MATRIX (ROUND OFF VALUE) FOR GENES (1-8)

Gene     1    2    3    4    5    6    7    8
1        0    2    1    0   -2    1    0   -1
2        3    0    0    7   15   -1    3   13
3        3   -4    0    7   15   -1    4   13
4        1    2    1    0   -2    1    0   -2
5        0    2    1    0    0    2    0   -2
6        3   -4    0    7   15    0    3   13
7        1    2    1    0   -2    1    0   -1
8        1    2    1    0   -2    2    0    0
9        1    2    1    0   -2    1    0   -2
10       1    2    1    0   -2    1    0   -1
11       1    2    1    0   -2    1    0   -1
12       0    2    1    1   -2    2    0   -2
13       1    2    1    0   -2    1    0   -1
14       3   -4    0    7   15   -1    3   13
15       1    2    1    0   -2    1    0   -1

TABLE III. WEIGHT MATRIX (ROUND OFF VALUE) FOR GENES (9-15)

Gene     9   10   11   12   13   14   15
1        0    0    1   -2    0    1    0
2        8    4    2   20    3    0    5
3        8    4    2   20    3    0    5
4        0    0    1   -3    0    1    0
5        0    0    1   -3    0    1    0
6        8    4    2   20    3    0    5
7        0    0    1   -2    0    1    0
8        0    0    1   -3    0    1    0
9        0    0    1   -3    0    1    0
10       0    0    1   -3    0    1    0
11       0    0    0   -3    0    1    0
12       0    0    1    0    0    1    0
13       0    0    1   -3    0    1    0
14       8    4    2   20    3    0    5
15       0    0    1   -2    0    1    0

A positive value denotes activation, a negative value denotes inhibition and a zero value denotes no regulation between genes. From the above tables it can be observed that a large number of regulations exist between the different genes, so the network would be very complex in nature. Moreover, we are interested in finding the genes with a large regulatory effect. Therefore, the weight matrix is filtered with a certain threshold value of weight (here the threshold is 4), i.e. if the absolute value of any element of the weight matrix is greater than 4 then it remains in the matrix; otherwise the value is replaced with zero.
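The filtering rule can be sketched in a few lines of Python. The function name and the sample values are hypothetical, and the strict comparison follows the wording "greater than 4" used above.

```python
def filter_weights(weights, threshold=4.0):
    """Keep entries whose absolute value exceeds the threshold; zero the rest."""
    return [[w if abs(w) > threshold else 0.0 for w in row] for row in weights]


# Hypothetical unrounded weights for three genes (not taken from the paper).
sample = [[0.0, 7.2, -1.3],
          [-4.6, 0.0, 4.0],
          [2.0, 15.1, 0.0]]

filtered = filter_weights(sample)
for row in filtered:
    print(row)
```

Filtering should be applied to the unrounded weights, since an entry that rounds to 4 may lie on either side of the threshold.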

Based on this filtered weight matrix, a directed weighted graph can easily be drawn in which genes are denoted by nodes and regulations are denoted by edges. If there is a nonzero weight between two genes in the matrix, there is an edge between them; otherwise there is no edge or regulation. Note that if the threshold level is decreased, the network becomes more complex and harder to interpret.
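Turning a filtered weight matrix into the edges of a directed graph can be sketched as below. The function name and the sample matrix are hypothetical. The sketch assumes the convention of equation (5), where the entry in row i and column j describes the regulation exerted by gene i on gene j.

```python
def weight_matrix_to_edges(filtered):
    """List (regulator, target, weight, kind) for every nonzero off-diagonal entry."""
    edges = []
    for i, row in enumerate(filtered, start=1):
        for j, w in enumerate(row, start=1):
            if w != 0 and i != j:
                kind = "activation" if w > 0 else "inhibition"
                edges.append((f"Gene{i}", f"Gene{j}", w, kind))
    return edges


# Hypothetical filtered matrix for three genes (not taken from the paper).
filtered = [[0.0, 7.2, 0.0],
            [-4.6, 0.0, 0.0],
            [0.0, 15.1, 0.0]]

for regulator, target, weight, kind in weight_matrix_to_edges(filtered):
    print(f"{regulator} -> {target}: {weight} ({kind})")
```

A gene that appears neither as a regulator nor as a target in the resulting list is an isolated node of the graph.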

Fig. 1. GRN using proposed Neural Network

The above figure shows the desired small-scale gene regulatory network for Lung Adenocarcinoma. The directed edges indicate the regulations among the genes during cancer. It can be clearly observed that Gene1, Gene7, Gene11 and Gene13 do not take part in any kind of regulation. The other genes regulate (activate or inhibit) each other based on the value of the weight between them.

The pseudocode of the proposed algorithm can be given as

1. Input $[X_N]_{1 \times 15}$ and $[X_C]_{1 \times 15}$
2. Calculate $[dX_C]_{1 \times 15}$
3. Initialize $[W]_{15 \times 15}$ with all ones
4. Set α = 0.0001
5. Loop i = 1 to 15 do
6.   Loop j = 1 to 15 do
7.     Calculate initial learned output $dx_{iC}^{\text{learned}} = \sum_{j} w_{j,i}\, x_{jN}$, where $j \neq i$
8.     Calculate error $e_i = dx_{iC} - dx_{iC}^{\text{learned}}$
9.     Loop k = 1 ... 1000 do
10.      if $e_i > 0.001$
11.        $w_{j,i} = w_{j,i} + \alpha\, x_{jN}\, dx_{jC}$
12.      else if $e_i < 0$
13.        $w_{j,i} = w_{j,i} - \alpha\, x_{jN}\, dx_{jC}$
14.      else
15.        break
16.      end if
17.      Calculate learned output $dx_{iC}^{\text{learned}} = \sum_{j} w_{j,i}\, x_{jN}$, where $j \neq i$
18.      Calculate error using $e_i = dx_{iC} - dx_{iC}^{\text{learned}}$
19.    end loop
20.    if absolute value of each element ≥ 4 then
21.      $w_{i,j} = w_{i,j}$
22.    else
23.      $w_{i,j} = 0$
24.    end if
25.  end loop
26. end loop
27. Show the filtered weight matrix
28. Draw the directed graph
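The steps above can be rendered as a short runnable Python sketch. It is one interpretation of the pseudocode, not the authors' MATLAB implementation. All weights feeding a target gene are updated together in each iteration, the diagonal is set to zero on the assumption that a gene does not regulate itself, and the update term α·x·dx is the assumed form of the weight update. The function name is hypothetical, and the output is not claimed to reproduce Tables II and III.

```python
def train_grn_weights(x_normal, x_cancer, alpha=0.0001, max_iter=1000,
                      min_error=0.001, threshold=4.0):
    """Return (raw, filtered) weight matrices; w[j][i] is read as gene j acting on gene i."""
    m = len(x_normal)
    dx = [c - n for n, c in zip(x_normal, x_cancer)]    # step 2: change vector
    w = [[1.0] * m for _ in range(m)]                   # step 3: all-ones start
    for i in range(m):                                  # one target gene at a time
        w[i][i] = 0.0                                   # assumption: no self-regulation
        for _ in range(max_iter):                       # step 9: at most max_iter updates
            learned = sum(w[j][i] * x_normal[j] for j in range(m) if j != i)
            error = dx[i] - learned
            if error > min_error:
                sign = 1.0
            elif error < 0:
                sign = -1.0
            else:
                break                                   # error lies in [0, min_error]
            for j in range(m):
                if j != i:
                    w[j][i] += sign * alpha * x_normal[j] * dx[j]
    # Step 20 of the pseudocode keeps entries whose absolute value is >= threshold.
    filtered = [[v if abs(v) >= threshold else 0.0 for v in row] for row in w]
    return w, filtered


# Average expression values from Table I.
x_normal = [10.00, 7.18, 10.00, 11.45, 11.40, 7.98, 8.93, 10.60,
            7.20, 6.88, 6.45, 10.00, 8.08, 6.05, 7.10]
x_cancer = [9.59, 8.59, 10.30, 10.43, 8.98, 8.59, 8.26, 8.30,
            5.38, 6.00, 6.24, 6.12, 7.38, 6.24, 6.07]

raw, filtered = train_grn_weights(x_normal, x_cancer)
print(len(raw), "x", len(raw[0]), "weight matrix computed")
```

Because every weight starts at 1 and each step is scaled by α = 0.0001, the result depends strongly on the stopping criteria, so the parameters are exposed as arguments for experimentation.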

IV. CONCLUSION

Thus, a Neural Network can be successfully applied to all aspects of gene regulatory network analysis, from classification to assessing network credibility. Here, the input and output layers of the network are, respectively, the expression values of normal lungs and the changes in expression of the genes responsible for Lung Adenocarcinoma. We have used a biologically relevant, modified weight updating formula for the Perceptron learning algorithm to obtain a more biologically plausible genetic network from the reduced microarray dataset of the disease. Based on the weight matrix, which has a direct meaning in terms of the influence of genes on one another, a directed graph, i.e. the GRN, can be drawn and inferred. So, during drug design we have to focus on these regulating genes. The results can be improved by applying several optimization algorithms. Though the result has not been validated in a wet lab, we hope it will have an impact on Computational Biology and Biomedical Science.

REFERENCES

[1] National Center for Biotechnology Information (NCBI), "Microarrays: Chipping Away at the Mysteries of Science and Medicine," A Science Primer [online], 2007.

[2] S. Liang, S. Fuhrman, and R. Somogyi, "REVEAL, A general reverse engineering algorithm for inference of genetic network architectures," Pacific Symposium on Biocomputing 3, pp. 18-29, 1998.

[3] T. Akutsu, S. Miyano, and S. Kuhara, "Identification of Genetic Networks from a Small Number of Gene Expression Patterns under the Boolean Network Model," Pacific Symposium on Biocomputing 4, pp. 17-28, 1999.

[4] D. C. Weaver, C. T. Workman, and G. D. Stormo, "Modeling Regulatory Networks with Weight Matrices," Pacific Symposium on Biocomputing 4, pp. 123, 1999.

[5] T. Akutsu, S. Miyano, and S. Kuhara, "Algorithms for Inferring Qualitative Models of Biological Networks," Pacific Symposium on Biocomputing 5, pp. 293-304, 2000.

[6] K. Murphy and S. Mian, "Modelling Gene Expression Data using Dynamic Bayesian Networks," Computer Science Division, University of California, Berkeley, 1999.

[7] K. Murphy, "Dynamic Bayesian Networks: Representation, Inference and Learning," in Computer Science: University of California, pp. 255, 2002.

[8] B. E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F. D'Alche-Buc, "Gene networks inference using dynamic Bayesian networks," Bioinformatics, vol. 19, Suppl 2, pp. II138-II148, 2003.

[9] H. Wang, L. Quin and E. Dougherty, "Inference of Gene Regulatory Network using S-System: A Unified Approach," Proceedings of 2007 IEEE Symposium CIBCB, pp. 82-89, 2007.

[10] T. Nakayama, S. Seno, Y. Takenaka and H. Matsuda, "Inference of Gene Regulatory Networks using Immune Algorithm," Journal of Bioinformatics and Computational Biology, Vol. 9, pp. 75-86, 2011.

[11] P. Du, J. Gong, E. S. Wurtele, and J. A. Dickerson, "Modeling gene expression networks using fuzzy logic," IEEE Transactions on Systems, Man and Cybernetics, vol. 35, pp. 1351-1359, 2005.

[12] J. A. Dickerson, Z. Cox, E. S. Wurtele, and A. W. Fulmer, "Creating Metabolic and Regulatory Network Models using Fuzzy Cognitive Maps," North American Fuzzy Information Processing Conference (NAFIPS), Vol. 4, pp. 2171-2176, 2001.

[13] J. Vohradsky, "Neural network model of gene expression," The FASEB Journal, Vol. 15, pp. 846-854, 2001.

[14] A. M. Ioannis, D. Andrei and T. Dimitris, "Gene regulatory networks modeling using a dynamic evolutionary hybrid," BMC Bioinformatics, Vol. 11:140, pp. 1-17, 2010.

[15] E. Keedwell and A. Narayanan, "Discovering Gene Networks with a Neural-Genetic Hybrid," IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 2, No. 3, pp. 231-242, 2005.

[16] K. Kentzoglanankis and M. Poole, "A Swarm Intelligence Framework for Reconstructing Gene Networks: Searching for Biologically Plausible Architecture," IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 9, No. 2, pp. 355-371, 2012.

[17] R. Xu, D. C. Wunsch II and R. L. Frank, "Inference of Genetic Regulatory Networks with Recurrent Neural Network Models using Particle Swarm Optimization," IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 4, No. 4, pp. 681-692, 2007.

[18] P. Rakshit, P. Das, A. Konar, M. Nasipuri and R. Janarthan, "A recurrent Fuzzy Neural Network model of a Gene Regulatory for Knowledge Extraction Using Invasive Weed and Artificial Bee Colony Optimization Algorithm," Conference Proceedings of 1st International Conference on Recent Advances in Information Technology (RAIT-2012), 2012.

[19] N. Vijesh, S. K. Chakrabarty and J. Sreekumar, "Modelling of gene regulatory network: A review," Journal of Biomedical Science and Engineering, Vol. 6, pp. 223-231, 2013.

[20] S. Mandal, G. Saha and R. K. Pal, "Comparative study on Disease Classification using Different Soft Computing Techniques," The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 2, No. 3, 2014.

[21] S. Mandal and G. Saha, "Rough Set Theory Based Automated Disease Diagnosis using Lung Adenocarcinoma as Test Case," The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 1, No. 3, 2013.

[22] S. Mandal, G. Saha and R. K. Pal, "Reconstruction of Dominant Gene Regulatory Network from Microarray Data Using Rough Set and Bayesian Approach," Journal of Computer Science & Systems Biology, Volume 6, Issue 5, pp. 262-270, 2013.

Revealing the Impact of Genomic Alterations on Cancer Cell Signaling with an Interpretable Deep Learning Model

Article

Full-text available

Jul 2023

Simple Summary Cancer results from aberrant cellular signaling caused by somatic genomic alterations (SGAs). However, inferring how SGAs cause aberrations in cellular signaling and lead to cancer remains challenging. We designed an interpretable deep learning model to encode the impact of SGAs on cellular signaling systems (represented by hidden nodes in the model) and eventually on tumor gene expression. The transparent deep learning architecture enabled the model to discover drivers affecting common signaling pathways and partially resolve the causal structure of signaling proteins. This is an early attempt to use transparent deep learning model, in contrast to conventional "black box" approach, to learn interpretable insights into cancer cell signaling systems. A better representation of signaling system of a cancer cell sheds light on the disease mechanisms of the cancer and can guide precision medicine. Abstract Cancer is a disease of aberrant cellular signaling resulting from somatic genomic alterations (SGAs). Heterogeneous SGA events in tumors lead to tumor-specific signaling system aberrations. We interpret the cancer signaling system as a causal graphical model, where SGAs affect signaling proteins, propagate their effects through signal transduction, and ultimately change gene expression. To represent such a system, we developed a deep learning model called redundant-input neural network (RINN) with a transparent redundant-input architecture. Our findings demonstrate that by utilizing SGAs as inputs, the RINN can encode their impact on the signaling system and predict gene expression accurately when measured as the area under ROC curves. Moreover, the RINN can discover the shared functional impact (similar embeddings) of SGAs that perturb a common signaling pathway (e.g., PI3K, Nrf2, and TGF). Furthermore, the RINN exhibits the ability to discover known relationships in cellular signaling systems.

Revealing the impact of genomic alterations on cancer cell signaling with a partially transparent deep learning model

Preprint

May 2020

Cancer is a disease of aberrant cellular signaling and tumor-specific aberrations in signaling systems determine the aggressiveness of a cancer and response to therapy. Identifying such abnormal signaling pathways causing a patient’s cancer would enable more patient-specific and effective treatments. We interpret the cellular signaling system as a causal graphical model, where it is known that genomic alterations cause changes in the functions of signaling proteins, and the propagation of signals among proteins eventually leads to changed gene expression. To represent such a system, we developed a deep learning model, referred to as a redundant input neural network (RINN), with a redundant input architecture and an L 1 regularized objective function to find causal relationships between input, latent, and output variables—when it is known a priori that input variables cause output variables. We hypothesize that training RINN on cancer omics data will enable us to map the functional impacts of genomic alterations to latent variables in a deep learning model, allowing us to discover the hierarchical causal relationships between variables perturbed by different genomic alterations. Importantly, the direct connections between all input and all latent variables in RINN make the latent variables partially interpretable, as they can be easily mapped to input space. We show that gene expression can be predicted from genomic alterations with reasonable accuracy when measured as the area under ROC curves (AUROCs). We also show that RINN is able to discover the shared functional impact of genomic alterations that perturb a common cancer signaling pathway, especially relationships in the PI3K, Nrf2, and TGFβ pathways, including some causal relationships. However, despite high regularization, the learned causal relationships were somewhat too dense to be easily and directly interpretable as causal graphs. 
We suggest promising future directions for RINN, including differential regularization, autoencoder pretrained representations, and constrained evolutionary strategies. Author summary A modified deep learning model (RINN with L 1 regularization) can be used to capture cancer signaling pathway relationships within its hidden variables and weights. We found that genomic alterations impacting the same known cancer pathway had interactions with a similar set of RINN latent variables. Having genomic alterations (input variables) directly connected to all latent variables in the RINN model allowed us to label the latent variables with a set of genomic alterations, making the latent variables partially interpretable. With this labeling, we were able to visualize RINNs as causal graphs and capture at least some of the causal relationships in known cancer signaling pathways. However, the graphs learned by RINN were somewhat too dense (despite large amounts of regularization) to compare directly to known cancer signaling pathways. We also found that differential expression can be predicted from genomic alterations by a RINN with reasonably high AUROCs, especially considering the very high dimensionality of the prediction task relative to the number of input variables and instances in the dataset. These are encouraging results for the future of deep learning models trained on cancer genomic data.

A Novel Fuzzy Logic Based Reverse Engineering of Gene Regulatory Network

Article

Full-text available

Jul 2017

Genes of an organism play a very crucial role in the working of various cellular activities. Genes and other biological molecules like DNA, RNA do not operate alone but they all are correlated. Their relationships are shown with the help of networks commonly known as Gene Regulatory Networks. Gene Regulatory Networks are complex control networks that show the map of interactions among the genes. They provide very useful contribution to the genomic science and increase the understanding of various biological processes. In this paper, fuzzy logic based method is proposed for the reverse engineering of gene regulatory network from microarray gene expression datasets. Pre-processing steps have been introduced to increase the efficiency of the method. Clustering technique is also employed to divide the problem into sub problems to reduce the computational complexity at some extent. Finally, the proposed method is tested on two different time course gene expression datasets of yeast having GEO accession number GDS37 and GDS3030. The results are validated by using Specificity, Sensitivity and F-score as parameters. Results of the proposed method are further compared with other existing method which was proposed by Al-Shobaili in 2014.

Inference of Gene Regulatory Networks with Neural-Cuckoo Hybrid

Conference Paper

May 2015

Current progress in cellular biology and bioinformatics (namely, DNA microarrays) allows researchers to get a distinct picture of the complex biochemical processes that occur within a cell of the human body. This technology has therefore opened a new door to researchers in computer science as well as to biologists. The data generated by these microarray experiments are very high dimensional and noisy in nature. One of the greatest challenges of the post genomic era is to investigate and infer the regulatory interactions or dependencies between genes and genes or between genes and proteins from the microarray data. Here, a new methodology has been developed for investigating genetic interaction among genes from temporal gene expression data by combining the features of Neural Network and Cuckoo Search optimization technique. This hybrid technique has been applied to the real-world microarray dataset of Lung Adenocarcinoma. The NN-CS algorithm has been used to model the genetic network by searching for the best combination of regulatory genes that affect a particular gene most. The derived results confirmed that the proposed approach is able to infer small scale genetic networks that best fit the training data. It is believed that the proposed algorithm for finding gene regulatory networks has great potential in medical science.

A Comparative Study on Disease Classification using Different Soft Computing Techniques

Article

May 2014

Sudip Mandal

The number of patients with cancer, heart disease and diabetes is increasing day by day because of excessive consumption of alcohol, inhalation of harmful gases, intake of contaminated food, drugs, smoking, etc. A range of therapies has already been provided by researchers. Early diagnosis is of considerable significance; it relies on the physician's skill, knowledge and experience, yet an error might occur. Using various Artificial Intelligence methods for medical diagnosis of diseases has recently become widespread. These intelligent systems help physicians as a diagnosis assistant. Artificial Neural Networks, Rough Sets, Decision Trees and Bayesian Networks are now very popular for this purpose. This paper provides a review of different soft computing methods for the diagnosis and detection of the acuteness of the above-mentioned disorders. The survey is carried out on three different types of data for different diseases, with cross validation and percentage split for testing new data sets of each. The results indicate that Rough Set Theory gives maximum accuracy and coverage area but with maximum computational time complexity. On the other hand, Neural and Bayesian Networks give quite satisfactory results. Moreover, the obtained results also suggest that accuracy depends on the quality of normalization of the data.

Modeling of gene regulatory networks: A review

Article

Jan 2013

Nedumparambathmarath Vijesh

Gene regulatory networks play an important role in the molecular mechanisms underlying biological processes. Modeling of these networks is an important challenge to be addressed in the post genomic era. Several methods have been proposed for estimating gene networks from gene expression data. Computational methods for the development of network models and analysis of their functionality have proved to be valuable tools in bioinformatics applications. In this paper we review the different methods for reconstructing gene regulatory networks.

Reconstruction of Dominant Gene Regulatory Network from Microarray Data Using Rough Set and Bayesian Approach

Article

Oct 2013

Sudip Mandal

Biological databases, containing genetic information of patients, are undergoing tremendous growth beyond our analysing capability. However, such analysis can reveal new findings about the cause and subsequent treatment of any disease. Interactions between genes and the proteins they synthesize shape Genetic Regulatory Networks (GRN). In this context, a model has been developed that is capable of representing a small dominant GRN, combining characteristics from the Rough Set and Bayesian Network. The investigation has been carried out on the publicly available microarray dataset for Lung Adenocarcinoma, obtained from the National Center for Biotechnology Information (NCBI) website. The analysis revealed that Rough Set Theory (RST) is able to extract the various dominant genes, in terms of reducts, which play an important role in causing the disease, and is also able to provide a unique simplified rule set for building expert systems in medical sciences with high accuracy and coverage factor. The next part of this work is based on reconstruction of the GRN using a Bayesian network, which is a mathematical tool for modelling conditional independences between stochastic variables such as different gene expressions. The proposed Bayesian approach, using scaled mutual information for scoring, is applied to the dataset corresponding to the most dominant responsible genes for Adenocarcinoma to uncover gene/protein interactions and key biological features of the cellular system. Finally, different interacting regulatory paths between dominating genes, which are the gene signature for a particular disease, are inferred from the probability distribution table and Bayesian Graph. Such reconstructed regulatory networks are attractive for their ability to successfully describe complex stochastic processes like gene transcription, classification of biological sequencing and intuitive models of causal influence. This may serve as a signature pattern of the disease Adenocarcinoma, which has been extracted from a huge microarray dataset. Extraction of this signature pattern is very useful for diagnosis of this disease.

Inference of S-system models of gene regulatory networks using immune algorithm

Article

Dec 2011

The S-system model is one of the nonlinear differential equation models of gene regulatory networks, and it can describe various dynamics of the relationships among genes. If we successfully infer rigorous S-system model parameters that describe a target gene regulatory network, we can simulate gene expressions mathematically. However, the problem of finding optimal S-system model parameters is too complex to be solved analytically. Thus, heuristic search methods that offer approximate solutions are needed to reduce the computational time. In previous studies, several heuristic search methods such as Genetic Algorithms (GAs) have been applied to the parameter search of the S-system model. However, they have not achieved sufficient estimation accuracy. One conceivable reason is a lack of mechanisms to escape local optima. We applied an Immune Algorithm (IA) to search for the S-system parameters. IA is also a heuristic search method, inspired by the biological mechanism of acquired immunity. Compared to GA, IA is able to search a large solution space, thereby avoiding local optima, and to keep multiple candidate solutions. These features work well for searching the S-system model. Our algorithm showed higher performance than GA for both simulation and real data analyses.

Inference of Gene Regulatory Networks using S-System: A Unified Approach.

Conference Paper

Jan 2007

In this paper, a unified approach to infer gene regulatory networks using the S-system model is proposed. In order to discover the structure of large-scale gene regulatory networks, a simplified S-system model is proposed that enables fast parameter estimation to determine the major gene interactions. If a detailed S-system model is desirable for a subset of genes, a two-step method is proposed where the range of the parameters is determined first using genetic programming and recursive least square estimation. Then the exact values of the parameters are calculated using a multi-dimensional optimization algorithm. Both the downhill simplex algorithm and the modified Powell algorithm are tested for multi-dimensional optimization. Simulation results using both synthetic data and real microarray measurements demonstrate the effectiveness of the proposed methods.

A Comparative Study on Disease Classification using Different Soft Computing Techniques

Article

Aug 2014

Rough Set Theory based Automated Disease Diagnosis using Lung Adenocarcinoma as a Test Case

Article

Aug 2013

A recurrent fuzzy neural model of a gene regulatory network for knowledge extraction using invasive weed and artificial bee colony optimization algorithm

Article

Mar 2012

Generating inferences from a gene regulatory network is important for understanding the fundamental cellular processes involving gene functions and their relations. The availability of time-series gene expression data makes it possible to investigate the gene activities of whole genomes. Under this framework, gene interaction is explained through a connection weight matrix. Based on the fact that the measured time points are limited and the assumption that genetic networks are usually sparsely connected, we present an IWO-ABC-based search algorithm to unveil potential genetic network constructions that fit well with the time-series data and to explore possible gene interactions.

Dynamic Bayesian Networks: Representation

Article

K. P. Murphy

Inference of Gene Regulatory Networks using S-System: A Unified Approach

Article

Apr 2010

With the increased availability of DNA microarray time-series data, it is possible to discover dynamic gene regulatory networks (GRNs). The S-system is a promising model to capture the rich dynamics of GRNs. However, owing to the complexity of the inference problem and the limited amount of available data compared to the number of unknown kinetic parameters, the S-system can only be applied to a very small GRN with few parameters. This significantly limits its applications. A unified approach to infer GRNs using the S-system model is proposed. In order to discover the structure of large-scale GRNs, a simplified S-system model is proposed that enables fast parameter estimation to determine the major gene interactions. If a detailed S-system model is desirable for a subset of genes, a two-step method is proposed where the range of the parameters is determined first using genetic programming and recursive least square estimation. Then the mean values of the parameters are estimated using a multi-dimensional optimisation algorithm. Both the downhill simplex algorithm and the modified Powell algorithm are tested for multi-dimensional optimisation. A 50-dimensional synthetic model with 51 parameters for each gene is tested for the applicability of the simplified S-system model. In addition, real measurement data pertaining to yeast protein synthesis are used to demonstrate the effectiveness of the proposed two-step method to identify the detailed interactions among genes in small GRNs.
