Tools for protein-protein interaction network analysis in cancer research

As cancer is a complex disease, the representation of a malignant cell as a protein-protein interaction network (PPIN) and its subsequent analysis can provide insight into the behaviour of cancer cells and lead to the discovery of new biomarkers. The aim of this review is to help life-science researchers without previous computer programming skills to extract meaningful biological information from such networks, taking advantage of easy-to-use, public bioinformatics tools. It is structured in four parts: the first section describes the pipeline of consecutive steps from network construction to biological hypothesis generation. The second part provides a repository of public, user-friendly tools for network construction, visualisation and analysis. Two different and complementary approaches of network analysis are presented: the topological approach studies the network as a whole by means of structural graph theory, whereas the modular approach divides the PPIN into sub-graphs, or modules. In section three, some concepts and tools regarding heterogeneous molecular data integration through a PPIN are described. Finally, the fourth part is an example of how to extract meaningful biological information from a colorectal cancer PPIN using some of the described tools.
Keywords Cancer · Systems biology · Protein-protein
interaction network · Public bioinformatics tools ·
Biomarker discovery
Introduction
Cancer is a complex disease in which many proteins, genes
and molecular processes are implicated [1]. Genes and
proteins do not work independently, but are organised into
co-regulated units that perform a common biological func-
tion. It is the alteration of these functional elements that
leads to the development of a particular cancer phenotype
(i.e., drug response or disease outcome) and, consequently,
their study cannot be tackled from the classical one-gene
approach. A systems biology approach, the analysis of the
molecular relationship between the implicated genes and
proteins as a whole, is required to understand the disease
phenotype [2–4].
In this scenario, cancer systems medicine emerges as a translational extension of systems biology that brings together clinical information and the -omics disciplines for the classification and diagnosis of cancer subtypes, the prognosis of patient outcomes, the prediction of treatment responses and the identification of perturbation targets for drug development [5, 6].
Proteins interact with each other within a cell, and those interactions can be represented by a network, defined as an abstract representation of nodes or vertices (i.e., proteins) where some pairs of nodes are connected by edges representing interactions [7]. With the recent advances in high-throughput experimental technologies, increasing numbers of large-scale biological networks are being defined [8, 9]. Network knowledge can give rise to understanding the biological function and dynamic behaviour of cellular systems, generating biological hypotheses about putative biomarkers, therapeutic targets or deregulated pathways in cancer [10–14].

R. Sanz-Pamplona · A. Berenguer · X. Sole · D. Cordero · M. Crous-Bou · J. Serra-Musach · E. Guinó · M.A. Pujana · V. Moreno (✉)
Unit of Biomarkers and Susceptibility
Catalan Institute of Oncology (ICO)
Bellvitge Institute for Biomedical Research (IDIBELL)
Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP)
Av. Gran Vía, 199
ES-08908 L’Hospitalet de Llobregat, Barcelona, Spain
e-mail: v.moreno@iconcologia.net
V. Moreno
Department of Clinical Sciences
Faculty of Medicine
University of Barcelona
Barcelona, Spain
Clin Transl Oncol (2012) 14:3-14
DOI 10.1007/s12094-012-0755-9
EDUCATIONAL SERIES Blue Series
ADVANCES IN TRANSLATIONAL ONCOLOGY
Rebeca Sanz-Pamplona · Antoni Berenguer · Xavier Sole · David Cordero · Marta Crous-Bou · Jordi Serra-Musach · Elisabet Guinó · Miguel Ángel Pujana · Víctor Moreno
Received: 14 July 2011 / Accepted: 20 August 2011

Cancer-related proteins have a higher ratio of promiscuous structural domains, making them more prone to interact with other proteins. In fact, they have a large number of interacting proteins and occupy a central position in the networks [15]. Proteins interacting with cancer-related proteins have a higher probability of being related to the cancer process than non-interacting proteins. Hence, the study of those proteins may be an efficient way to discover novel cancer genes and cancer biomarkers [16–18].
Since understanding complex networks representing a
cancer cell is one of the main challenges of today’s biol-
ogy, this review attempts to help life-science researchers
without previous computer programming skills to extract
meaningful biological information from such networks,
taking advantage of easy-to-use, public bioinformatics
tools. Although different types of biological networks exist, such as regulatory networks, signal transduction networks or metabolic networks, only protein–protein interaction networks (PPINs) will be covered here. In addition, due to the complexity of directed networks (those whose edges carry directional information) and dynamic networks (those including changes over time), only undirected and static PPINs will be discussed [19].
This review is structured into four sections. The first section briefly describes the workflow, enumerating consecutive steps from network construction to hypothesis generation. The second section details suitable tools to carry out each step of the analysis. Some concepts about data integration are described in the third section. Finally, a fourth section presents an illustrative example of how to use these tools, using colorectal cancer data. This is not a compendium of all existing network-management tools but a tutorial to construct and analyse protein interaction networks in a simple manner. It should be noted, however, that no single method exists and each particular network may have characteristics requiring specific software. For example, software suitable for dealing with huge graphs may not be helpful for analysing small networks, and vice versa. Also, although graph theory is beyond the aim of this review, basic ideas to start dealing with interaction networks will be provided, and the references should be helpful for a more in-depth study of this topic.
Workflow: from network assembly to hypothesis generation
Figure 1 summarises five sequential steps required to
generate a biological hypothesis on cancer cell behaviour
through PPIN construction and analysis.
The starting point is to decide which proteins define the input, hereafter seed proteins (first step). These should be the molecules of major interest and will be the skeleton of the PPIN. Typical choices are differentially expressed molecules observed in a given experiment (transcriptomic or proteomic) or molecules known to be involved in cancer. The final hypothesis derived from the network analysis will be directly related to these seed proteins.

Fig. 1 Pipeline. The process of PPIN construction and analysis follows these consecutive steps. First, a list of molecules of interest (seed proteins) is defined. Next, their interactions are searched in a specialised database and represented in a PPIN. Then, the network is analysed and consequently a biological hypothesis is generated. In this diagram (steps 1–3), six seed proteins (1–6) are represented in red whereas their interacting proteins are represented in green (“a” and “b”). Protein “a” interacts with seeds “1”, “2” and “6”; protein “b” interacts with seeds “3” and “4”; and seeds “1” and “6” interact with each other. Seed “5” has no interacting partners. Step 4 of the chart shows two complementary network analysis approaches: first, the topological methods, which look for essential nodes in the architecture of the network. In this example interacting protein “a” acts as a hub because of its higher degree. Second, modular methods divide the network into sub-graphs grouping proteins sharing a common property. Some of the public tools useful in each step of the analysis are also represented in the figure

The second step is the retrieval of binary interactions. Interacting partners of seed proteins need to be identified from curated databases. Several publicly available databases exist: HPRD [20], String [21], DIP [22] and others [23]. A description of the experimental and computational procedures to obtain these protein-protein interaction data is beyond the objectives of this review (see Refs. [24] and [25] for more information).
The third step is network construction and visu-
alisation. From the set of protein–protein interactions, the
construction of a graph consists of assigning vertices to
proteins and edges to interactions between proteins. Then,
several algorithms allow the creation of a visual represen-
tation of the network [26].
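
As a rough illustration of the third step for readers who do want to see it in code, the sketch below turns a list of binary interactions into an undirected graph. It uses the toy network described in the caption of Fig. 1; the function name `build_network` and the dictionary-of-sets representation are illustrative choices, not part of any tool reviewed here.

```python
from collections import defaultdict

# Binary interactions of the toy network in Fig. 1
# (seeds "1" to "6", interacting proteins "a" and "b").
interactions = [
    ("a", "1"), ("a", "2"), ("a", "6"),
    ("b", "3"), ("b", "4"),
    ("1", "6"),
]
seeds = ["1", "2", "3", "4", "5", "6"]


def build_network(pairs, nodes=()):
    """Return an undirected graph as {protein: set of interacting proteins}."""
    graph = defaultdict(set)
    for node in nodes:
        graph[node]              # registers nodes without partners (seed "5")
    for u, v in pairs:
        graph[u].add(v)          # undirected: store each edge in both directions
        graph[v].add(u)
    return dict(graph)


network = build_network(interactions, seeds)

# Degree = number of interacting partners of each protein.
degree = {protein: len(partners) for protein, partners in network.items()}
for protein, value in sorted(degree.items(), key=lambda item: (-item[1], item[0])):
    print(protein, value)
```

In this toy graph protein "a" has the highest degree, which is why the caption of Fig. 1 describes it as a hub.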
The fourth step is network analysis, when meaningful
biological information extraction is done using bioinfor-
matics methods. Two complementary approaches in the
area of network analysis exist: topological (study of the
whole graph) and modular (division of networks into mod-
ules of related proteins) [27].
Finally, the network construction and analysis should lead to the generation of a hypothesis regarding the initial data (step 5). Ideally, a topological network analysis identifies proteins that are candidate biomarkers or therapeutic targets, whereas a modular approach gives information about deregulated functions or pathways.
Public tools for network management
Multiple public network management tools exist. An exhaustive review published a few years ago identified no fewer than 35, and the number continues to grow [28]. In this review only some of the best known and/or easiest to use will be discussed, but it is strongly recommended to explore other tools, some of which might be useful for specific topics.
Table 1 Construction tools

BIANA [31]
  Kind of interactions: Experimentally determined and predictions
  Included databases: String, Intact, DIP, Degg, IPI, SCOP, UniProt, Reactome, MINT, cog and psi_mi
  Input: Accepts a variety of identifiers (UniProt, Ensembl, GeneSymbol…)
  Distinctive features: On-line interface. Flexibility: it is possible to choose the network level and the relation types, restrict interactions by method and add interologs. The output can be downloaded and visualised in Cytoscape
  Webpage: http://sbi.imim.es/web/BIANA.php

Poinet [32]
  Kind of interactions: Experimentally determined and predictions
  Included databases: DIP, MINT, BIND, HPRD, MIPS, CYGD, BioGRID and NCBI
  Input: NCBI or UniProt identifiers
  Distinctive features: By default, only experimentally determined interactions are retrieved. It is possible to filter interactions based on the number of shared GO terms between the two interacting proteins. PPI can be filtered with tissue-specific expression data from public resources. The output can be directly visualised with POINET or downloaded and visualised in Cytoscape
  Webpage: http://poinet.bioinformatics.tw

SNOW [33]
  Kind of interactions: Only experimentally determined
  Included databases: HPRD, IntAct, BIND, DIP and MINT
  Input: Gene, transcript or protein
  Distinctive features: Constructs a minimal connected network (MCN), a graph containing only seed and linker proteins. Maps seed proteins onto a reference interactome, calculating the network parameters degree, clustering coefficient and betweenness
  Webpage: http://snow.bioinfo.cipf.es/cgi-bin/snow.cgi

UniHI [34]
  Kind of interactions: Experimentally determined and predictions
  Included databases: MDC_Y2H, CCSB, HPRD, DIP, BIND, IntAct, BioGRID, COCIT, REACTOME, ORTHO, HOMOMINT and OPHID
  Input: Entrez Gene, GeneSymbol, UniProt, NCBI, Ensembl, RefSeq, BioGrid, HPRD, OMIM
  Distinctive features: Uses Human Gene Atlas data to construct tissue-specific interaction networks. Annotates networks with pathway information from the KEGG database. Only accepts a maximum of 50 proteins as an input
  Webpage: http://theoderich.fb3.mdc-berlin.de:8080/unihi/home.jsp

Network management software typically specialises in construction, visualisation or analysis steps. However, this is an artificial classification and overlapping is common: some tools are useful for several or all steps.
Construction tools
Once the list of seed proteins is ready, the first step consists in joining them together through linkers, proteins that bind to two or more seed proteins and so work as bridges. The number of linker proteins inserted between two seed proteins determines the network distance or network level [29]. A distance of one is recommended, since distance two usually retrieves an undesirable “ball of yarn” network. Moreover, at this point of the analysis it is crucial to decide the nature of the interactions that will be included in the analysis: experimentally and/or computationally determined. Literature-based interactions are more reliable but biased towards networks of the better studied proteins and less likely to reveal new interesting interactions. Computationally inferred interactions from high-throughput experiments do not have this bias, but result in a higher rate of false interactions being included, so a more careful interpretation of the data and a subsequent experimental validation are desirable [30].
Tools exist that look for binary interactions in special-
ised databases and automatically retrieve a PPIN. Some of
the more popular ones are summarised in Table 1.
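
The idea of joining seeds through linkers at distance one can be sketched in a few lines. The example below is only an illustration under stated assumptions: the reference interactome, the protein labels (`S*` for seeds, `L*` and `X*` for other proteins) and the function names `linkers` and `minimal_network` are all hypothetical, and the tools in Table 1 may apply different or additional rules.

```python
def linkers(interactome, seeds, min_seeds=2):
    """Non-seed proteins that interact with at least `min_seeds` seed proteins."""
    seed_set = set(seeds)
    found = {}
    for protein, partners in interactome.items():
        if protein in seed_set:
            continue
        bound_seeds = partners & seed_set
        if len(bound_seeds) >= min_seeds:
            found[protein] = bound_seeds
    return found


def minimal_network(interactome, seeds):
    """Sub-network restricted to the seeds and their distance-one linkers."""
    keep = set(seeds) | set(linkers(interactome, seeds))
    return {p: interactome.get(p, set()) & keep for p in keep}


# Hypothetical reference interactome: {protein: set of interacting proteins}.
interactome = {
    "S1": {"L1", "X1", "S2"},
    "S2": {"L1", "S1"},
    "S3": {"L2"},
    "L1": {"S1", "S2", "X2"},
    "L2": {"S3", "X2"},
    "X1": {"S1"},
    "X2": {"L1", "L2"},
}
seeds = ["S1", "S2", "S3"]

print(linkers(interactome, seeds))
print(minimal_network(interactome, seeds))
```

Here only `L1` qualifies as a linker because it binds two seeds; `L2` and `X1` each bind a single seed and are left out, which keeps the network small and avoids the "ball of yarn" effect.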
Visualisation tools
Given a complex system under study, one natural goal is to create a graphical representation of the system as a network in which nodes represent proteins and edges represent interactions between proteins [7]. Creating this representation is not trivial, and the chosen representation can drive the interpretation of the system and the hypotheses derived from it. Diverse layouts exist to represent a network, such as circular, hierarchical or force-directed [28]. These can be drawn with network visualisation tools (see Ref. [35] for a review). Three of them are summarised in Table 2.
Analytical tools
In order to extract underlying biological information from the PPIN, it is necessary to analyse it using graph-theoretic tools. Two different and complementary approaches, named topological and modular, have been developed for the study of a complex network. The topological approach studies the network as a whole by analysing the structural parameters of the graph. In contrast, the modular approach divides the PPIN into modules that group nodes based on a common characteristic, such as sharing the same function or belonging to the same pathway. Afterwards, each module is studied separately [27].
Topological approaches: centrality measures and network
motifs
The description of the structural characteristics of a network is often the first step in the analysis of network data [40]. Biological networks including PPINs are usually scale-free, meaning that a few nodes are highly connected (“hubs”) and a majority of nodes are linked to only one or a few neighbours [41, 42]. According to the lethality and centrality rule, nodes that have a larger number of connections are those that play a more important role in the architecture of the PPIN and tend to be biologically relevant in the studied system [43]. In other words, highly connected proteins are essential to organism viability [44]. It has also been demonstrated that genes traditionally associated with cancer are implicated in multiple cellular processes and signalling pathways, so they often work as protein hubs inside an interaction network [45].

Table 2 Visualisation tools

Cytoscape [36, 37]
  Usage: Easy-to-download and install Java application (Windows, Mac or UNIX)
  Input: Table of interactions (.xls or .txt). Multiple file types (.xml, .rdf, .owl, .gml, .xgmml, .sif, .sbml)
  Distinctive features: The most popular visualisation tool. Allows a variety of graph customisation. Useful to integrate biomolecular networks into a unified framework. Cytoscape functionality can be expanded using the collection of plugins developed by Cytoscape’s community of users
  Webpage: http://www.cytoscape.org ; http://www.cytoscape.org/plugins2.php

Arena3D [38]
  Usage: The software can be downloaded or directly run from the web page
  Input: List of interactions in .txt format
  Distinctive features: 3D view of the network. A graphics card with hardware-accelerated 3D graphics and at least 256 MB of graphical memory is recommended
  Webpage: http://arena3d.org

MEDUSA [39]
  Usage: Java application
  Input: List of interactions retrieved from STRING database
  Distinctive features: It was specially designed and optimised for accessing protein interaction data from the STRING database
  Webpage: http://coot.embl.de/medusa

Identifying essential hubs in the PPIN is a way to
decipher the critical players inside the complex network.
Network centrality measures can be used to rank the nodes
of a given network and find the most important nodes,
hypothetically useful as biomarkers or therapeutic targets
[46]. The identification of central elements in biologi-
cal networks may also provide new hypotheses that lead
to more rational approaches in experimental design [47].
Several centrality measures exist that should be considered
within an exploratory process. The most important ones are
degree, betweenness, closeness and eigenvector centrality.
See Refs. [48] and [49] for a more in-depth explanation of
these concepts.
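
To make three of these measures concrete, the following sketch computes degree, closeness and shortest-path betweenness on a small hypothetical network. It is a minimal standard-library illustration, not the implementation used by any tool in Table 3: betweenness follows Brandes' algorithm for unweighted graphs and is left unnormalised, and closeness is computed over the nodes reachable from each protein, which is one of several conventions.

```python
from collections import deque


def degree_centrality(graph):
    """Number of interacting partners of each node."""
    return {v: len(graph[v]) for v in graph}


def closeness_centrality(graph):
    """(reachable nodes) / (sum of shortest-path distances to them)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:                       # breadth-first search
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                       # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}   # each pair was counted twice


# Hypothetical network: hub "H" binds "A", "B" and "C"; "C" also binds "D".
toy = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H", "D"}, "D": {"C"}}

for name, measure in [("degree", degree_centrality),
                      ("closeness", closeness_centrality),
                      ("betweenness", betweenness_centrality)]:
    ranking = sorted(measure(toy).items(), key=lambda item: (-item[1], item[0]))
    print(name, ranking)
```

All three measures rank "H" first in this example, but on real networks the rankings can differ, which is why the text recommends exploring several measures.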
Network motif distribution is another useful measure. A motif is a basic building block of complex graphs defined as a sub-network or connectivity pattern that appears in a PPIN at a significantly higher frequency than would be expected for a random network [50]. The distribution of motifs characterises the local structure of networks and has also been shown to be functionally relevant [51]. Despite the high complexity involved in the detection of network motifs, in practice the search can be executed in reasonable time using available software. Typical motifs that repeatedly appear in regulatory networks are autoregulatory or feed-forward motifs. Tools to calculate topological network parameters are presented in Table 3.

Table 3 Topological analysis tools

Hubba [47]
  Computed parameters: Degree, bottleneck, edge percolated component, subgraph centrality, maximum neighbourhood component and density of maximum neighbourhood component
  Input: List of interactions in .txt format
  Distinctive features: Web-based tool. The appropriate tool to just rank proteins in a network by centrality measures
  Webpage: http://hub.iis.sinica.edu.tw/Hubba

Centibin [48]
  Computed parameters: Degree, eccentricity, closeness, radiality, centroid value, stress, S.P. Betweenness, C.-F. Closeness, C.-F. Betweenness, Katz Status, Eigenvector, Hubbell index, Bargaining, PageRank, HITS-Hubs, HITS-Authorities and Closeness-vitality
  Input: Network data in .net, .tab, .mat or .xml format
  Distinctive features: Free installable Windows application. Useful for a detailed centrality study because it offers more algorithms than the other tools
  Webpage: http://centibin.ipk-gatersleben.de/

NetworkAnalyzer [52]
  Computed parameters: Number of nodes and edges, self-loops, connected components, average number of neighbours, network diameter, radius, density, centralisation, heterogeneity, clustering coefficient, number of shortest paths and the characteristic path length
  Input: Network loaded in the Cytoscape environment
  Distinctive features: Java plugin for Cytoscape. Displays a comprehensive set of topological parameters. It is possible to visualise different parameters in the same network by changing node features (e.g., “degree” in colour and “closeness centrality” in size)
  Webpage: http://med.bioinf.mpi-inf.mpg.de/networkanalyzer/

MAVisto [53]
  Computed parameters: Motifs
  Input: List of interactions in .txt format
  Distinctive features: Motifs are detected by comparing the frequency of all occurrences of a motif in the studied network to the frequency values of this motif in randomisations of the same network. MAVisto offers several presentations of its results: a motif table (with p-value and z-score), a motif view, a motif fingerprint and a visualisation of motif matches in the network. Computationally time consuming
  Webpage: http://mavisto.ipk-gatersleben.de

FANMOD [54]
  Computed parameters: Motifs
  Input: Network
  Distinctive features: Motifs are detected and grouped into motif classes. Then, an algorithm determines which motif classes are displayed at much higher frequency than in random graphs. Faster than MAVisto
  Webpage: http://www.minet.unijena.de/~wernicke/motifs

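
The principle behind motif detection, comparing how often a pattern occurs in the real network with how often it occurs in randomised versions of it, can be sketched as follows. This is a simplified illustration and not the algorithm of MAVisto or FANMOD: it assumes the triangle as the only motif of interest, randomises the network with degree-preserving double-edge swaps, and summarises the comparison as a z-score. The edge list and all function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def to_graph(edges):
    """Build {node: set of neighbours} from a list of undirected edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def count_triangles(edges):
    graph = to_graph(edges)
    total = 0
    for partners in graph.values():
        for u, v in combinations(partners, 2):
            if v in graph[u]:
                total += 1
    return total // 3        # each triangle is seen once from each of its nodes


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomisation by repeated double-edge swaps."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue         # swap would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue         # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)   # a-b, c-d becomes a-d, c-b
    return edges


def motif_z_score(edges, n_random=200, seed=0):
    """Observed triangle count and its z-score against randomised networks."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    null = [count_triangles(rewire(edges, 10 * len(edges), rng))
            for _ in range(n_random)]
    spread = pstdev(null)
    z = (observed - mean(null)) / spread if spread else float("nan")
    return observed, z


# Hypothetical network with two triangles (A-B-C and C-D-E) and a short tail.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("C", "E"), ("E", "F"), ("F", "G")]
observed, z = motif_z_score(edges, n_random=50)
print("triangles:", observed, "z-score:", z)
```

A large positive z-score would suggest that the pattern is over-represented relative to random networks with the same degree sequence; on a network this small the result is only illustrative.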
Modular approach
Based on the idea that biological systems are composed of
modules containing interacting components [55], a way to
achieve a better understanding of a complex network is to
break it down into simpler units called modules. A module
is often understood as a subset of vertices that are densely
connected among one another [56].
Commonly, in addition to closeness between nodes,
functional criteria are used to divide a network into mod-
ules. Similar proteins tend to be connected in molecular
networks, so distinct sets of proteins and their correspond-
ing interactions constitute different blocks underlying
common functions [57]. Therefore, the study of modules
could be equivalent to the study of functional units of the
malignant cell [58]. In Table 4, some modular-based tools
helpful to manage a complex PPIN are presented.
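
As a very simple illustration of what "densely connected" means, the sketch below scores the neighbourhood of each protein (the protein plus its direct partners) by edge density, i.e. the fraction of possible internal interactions that are actually present. This is an assumption-laden toy, loosely inspired by the neighbourhood modules listed for GraphWeb in Table 4; the dedicated tools use more elaborate clustering algorithms, and the network and function names here are hypothetical.

```python
def density(graph, nodes):
    """Fraction of possible edges among `nodes` that exist in `graph`."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(len(graph[u] & nodes) for u in nodes) // 2  # edges counted twice
    return internal / (n * (n - 1) / 2)


def neighbourhood_modules(graph, min_size=3):
    """Candidate modules (a node plus its partners), densest first."""
    modules = []
    for protein, partners in graph.items():
        members = {protein} | partners
        if len(members) >= min_size:
            modules.append((density(graph, members), protein, sorted(members)))
    return sorted(modules, key=lambda m: (-m[0], m[1]))


# Hypothetical network: P1-P4 all interact with each other (a putative complex),
# while Q1 and Q2 hang off P4 as a sparse tail.
graph = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3", "P4"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P1", "P2", "P3", "Q1"},
    "Q1": {"P4", "Q2"},
    "Q2": {"Q1"},
}

for score, centre, members in neighbourhood_modules(graph):
    print(f"{centre}: density={score:.2f} members={members}")
```

The fully connected group around P1, P2 and P3 scores a density of 1, whereas neighbourhoods that include the tail score lower, which is the intuition behind treating dense regions as candidate complexes or functional modules.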
Data integration
Taking into account that cancer is a multi-factorial disease involving diverse anomalies, the analysis of biological networks integrating different types of molecular data can lead to the discovery of robust, specific and useful biomarkers of disease, and also shed light on the mechanisms and aetiology of the studied tumour [65–68].

Table 4 Modular analysis tools

GraphWeb [59]
  Computed parameters: Connected components, neighbourhood modules, hub-based modules, cliques and cluster modules
  Input: List of interactions
  Distinctive features: Performs a functional profiling of discovered modules based on GO annotations. Ref. [58] provides an accurate description of the algorithms underlying each clustering method. Breaks down a network into functional modules, extracting them as independent sub-networks
  Webpage: http://biit.cs.ut.ee/graphweb/

GenePRO [60]
  Computed parameters: Turns a network into interacting clusters
  Input: Network loaded in the Cytoscape environment
  Distinctive features: Cytoscape plugin. Displays a view of the clusters as individual but interconnected nodes, maintaining the whole-network picture. A previous manual definition of clusters is necessary
  Webpage: http://wodaklab.org/genepro/

MCODE [61]
  Computed parameters: Clusters
  Input: Network loaded in the Cytoscape environment
  Distinctive features: Cytoscape plugin. Detects densely connected regions in a network. Specifically oriented to the discovery of molecular complexes
  Webpage: http://baderlab.org/Software/MCODE

BiNGO [62]
  Computed parameters: GO term overrepresentation in biological networks
  Input: Network loaded in the Cytoscape environment
  Distinctive features: A set of nodes must be manually selected from a network and BiNGO retrieves GO terms associated with this set of proteins. Tests the statistical significance of the enrichment and controls the false discovery rate
  Webpage: http://www.psb.ugent.be/cbd/papers/BiNGO/Home.html

NEAT [63]
  Computed parameters: Clusters
  Input: Tab-delimited, GML, VisML, DOT and adjacency matrix format
  Distinctive features: Divides the network into non-overlapping clusters. Retrieves KEGG or MetaCyc pathways in which proteins are implicated
  Webpage: http://rsat.ulb.ac.be/rsat/index_neat.html

NEMO [64]
  Computed parameters: Modules
  Input: Network loaded in the Cytoscape environment
  Distinctive features: Identifies network communities based on the premise that densely connected nodes correspond to functional modules
  Webpage: http://baderlab.bme.jhu.edu/baderlab/index.php/NeMo

The representation of data derived from heterogeneous sources in a single network is a way to integrate diverse and massive data sets. PPINs can integrate diverse molecular data to get a more complete model of the biological system (Fig. 2). It has been postulated that proteins with high connectivity within a network could be very important to the studied disease, despite not being differentially expressed. Thus, genes with a role in tumorigenesis not detected in a high-throughput experiment could be identified by a network-based approach. For example, if an important protein is activated by phosphorylation, its gene expression may not be altered, but the kinase that phosphorylates it may be up-regulated. So, even though no changes in expression are observed when measuring the protein, the network will reveal its importance because the protein is connected to its altered kinase. The same occurs with mutated genes with a role in tumour progression that are not detected by differential expression experiments but usually take up a central position in networks [69].

Fig. 2 Data integration into a network to obtain a more informative PPIN. Red and green circles represent seed and linker proteins respectively. Complementary molecular information: over-expression at the mRNA level is indicated as a purple circle and proteins with mutations at the DNA level are represented as a half-moon shape, e.g., proteins “a” and “d” are overexpressed and connected by a protein not deregulated at mRNA level, but mutated. Some of the public tools useful for data integration appear in the purple box

Table 5 Integration tools

Cerebral [70]
  Kind of integrated data: Subcellular location
  Input: Network loaded in the Cytoscape environment and subcellular location data
  Distinctive features: Cytoscape plugin. It generates an intuitive view of the network in which proteins appear separated into layers according to the context of cell organelles. Cerebral does not automatically search for cellular location: this data must be provided to Cerebral as a Cytoscape attribute
  Webpage: http://www.pathogenomics.ca/cerebral/

Dynamic Expression Plugin [37]
  Kind of integrated data: Expression values
  Input: Network loaded in the Cytoscape environment and expression data
  Distinctive features: Cytoscape plugin. It colours the nodes in a range according to their level of expression: from blue (minimum expression) to red (maximum expression). Useful to easily identify down- or up-regulated areas of the network. An expression data file must be loaded in Cytoscape
  Webpage: http://chianti.ucsd.edu/svn/csplugins/trunk/ucsf/scooter/dynamicXpr/

EGAN [71]
  Kind of integrated data: Results of -omics experiments: expression microarrays, aCGH, MS/MS proteomics, GWAS data, ChIP-chip experiments, DNA methylation assays or high-throughput sequencing
  Input: A network and high-throughput results
  Distinctive features: Java application. It allows combining interaction and molecular data in the context of network modules. For expression data, for example: divide the network into topological modules (motifs) and then look for co-expression patterns in each module, divide the network into functional modules and then look for co-expression patterns, or use expression information to divide the network into co-expression modules. EGAN allows selecting nodes by crossing different data, e.g., select all genes with up-regulated expression and amplified copy number
  Webpage: http://akt.ucsf.edu/EGAN/

Vanted [72]
  Kind of integrated data: Experimental data
  Input: A network and a biochemical dataset
  Distinctive features: Easy to download and install Java application (Windows, Mac or UNIX). A tool specially designed to help scientists with the interpretation of related experimental data
  Webpage: http://vanted.ipk-gatersleben.de/

Usually, a network contains false positive interactions or interactions that do not take place in the studied tissue. Expression data can be used as a filter, assuming that if a gene is not expressed in such tissue, neither will its corresponding protein. Consequently, interactions involving non-expressed genes are unlikely to occur in that tissue and can be discarded. Several tools for diverse data integration into a network are presented in Table 5.
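
The expression filter described above amounts to a simple rule: keep an interaction only if both partners are expressed in the tissue of interest. A minimal sketch follows, assuming hypothetical gene labels, hypothetical expression values and an arbitrary expression threshold; in practice the threshold depends on the platform and normalisation used.

```python
def filter_by_expression(interactions, expression, threshold):
    """Keep interactions whose two partners are both expressed.

    Genes absent from `expression` are treated as not expressed.
    """
    expressed = {gene for gene, level in expression.items() if level >= threshold}
    return [(u, v) for u, v in interactions if u in expressed and v in expressed]


# Hypothetical interactions and expression levels in the tissue of interest.
interactions = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G1", "G5")]
expression = {"G1": 9.1, "G2": 7.4, "G3": 0.2, "G4": 6.8}

kept = filter_by_expression(interactions, expression, threshold=1.0)
print(kept)
```

Interactions involving G3 (below the threshold) and G5 (not measured) are dropped. Treating unmeasured genes as not expressed is a conservative choice; keeping them instead would be equally defensible.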
An example using genes classically related to colorectal
cancer
Figure 3 shows an example of how to use some of the
previously described tools to extract biological informa-
tion from the following 15 colorectal cancer (CRC) genes:
APC, BUB1, MAD2L1, TP53, PIK3CA, EGFR, AURKA,
CTNNB1, SMAD4, WNT1, AXIN2, TGFBR2, MLH1, BRAF
and KRAS. These seed proteins, classical key molecules
driving colon carcinogenesis, are a mix of chromosomal in-
stability (CIN) genes, microsatellite instability (MSI) genes
and CpG island methylation phenotype (CIMP) genes [73].
BIANA software was used to retrieve and export a file containing experimentally determined interactions of the seed proteins. Next, a visual representation of the resulting network was produced using Cytoscape software (Fig. 3A). The PPIN showed two components: one called the giant component, because it contained the highest number of nodes, and a smaller independent network. The giant component grouped all seed proteins except WNT1 and its interacting partners. APC appeared central, directly interacting with seed proteins AURKA, MAD2L1, CTNNB1, BUB1 and AXIN2, and indirectly, through linker proteins, with the remaining seeds except MLH1 (MSI representative gene). KRAS and BRAF, both chosen as CIMP-related genes, interacted directly with each other. The protocol in Fig. 3B was followed to analyse this PPIN, including a topological approach, a clustering or modular approach and a data integration step.
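
The notion of components used above can be illustrated with a short sketch: a component is a set of nodes that can all reach one another, and the giant component is simply the largest one. The network below is hypothetical and does not reproduce the colorectal cancer PPIN; the function name is an illustrative choice.

```python
def connected_components(graph):
    """Return the connected components of an undirected graph, largest first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                      # depth-first traversal from `start`
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical PPIN with a four-protein component and an isolated pair.
graph = {
    "P1": {"P2", "P3"}, "P2": {"P1"}, "P3": {"P1", "P4"}, "P4": {"P3"},
    "Q1": {"Q2"}, "Q2": {"Q1"},
}

components = connected_components(graph)
giant = components[0]
print(len(components), "components; giant component:", sorted(giant))
```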
First, a topological exploration of the PPIN was made:
centrality measures of hub proteins were calculated us-
ing Hubba and NetworkAnalyzer software. Protein ranks
differed slightly depending on the algorithm used for the
analysis, but in all cases AURKA, EGFR and TP53 ap-
peared as the most central proteins in the network, indicat-
ing their biological relevance in the pathogenesis of CRC.
Interestingly, BRAF took up the second position when cen-
trality was measured in terms of maximum neighbourhood
component (MNC) but descended to the fourth position in
degree and sixth in betweenness. This means that although BRAF does not have many interacting partners and is not located in all paths crossing the PPIN, when the network is divided into clusters of densely connected elements, it appears in more clusters than other proteins such as EGFR or TP53 (Fig. 3C). A network motif analysis was also done
with MAVisto software, revealing some repeated structures
of the network. Due to the computational requirements of
this complex task, this analysis was done on a small ver-
sion of the network (extracted with POINET software in-
stead of BIANA). As an example, this application revealed
as an important association the interaction between TP53
and the less studied protein RASA1 through the two link-
ers AURKA and CDKN2A (Fig. 3D). A search in PubMed
revealed that decreased expression of RASA1 is associated
with abnormal expression of TP53 in advanced colorectal
tumours [74]. However, motif results must be carefully
interpreted. This analysis is more suitable for directed networks
(usually regulatory networks), in which the directionality
of the interactions is represented.
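The difference between degree and MNC rankings can be illustrated with a small sketch. It assumes the definition of MNC as the size of the largest connected component of the subgraph formed by a node's direct neighbours; this is our reading of the measure and should be checked against the Hubba documentation. The toy network and node names (H, T, a, b, ...) are hypothetical, and betweenness is omitted for brevity.

```python
def build_graph(edges):
    """Return an undirected adjacency map (node -> set of neighbours)."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree(graph, node):
    """Number of direct interaction partners of a node."""
    return len(graph[node])

def mnc(graph, node):
    """Size of the largest connected component among the node's neighbours
    (assumed definition of maximum neighbourhood component)."""
    neighbours = graph[node]
    seen = set()
    best = 0
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            current = stack.pop()
            size += 1
            # Only follow edges that stay inside the neighbourhood.
            for nxt in graph[current] & neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best

# Hypothetical toy network: H has many partners that do not interact with
# each other; T has fewer partners, but they are connected to one another.
toy = build_graph([
    ("H", "a"), ("H", "b"), ("H", "c"), ("H", "d"),
    ("T", "x"), ("T", "y"), ("T", "z"), ("x", "y"), ("y", "z"),
])
for protein in ("H", "T"):
    print(protein, "degree:", degree(toy, protein), "MNC:", mnc(toy, protein))
```

Here H ranks first by degree but last by MNC, whereas T shows the opposite pattern, which is the kind of rank shift described above for BRAF.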
Second, a clustering analysis was performed to look
for both functional modules and molecular complexes
with biological meaning. BINGO software highlighted that
“DNA-repair” (p=4.1×10⁻⁸) and “response to DNA damage”
(p=1.1×10⁻⁷) were the most representative GO terms
in the cluster grouping MLH1-interacting proteins. Also
“transmembrane receptor protein serine/threonine kinase
signalling pathway” (p=8.4×10⁻¹³) and “small GTPase
mediated signal transduction” (p=2.0×10⁻¹¹) were the most
representative functions of Smad4-interacting proteins
(Fig. 3E). A betweenness centrality clustering analysis with
GraphWeb software effectively separated CIN and MSI
genes, and was also useful to discover biological pathways
inside the network: MLH1 and its interacting proteins
formed a module with statistically significant enrichment
in the KEGG pathway “mismatch repair” (concordant with
BINGO results). BUB1, CDK1 and TGFBR2 defined a
module of interacting proteins enriched in “transforming
growth factor receptor signalling pathway”. The GO term
“Wnt receptor signalling pathway” included APC, CTNNB1
and AXIN2 (Fig. 3F). So, although WNT1 intriguingly
did not appear to interact with these proteins, this approach
was able to capture the classical Wnt/beta-catenin
pathway in CIN CRC [75]. As an alternative approach,
MCODE was used to search for putative molecular complexes.
Four complexes were retrieved: the first included
AURKA, MAD2L1 and their interacting proteins. The second
contained BRAF, EGFR and their linker proteins RIN1,
PKP2, RAPGEF1 and CRK. TP53, BUB1, HDAC5 and
PRKCA formed another complex. Lastly, a four-node complex
included the direct interaction between seed proteins
Clin Transl Oncol (2012) 14:3-14
TGFBR2 and SMAD4, and their linker proteins SMAD3 and
SMAD7 (Fig. 3G).
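The p-values reported for GO terms in a module come from an over-representation test followed by a multiple testing correction. The sketch below shows one common way of doing this, a hypergeometric test with Benjamini and Hochberg correction; it is not necessarily the exact configuration used in the analysis above. All numbers are hypothetical and the function names are assumptions of this example.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """Probability of seeing at least k annotated genes in a module of n genes,
    when K of the N background genes carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical example: 1000 background genes, 20 annotated with a GO term;
# a module of 10 proteins contains 5 of those 20.
p_module = hypergeom_pvalue(k=5, n=10, K=20, N=1000)
raw = [p_module, 0.04, 0.03]  # the last two p-values are made up
print(p_module, benjamini_hochberg(raw))
```

A small corrected p-value indicates that the term appears in the module more often than expected by chance, given the chosen background.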
Finally, data integration was performed. Using Cytoscape
software, nodes from the PPIN were readily merged
Fig. 3 Example of PPIN construction and analysis. A Visual representation (force directed layout) of the network using Cytoscape software. BIANA
software was chosen to construct a PPIN with only experimentally determined interactions, which resulted in 1466 nodes and 2176 edges.
The bottom right inset shows a reduction to MNC of the same PPIN. B Protocol followed to analyse the network: topological exploration,
clustering and data integration. C Centrality measures of the PPIN using Hubba (Degree and MNC) and NetworkAnalyzer (betweenness). Both
applications output a ranking of the proteins but differ in the graphical representation. Hubba uses a colour code to highlight the most central
proteins in the PPIN (from red to blue). In NetworkAnalyzer the larger nodes represent the most central proteins. D Output of MAVisto software.
On the right, the description of all discovered motifs. On the left, black and white PPIN with network motifs represented in colour.
with a list of 202 differentially expressed genes between
cancerous and noncancerous colon tissues, extracted from
Bertucci et al. [76]. As a result, 37 proteins were found to
be deregulated at mRNA level (Fig. 3H). These included
some previously identified as important hubs such as TP53
(over-expressed), reinforcing their critical role in colorectal
tumorigenesis. Among interacting proteins, this approach
allowed us to focus our attention on parts of the network
containing deregulated proteins such as TGFB3 (under-expressed)
or CDK2 (up-regulated) [77]. Moreover, a more
detailed analysis revealed that though crucial CRC proteins
such as APC did not appear differentially expressed (probably
because APC is mutated rather than differentially expressed),
some of its interacting proteins, like PTK2 or SFN,
were up-regulated. In particular, YWHAZ emerged as an
important protein at the crossroads of BRAF, EGFR and
TP53.
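Conceptually, this integration step is a join between the node list of the PPIN and a table of differentially expressed genes. The following sketch is a simplified illustration, not the Cytoscape procedure itself: the expression labels are taken from the examples mentioned in the text, the small network fragment is a toy subset, and the function names are hypothetical.

```python
def overlay_expression(network_nodes, expression):
    """Label each network node as 'up', 'down' or 'unchanged'."""
    return {node: expression.get(node, "unchanged") for node in network_nodes}

def neighbours_with_status(graph, node, status_by_node, status):
    """Interaction partners of a node that carry the given expression status."""
    return sorted(n for n in graph[node] if status_by_node.get(n) == status)

# Toy fragment of the network (adjacency map, node -> set of neighbours).
graph = {
    "APC": {"PTK2", "SFN", "CTNNB1"},
    "PTK2": {"APC"},
    "SFN": {"APC"},
    "CTNNB1": {"APC"},
    "TP53": set(),
}

# Expression labels for the genes named in the text (fold changes omitted).
expression = {"TP53": "up", "TGFB3": "down", "CDK2": "up", "PTK2": "up", "SFN": "up"}

status = overlay_expression(graph, expression)
print(status["APC"], neighbours_with_status(graph, "APC", status, "up"))
```

The output shows a node that is itself unchanged but surrounded by up-regulated partners, which is the pattern described above for APC.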
Cerebral software was used to merge the network with
subcellular location information. A reduced PPIN that only
included seeds and linkers was used to obtain a clearer
picture. Proteins were placed into layers of predefined locations:
extracellular region, plasma membrane, cytoplasm,
peroxisome, proteasome complex, mitochondrion, Golgi,
endoplasmic reticulum and nucleus. MLH1, located in the
nucleus, mainly interacted with proteins in the nucleus,
with the exception of TRIM29 and AP2B1, which are cytoplasmic
proteins. APC interacted with both nuclear and
cytoplasmic proteins. This is probably due to the reported
nuclear-cytoplasmic shuttling of APC [78] (Fig. 3I).
Fig. 3 (continuation) E Functional analysis using the BINGO plugin for Cytoscape. In orange, nodes selected for analysis (in this example, MLH1- and
SMAD4-connected proteins). The companion table shows the output, including for each GO term its ID, a description, a p-value, a corrected p-value
(Benjamini and Hochberg multiple testing correction), the cluster frequency, the total frequency and the genes included in that GO process. F
Division of the network into functional modules (GO terms and KEGG pathways) using GraphWeb. G Output of the MCODE tool, which looks for
molecular complexes. H Gene expression integration using Cytoscape. Over-expression is represented in red and under-expression in green. Light
purple nodes show non-differential expression. On the right, the list of deregulated proteins. I Subcellular location classification of the MNC
PPIN using the Cerebral plugin. MLH1 and APC are highlighted and individually represented.
Note of caution
An increasing number of specialised tools are appearing in
each area of network construction and analysis. We strong-
ly encourage researchers to search and explore other tools
beyond those described here.
It is also important not to forget that a network is just a
representation of the studied system, not the real world itself.
Although network analysis is valuable for hypothesis generation,
biological validation of the resulting hypotheses
is desirable. It is necessary to keep in mind that despite
huge efforts made in this area, the human interactome is not
yet complete. Well studied proteins have a higher probability
of being included in such a network, resulting in some se-
lection bias with respect to less studied proteins. Moreover,
it is well known that the human interactome contains false
positive interactions, so a careful interpretation of results
is required [79]. Lack of spatial-temporal information is
another obstacle to consider in the network elucidation
process: we assume that two proteins are always interacting
when actually they may only work together in a certain tissue or
even organelle, or at a certain point in the cell cycle [80].
Furthermore, a network-centric approach remains incomplete
because of the intrinsic complexity of cancer:
complex cross-talk among cancer cells [81] and with the
surrounding microenvironment [82] is not captured by a PPIN,
which only represents interactions inside a single cancer
cell.
Conclusions
In the field of cancer research, the combination of classical
techniques with systems biology and network tools can be
useful to generate more accurate biological hypotheses regarding
therapy, prognosis or tumour classification, bringing
us closer to personalised medicine.
However, despite the invaluable help of these techniques,
no software yet exists that is comparable to the human brain.
A medical and biological point of view is needed for the
interpretation of complex networks.
Conflict of interest The authors declare that they have no conflict of
interest relating to the publication of this manuscript.
Acknowledgements This study was supported by the Catalan Insti-
tute of Oncology, the Private Foundation of the Biomedical Research
Institute of Bellvitge (IDIBELL), the Instituto de Salud Carlos III
(grants FIS PI08/1635, FIS PI08/1359, FIS 06/0545 and FIS 05/1006,
PI081359, PI08-1635, PI09-01037), CIBERESP CB07/02/2005, the
Spanish Association Against Cancer (AECC) Scientific Foundation,
the Catalan Government DURSI grant 2009SGR1489, and the Euro-
pean Commission grants FOOD-CT-2006-036224-HIWATE and FP7-
COOP-Health-2007-B HiPerDART.
References
1. Hornberg JJ, Bruggeman FJ, Westerhoff HV, Lan-
kelma J (2006) Cancer: a Systems Biology dis-
ease. Biosystems 83:81–90
2. Kitano H (2002) Systems biology: a brief over-
view. Science 295:1662–1664
3. Kreeger PK, Lauffenburger DA (2010) Cancer
systems biology: a network modeling perspective.
Carcinogenesis 31:2–8
4. Wang E, Lenferink A, O’Connor-McCourt M
(2007) Cancer systems biology: exploring cancer-
associated genes on cellular networks. Cell Mol
Life Sci 64:1752–1762
5. Auffray C, Chen Z, Hood L (2009) Systems medi-
cine: the future of medical genomics and health-
care. Genome Med 1:2
6. Clermont G, Auffray C, Moreau Y et al (2009)
Bridging the gap between systems biology and
medicine. Genome Med 1:88
7. Alberghina L, Höfer T, Vanoni M (2009) Mo-
lecular networks and system-level properties. J
Biotechnol 144:224–233
8. Stelzl U, Worm U, Lalowski M et al (2005) A hu-
man protein–protein interaction network: a resource
for annotating the proteome. Cell 122:957–968
9. Ramani AK, Bunescu RC, Mooney RJ, Marcotte
EM (2005) Consolidating the set of known human
protein–protein interactions in preparation for
large-scale mapping of the human interactome.
Genome Biol 6:R40
10. Kann MG (2007) Protein interactions and disease:
computational approaches to uncover the etiology
of diseases. Brief Bioinform 8:333–346
11. Baudot A, Gómez-López G, Valencia A (2009)
Translational disease interpretation with molecu-
lar networks. Genome Biol 10:221. Review
12. Wu Z, Zhao X, Chen L (2009) Identifying respon-
sive functional modules from protein–protein in-
teraction network. Mol Cells 27:271–277. Review
13. Taylor IW, Linding R, Warde-Farley D et al
(2009) Dynamic modularity in protein interac-
tion networks predicts breast cancer outcome. Nat
Biotechnol 27:199–204
14. Wang YC, Chen BS (2011) A network-based bio-
marker approach for molecular investigation and
diagnosis of lung cancer. BMC Med Genom 4:2
15. Jonsson PF, Bates PA (2006) Global topological
features of cancer proteins in the human interac-
tome. Bioinformatics 22:2291–2297
16. Xu J, Li Y (2006) Discovering disease-genes by
topological features in human protein–protein
networks. Bioinformatics 22:2800–2805
17. Sanz-Pamplona R, Aragüés R, Driouch K et al
(2011) Expression of endoplasmic reticulum stress
proteins is a candidate marker of brain metastasis
in both ErbB-2(+) and ErbB-2(–) primary breast
tumors. Am J Pathol 179:564–579
18. Pujana MA, Han JD, Starita LM et al (2007)
Network modeling links breast cancer suscep-
tibility and centrosome dysfunction. Nat Genet
39:1338–1349
19. Junker BH, Schreiber F (2007) Analysis of bio-
logical networks. Chapter 3: Graph theory. John
Wiley & Sons, Hoboken, NJ, USA
20. Keshava Prasad TS, Goel R, Kandasamy K et al
(2009) Human Protein Reference Database: 2009
update. Nucleic Acids Res 37:D767–772
21. von Mering C, Huynen M, Jaeggi D et al (2003)
STRING: a database of predicted functional as-
sociations between proteins. Nucleic Acids Res
31:258–261
22. Xenarios I, Salwínski L, Duan XJ et al (2002)
DIP, the Database of Interacting Proteins: a re-
search tool for studying cellular networks of pro-
tein interactions. Nucleic Acids Res 30:303–305
23. Lehne B, Schlitt T (2009) Protein–protein interac-
tion databases: keeping up with growing interac-
tomes. Hum Genomics 3:291–297
24. Berggård T, Linse S, James P (2007) Methods for
the detection and analysis of protein–protein interactions.
Proteomics 7:2833–2842. Review
25. Shoemaker BA, Panchenko AR (2007) Decipher-
ing protein–protein interactions. Part II. Compu-
tational methods to predict protein and domain
interaction partners. PLoS Comput Biol 3:e43.
Review
26. Kolaczyk E (2009) Mapping networks. In: Statis-
tical analysis of network data. Springer
27. Huang S (2004) Back to the biology in systems
biology: what can we learn from biomolecular
networks? Brief Funct Genomic Proteomic 2:279–
297
28. Suderman M, Hallett M (2007) Tools for visually
exploring biological networks. Bioinformatics
23:2651–2659. Review
29. Dorogovtsev SN, Mendes JF, Samukhin AN
(2001) Size-dependent degree distribution of a
scale-free growing network. Phys Rev E Stat
Nonlin Soft Matter Phys 63:062101
30. Oti M, Snel B, Huynen MA, Brunner HG (2006)
Predicting disease genes using protein–protein
interactions. J Med Genet 43:691
31. Garcia-Garcia J, Guney E, Aragues R et al (2010)
Biana: a software framework for compiling bio-
logical interactions and analyzing networks. BMC
Bioinform 11:56
32. Lee SA, Chan CH, Chen TC et al (2009) POINeT:
protein interactome with sub-network analysis
and hub prioritization. BMC Bioinform 10:114
33. Minguez P, Götz S, Montaner D et al (2009)
SNOW, a web-based tool for the statistical analy-
sis of protein–protein interaction networks. Nucle-
ic Acids Res 37:W109–114
34. Chaurasia G, Iqbal Y, Hänig C et al (2007) UniHI:
an entry gate to the human protein interactome.
Nucleic Acids Res 35:D590–594
35. Pavlopoulos GA, Wegener AL, Schneider R
(2008) A survey of visualization tools for biologi-
cal network analysis. BioData Min 1:12
36. Shannon P, Markiel A, Ozier O et al (2003) Cy-
toscape: a software environment for integrated
models of biomolecular interaction networks.
Genome Res 13:2498–2504
37. Killcoyne S, Carter GW, Smith J, Boyle J (2009)
Cytoscape: a community-based framework for net-
work modeling. Methods Mol Biol 563:219–239
38. Pavlopoulos GA, O’Donoghue SI, Satagopam VP
et al (2008) Arena3D: visualization of biological
networks in 3D. BMC Syst Biol 2:104
39. Hooper SD, Bork P (2005) Medusa: a simple tool
for interaction graph analysis. Bioinformatics
21:4432–4433
40. Assenov Y, Ramírez F, Schelhorn SE et al (2008)
Computing topological parameters of biological
networks. Bioinformatics 24:282–284
41. Barabási AL, Albert R (1999) Emergence of scal-
ing in random networks. Science 286:509–512
42. Barabási AL, Bonabeau E (2003) Scale-free net-
works. Sci Am 288:60–69
43. Goh KI, Cusick ME, Valle D et al (2007) The
human disease network. Proc Natl Acad Sci U S
A 104:8685–8690
44. He X, Zhang J (2006) Why do hubs tend to be es-
sential in protein networks? PLoS Genet 2:e88
45. Kar G, Gursoy A, Keskin O (2009) Human cancer
protein–protein interaction network: a structural
perspective. PLoS Comput Biol 5:e1000601
46. Chen J, Aronow BJ, Jegga AG (2009) Disease candidate
gene identification and prioritization using
protein interaction networks. BMC Bioinform 10:73
47. Lin CY, Chin CH, Wu HH et al (2008) Hubba:
hub objects analyzer–a framework of interactome
hubs identification for network biology. Nucleic
Acids Res 36:W438–443
48. Junker BH, Koschützki D, Schreiber F (2006) Ex-
ploration of biological network centralities with
CentiBiN. BMC Bioinform 7:219
49. Junker BH, Schreiber F (2007) Network centrali-
ties. In: Analysis of biological networks. John Wi-
ley & Sons, Hoboken, NJ, USA
50. Milo R, Shen-Orr S, Itzkovitz S et al (2002) Net-
work motifs: simple building blocks of complex
networks. Science 298:824–827
51. Moon HS, Bhak J, Lee KH, Lee D (2005) Ar-
chitecture of basic building blocks in protein and
domain structural interaction networks. Bioinfor-
matics 21:1479–1486
52. Assenov Y, Ramírez F, Schelhorn SE et al (2008)
Computing topological parameters of biological
networks. Bioinformatics 24:282–284
53. Schreiber F, Schwöbbermeyer H (2005) MAVisto:
a tool for the exploration of network motifs. Bio-
informatics 21:3572–3574
54. Wernicke S, Rasche F (2006) FANMOD: a tool
for fast network motif detection. Bioinformatics
22:1152–1153
55. Barabási AL, Oltvai ZN (2004) Network biology:
understanding the cell’s functional organization.
Nat Rev Genet 5:101–113. Review
56. Balasundaram B, Butenko S (2007) Network
clustering. In: Analysis of biological networks.
John Wiley & Sons, Hoboken, NJ, USA
57. Luo F, Yang Y, Chen C-F et al (2007) Modular
organization of protein interaction networks. Bio-
informatics 23:207–214
58. Hartwell LH, Hopfield JJ, Leibler S, Murray AW
(1999) From molecular to modular cell biology.
Nature 402[6761 Suppl]:C47–52
59. Reimand J, Tooming L, Peterson H et al (2008)
GraphWeb: mining heterogeneous biological networks
for gene modules with functional significance.
Nucleic Acids Res 36:W452–459
60. Vlasblom J, Wu S, Pu S et al (2006) GenePro: a
Cytoscape plug-in for advanced visualization and
analysis of interaction networks. Bioinformatics
22:2178–2179
61. Bader GD, Hogue CW (2003) An automated
method for finding molecular complexes in large
protein interaction networks. BMC Bioinform 4:2
62. Maere S, Heymans K, Kuiper M (2005) BiNGO:
a Cytoscape plugin to assess overrepresentation of
gene ontology categories in biological networks.
Bioinformatics 21:3448–3449
63. Brohée S, Faust K, Lima-Mendez G et al (2008)
NeAT: a toolbox for the analysis of biological
networks, clusters, classes and pathways. Nucleic
Acids Res 36:W444–451
64. Rivera CG, Vakil R, Bader JS (2010) NeMo: Network
Module identification in Cytoscape. BMC
Bioinform 11[Suppl 1]:S61
65. Ma’ayan A (2008) Network integration and graph
analysis in mammalian molecular systems biol-
ogy. IET Syst Biol 2:206–221. Review
66. Liu ET (2005) Systems biology, integrative biolo-
gy, predictive biology. Cell 121:505–506. Review
67. McDermott JE, Costa M, Janszen D et al (2010)
Separating the drivers from the driven: integrative
network and pathway approaches aid identification
of disease biomarkers from high-throughput
data. Dis Markers 28:253–266. Review
68. Mathew JP, Taylor BS, Bader GD et al (2007)
From bytes to bedside: data integration and com-
putational biology for translational cancer re-
search. PLoS Comput Biol 3:e12
69. Camargo A, Azuaje F (2007) Linking gene ex-
pression and functional network data in human
heart failure. PLoS One 2:e1347
70. Barsky A, Gardy JL, Hancock RE, Munzner T
(2007) Cerebral: a Cytoscape plugin for layout
of and interaction with biological networks using
subcellular localization annotation. Bioinformat-
ics 23:1040–1042
71. Paquette J, Tokuyasu T (2010) EGAN: explor-
atory gene association networks. Bioinformatics
26:285–286
72. Junker BH, Klukas C, Schreiber F (2006) VANT-
ED: a system for advanced data analysis and vi-
sualization in the context of biological networks.
BMC Bioinform 7:109
73. Markowitz SD, Bertagnolli MM (2009) Molecu-
lar origins of cancer: molecular basis of colorectal
cancer. N Engl J Med 361:2449–2460. Review
74. Ohta M, Seto M, Ijichi H et al (2009) Decreased
expression of the RAS-GTPase activating protein
RASAL1 is associated with colorectal tumor pro-
gression. Gastroenterology 136:206–216
75. Moon RT (2005) Wnt/beta-catenin pathway. Sci
STKE 2005:cm1. Review
76. Bertucci F, Salas S, Eysteries S et al (2004) Gene
expression profiling of colon cancer by DNA
microarrays and correlation with histoclinical
parameters. Oncogene 23:1377–1391
77. Minguez P, Dopazo J (2011) Assessing the biological
significance of gene expression signatures
and co-expression modules by studying their net-
work properties. PLoS One 6:e17474
78. Henderson BR (2000) Nuclear-cytoplasmic shut-
tling of APC regulates beta-catenin subcellular lo-
calization and turnover. Nat Cell Biol 2:653–660
79. Chua HN, Wong L (2008) Increasing the reliabil-
ity of protein interactomes. Drug Discov Today
13:652–658
80. Strogatz SH (2001) Exploring complex networks.
Nature 410:268–276. Review
81. Hanahan D, Weinberg RA (2011) Hallmarks of
cancer: the next generation. Cell 144:646–674
82. Kenny PA, Lee GY, Bissell MJ (2007) Target-
ing the tumor microenvironment. Front Biosci
12:3468–3474
... Visualization tools are evaluated by four criteria: compatibility (available on which OS (operating systems): Windows, Mac Os, and Linux, analytic functions (presence of functions measuring the topological properties of the network, weak interactions of external data, etc.), visualizations (graph layout, dynamics, and parallel implementation), and the extensibility of the tool (addition of plugins, type of input, and output file) forming distinct classes (Sanz-Pamplona et al., 2012;Agapito, Guzzi and Cannataro, 2013;Dallago et al., 2020). In the context of biological network analysis and in particular protein networks, one of the essential criteria is dynamic visualization tools (Xia, Benner and Hancock, 2014;Zhou and Xia, 2018). ...
... Li et al., 2018b). In order to overcome the limitation of network size and consider the dynamics of the networks, several tools have been developed over the last decades (Sanz-Pamplona et al., 2012;Winkler et al., 2021). The success of Cytoscape is due to the large number of plugins/features that can be added directly from the tool (Saito et al., 2012;Lotia et al., 2013). ...
Article
Full-text available
At the heart of the cellular machinery through the regulation of cellular functions, protein–protein interactions (PPIs) have a significant role. PPIs can be analyzed with network approaches. Construction of a PPI network requires prediction of the interactions. All PPIs form a network. Different biases such as lack of data, recurrence of information, and false interactions make the network unstable. Integrated strategies allow solving these different challenges. These approaches have shown encouraging results for the understanding of molecular mechanisms, drug action mechanisms, and identification of target genes. In order to give more importance to an interaction, it is evaluated by different confidence scores. These scores allow the filtration of the network and thus facilitate the representation of the network, essential steps to the identification and understanding of molecular mechanisms. In this review, we will discuss the main computational methods for predicting PPI, including ones confirming an interaction as well as the integration of PPIs into a network, and we will discuss visualization of these complex data.
... In a study on prostate cancer, Closeness and Betweenness were found to be more predictive of unknown genes/proteins related to the disease, whereas Degree was able to explore known related genes/proteins accurately 25 . Centrality indices of Degree, Betweenness, and Closeness have been proposed to be profitable in exploring essential proteins of cancer PPI networks 45 and have been applied in studying the pan-cancer network of proteins related to epithelial-mesenchymal transition 46 . However, we reasoned that Degree centrality could give high scores to more known proteins due to the simple fact that these more recognized proteins have been the focus of more studies, and therefore more interactions (network neighbors) have been found for them. ...
... Therefore, one must find or suggest the proper indices in the study based on the context of the biological problem in the study 52 . Moreover, PPI networks are affected by false positives and, have also not yet been completed 45 , which is a source of inaccuracy in these analyses. Therefore, a more cautious approach to network analysis results is recommended. ...
Article
Full-text available
Anaplastic thyroid carcinoma (ATC) is the most rare and lethal form of thyroid cancer and requires effective treatment. Efforts have been made to restore sodium-iodide symporter (NIS) expression in ATC cells where it has been downregulated, yet without complete success. Systems biology approaches have been used to simplify complex biological networks. Here, we attempt to find more suitable targets in order to restore NIS expression in ATC cells. We have built a simplified protein interaction network including transcription factors and proteins involved in MAPK, TGFβ/SMAD, PI3K/AKT, and TSHR signaling pathways which regulate NIS expression, alongside proteins interacting with them. The network was analyzed, and proteins were ranked based on several centrality indices. Our results suggest that the protein interaction network of NIS expression regulation is modular, and distance-based and information-flow-based centrality indices may be better predictors of important proteins in such networks. We propose that the high-ranked proteins found in our analysis are expected to be more promising targets in attempts to restore NIS expression in ATC cells.
... Biological networks are scale-free. Scale-free means that few nodes have a high degree ("balls"), and most nodes have a low degree and are only related to one or a few neighbors [21]. In network analysis, based on two cutoff criteria, the topologically important proteins were selected. ...
Article
Full-text available
Purpose ITP is the most prevalent autoimmune blood disorder. The lack of predictive biomarkers for therapeutic response is a major challenge for physicians caring of chronic ITP patients. This study is aimed at identifying predictive biomarkers for drug therapy responses. Methods 2D gel electrophoresis (2-DE) was performed to find differentially expressed proteins. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometer (MALDI-TOF MS) analysis was performed to identify protein spots. The Cytoscape software was employed to visualize and analyze the protein-protein interaction (PPI) network. Then, enzyme-linked immunosorbent assays (ELISA) were used to confirm the results of the proteins detected in the blood. The DAVID online software was used to explore the Gene Ontology and pathways involved in the disease. Results Three proteins, including APOA1, GC, and TF, were identified as hub-bottlenecks and confirmed by ELISA. Enrichment analysis results showed the importance of several biological processes and pathway, such as the PPAR signaling pathway, complement and coagulation cascades, platelet activation, vitamin digestion and absorption, fat digestion and absorption, cell adhesion molecule binding, and receptor binding. Conclusion and Clinical Relevance. Our results indicate that plasma proteins (APOA1, GC, and TF) can be suitable biomarkers for the prognosis of the response to drug therapy in ITP patients.
... The PPI data in this review were retrieved from STRING and GeneMANIA, as both databases contain a large number of PPI datasets, including experimental and predicted interactions. Integrative analysis by combining data from different databases is essential to obtain a comprehensive PPI network and a complete biological system model [60]. The more data from various sources that are integrated, the more informative the PPI network is. ...
Article
Full-text available
Protein–protein interaction (PPI) is involved in every biological process that occurs within an organism. The understanding of PPI is essential for deciphering the cellular behaviours in a particular organism. The experimental data from PPI methods have been used in constructing the PPI network. PPI network has been widely applied in biomedical research to understand the pathobiology of human diseases. It has also been used to understand the plant physiology that relates to crop improvement. However, the application of the PPI network in aquaculture is limited as compared to humans and plants. This review aims to demonstrate the workflow and step-by-step instructions for constructing a PPI network using bioinformatics tools and PPI databases that can help to predict potential interaction between proteins. We used zebrafish proteins, the oestrogen receptors (ERs) to build and analyse the PPI network. Thus, serving as a guide for future steps in exploring potential mechanisms on the organismal physiology of interest that ultimately benefit aquaculture research.
... It is prudent to note that centrality measures in PPI networks must be interpreted with caution due to publication bias that can be an inherent part of the network [61,62]. The top network genes identified from the PPI network are likely to be heavily influenced by publication bias [63]. ...
Article
Full-text available
Background: Cellular senescence, a permanent state of replicative arrest in otherwise proliferating cells, is a hallmark of aging and has been linked to aging-related diseases. Many genes play a role in cellular senescence, yet a comprehensive understanding of its pathways is still lacking. Results: We develop CellAge (http://genomics.senescence.info/cells), a manually curated database of 279 human genes driving cellular senescence, and perform various integrative analyses. Genes inducing cellular senescence tend to be overexpressed with age in human tissues and are significantly overrepresented in anti-longevity and tumor-suppressor genes, while genes inhibiting cellular senescence overlap with pro-longevity and oncogenes. Furthermore, cellular senescence genes are strongly conserved in mammals but not in invertebrates. We also build cellular senescence protein-protein interaction and co-expression networks. Clusters in the networks are enriched for cell cycle and immunological processes. Network topological parameters also reveal novel potential cellular senescence regulators. Using siRNAs, we observe that all 26 candidates tested induce at least one marker of senescence with 13 genes (C9orf40, CDC25A, CDCA4, CKAP2, GTF3C4, HAUS4, IMMT, MCM7, MTHFD2, MYBL2, NEK2, NIPA2, and TCEB3) decreasing cell number, activating p16/p21, and undergoing morphological changes that resemble cellular senescence. Conclusions: Overall, our work provides a benchmark resource for researchers to study cellular senescence, and our systems biology analyses reveal new insights and gene regulators of cellular senescence.
Article
Full-text available
This systematic review aims to provide a comprehensive overview of graph-based methodologies utilized in the analysis of protein–protein interaction (PPI) networks. The primary objective is to synthesize existing literature and identify key methodologies, resources, and best practices in the field, with a focus on their application in uncovering essential cancer proteins. A systematic literature search was conducted across various databases to identify relevant studies focusing on graph-based explorations of PPI networks. The selected articles were critically reviewed, and data were extracted regarding the methodologies employed, resources utilized, and best practices identified. The review proceeds to outline a workflow that illustrates the systematic process from the compilation of gene/protein datasets to the generation of essential cancer proteins. A case study on “uncovering essential cancer proteins in breast cancer” was included to exemplify the application of graph-based methodologies in a real-world scenario. The review revealed various graph-based methodologies utilized in PPI network analysis, including centrality measures, pathway enrichment analyses, and network visualization techniques. Essential resources such as databases, software tools, and repositories were identified, along with best practices for data preprocessing, network construction, and analysis. The synthesis of findings, complemented by the case study, provides researchers with a comprehensive understanding of the current landscape of graph-based PPI network analysis and its application in cancer research. This systematic review contributes to the field by offering a holistic overview of graph-based explorations in PPI network research, with a specific focus on cancer protein identification. 
By synthesizing existing knowledge and identifying essential resources and best practices, this review serves as a valuable resource for researchers, facilitating informed decision-making and enhancing research quality and reproducibility. The inclusion of the case study underscores the practical application of graph-based methodologies in uncovering essential cancer proteins.
Article
Background: Epstein-Barr virus (EBV) affects more than 90% of the global population. The role of the virus in causing infectious mononucleosis (IM) affecting B-cells and epithelial cells and in the development of EBV-associated cancers is well documented. Investigating the associated interactions can pave the way for the discovery of novel therapeutic targets for EBV-associated lymphoproliferative (Burkitt's Lymphoma and Hodgkin's Lymphoma) and non-lymphoproliferative diseases (Gastric cancer and Nasopharyngeal cancer). Methods: Based on the DisGeNET (v7.0) data set, we constructed a disease-gene network to identify genes that are involved in various carcinomas, viz. Gastric cancer (GC), Nasopharyngeal cancer (NPC), Hodgkin's lymphoma (HL) and Burkitt's lymphoma (BL). We identified communities in the disease-gene network and performed functional enrichment using over-representation analysis to detect significant biological processes/pathways and the interactions between them. Results: We identified the modular communities to explore the relation of this common causative pathogen (EBV) with different carcinomas such as GC, NPC, HL and BL. Through network analysis we identified the top 10 genes linked with EBV-associated carcinomas as CASP10, BRAF, NFKBIA, IFNA2, GSTP1, CSF3, GATA3, UBR5, AXIN2 and POLE. Further, the tyrosine-protein kinase (ABL1) gene was significantly over-represented in 3 out of 9 critical biological processes, viz. in regulatory pathways in cancer, the TP53 network and the Imatinib and chronic myeloid leukemia biological processes. Consequently, the EBV pathogen appears to target critical pathways involved in cellular growth arrest/apoptosis. We make our case for BCR-ABL1 tyrosine-kinase inhibitors (TKI) for further clinical investigations in the inhibition of BCR-mediated EBV activation in carcinomas for better prognostic and therapeutic outcomes.
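Over-representation analysis, as named in the Methods above, is commonly framed as a one-sided hypergeometric test. The sketch below assumes that framing; the abstract does not say which test or tool the authors used. The function names and the gene identifiers `g1` to `g20` are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    lower = max(k, 0)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible overlaps contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, upper + 1))
    return hits / comb(N, n)


def over_representation(community, pathway, background):
    """Return (overlap size, p-value) for one community against one pathway."""
    background = set(background)
    community = set(community) & background
    pathway = set(pathway) & background
    overlap = len(community & pathway)
    p_value = hypergeom_sf(overlap, len(background), len(pathway), len(community))
    return overlap, p_value


# Hypothetical gene identifiers, used only to exercise the function.
background = [f"g{i}" for i in range(1, 21)]   # 20 genes in the universe
pathway = ["g1", "g2", "g3", "g4", "g5"]       # 5 genes annotated to a pathway
community = ["g1", "g2", "g3", "g10"]          # 4 genes in a network community
overlap, p_value = over_representation(community, pathway, background)
```

When many communities and pathways are tested, the resulting p-values would normally be corrected for multiple testing before any pathway is called significant.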
Article
All life on Earth is related, so that some molecular interactions are common across almost all living cells, with the number of common interactions increasing as we look at more closely related species. In particular, we expect the protein–protein interaction (PPI) networks of closely related species to share high levels of similarity. This similarity may facilitate the transfer of functional knowledge between model species and human. Multiple network alignment is the process of uncovering the connection similarity between three or more networks simultaneously. Existing algorithms for multiple network alignment rely on sequence similarities to help drive the alignments, and no comprehensive study has been done to determine the most effective ways to utilize network connectivity—network topology—to drive multiple network alignment. Here, we devise and empirically test the efficacy of several measures of topological similarity between three or more networks. To evolve the alignments toward optimal, we use simulated annealing as the search algorithm since it is agnostic to the objective being optimized. We test the measures both on the partially synthetic and highly similar PPI networks from the integrated interaction database, as well as on real PPI networks from a recent BioGRID release.
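The use of simulated annealing to drive an alignment toward a topological objective can be sketched in a much simplified form. The example below aligns only two equally sized networks and scores an alignment by the number of conserved edges; both choices are assumptions made for brevity and are not the multiple-network measures that the abstract describes. All names, the cooling schedule, and the toy graphs are hypothetical.

```python
import math
import random


def conserved_edges(edges_a, edges_b_set, mapping):
    """Count edges of network A whose mapped endpoints are also an edge in B."""
    return sum(
        1 for u, v in edges_a
        if frozenset((mapping[u], mapping[v])) in edges_b_set
    )


def anneal_alignment(nodes_a, edges_a, nodes_b, edges_b,
                     steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search one-to-one node mappings A -> B with simulated annealing."""
    rng = random.Random(seed)
    edges_b_set = {frozenset(e) for e in edges_b}
    targets = list(nodes_b)
    rng.shuffle(targets)
    mapping = dict(zip(nodes_a, targets))  # random initial alignment
    score = conserved_edges(edges_a, edges_b_set, mapping)
    best_map, best_score = dict(mapping), score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        u, v = rng.sample(nodes_a, 2)
        mapping[u], mapping[v] = mapping[v], mapping[u]  # propose a swap
        new_score = conserved_edges(edges_a, edges_b_set, mapping)
        delta = new_score - score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            score = new_score
            if score > best_score:
                best_map, best_score = dict(mapping), score
        else:
            mapping[u], mapping[v] = mapping[v], mapping[u]  # undo the swap
    return best_map, best_score


# Two hypothetical 4-node path graphs with different node labels.
nodes_a = ["a1", "a2", "a3", "a4"]
edges_a = [("a1", "a2"), ("a2", "a3"), ("a3", "a4")]
nodes_b = ["b1", "b2", "b3", "b4"]
edges_b = [("b3", "b1"), ("b1", "b4"), ("b4", "b2")]
best_map, best_score = anneal_alignment(nodes_a, edges_a, nodes_b, edges_b)
```

Because the annealing loop only needs a score for each candidate mapping, `conserved_edges` could be swapped for any other topological measure without changing the search code, which is the sense in which the search is agnostic to the objective.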
Article
Melatonin helps maintain circadian rhythm, exerts anticancer activity, and plays key roles in regulation of glucose homeostasis and energy metabolism. Glycosylation, a form of metabolic flux from glucose or other monosaccharides, is a common post‐translational modification. Dysregulated glycosylation, particularly O‐GlcNAcylation, is often a biomarker of cancer cells. In this study, elevated O‐GlcNAc level in bladder cancer was inhibited by melatonin treatment. Melatonin treatment inhibited proliferation and migration and enhanced apoptosis of bladder cancer cells. Proteomic analysis revealed reduction of cyclin‐dependent‐like kinase 5 (CDK5) expression by melatonin. O‐GlcNAc modification determined the conformation of critical T‐loop domain on CDK5, and further influenced the CDK5 stability. The mechanism whereby melatonin suppressed O‐GlcNAc level was based on decreased glucose uptake and metabolic flux from glucose to UDP‐GlcNAc, and consequent reduction of CDK5 expression. Melatonin treatment, inhibition of O‐GlcNAcylation by OSMI‐1, or mutation of key O‐GlcNAc site strongly suppressed in vivo tumor growth. Our findings indicate that melatonin reduces proliferation and promotes apoptosis of bladder cancer cells by suppressing O‐GlcNAcylation of CDK5.
Article
Label-free optical detection of biomolecules is currently limited by a lack of specificity rather than sensitivity. To exploit the much more characteristic refractive index dispersion in the mid-infrared (IR) regime, we have engineered three-dimensional IR-resonant silicon micropillar arrays (Si-MPAs) for protein sensing. By exploiting the unique hierarchical nano- and microstructured design of these Si-MPAs attained by CMOS-compatible silicon-based microfabrication processes, we achieved an optimized interrogation of surface protein binding. Based on spatially resolved surface functionalization, we demonstrate controlled three-dimensional interfacing of mammalian cells with Si-MPAs. Spatially controlled surface functionalization for site-specific protein immobilization enabled efficient targeting of soluble and membrane proteins into sensing hotspots directly from cells cultured on Si-MPAs. Protein binding to Si-MPA hotspots at submonolayer level was unambiguously detected by conventional Fourier transform IR spectroscopy. The compatibility with cost-effective CMOS-based microfabrication techniques readily allows integration of this novel IR transducer into fully fledged bioanalytical microdevices for selective and sensitive protein sensing.
Chapter
The systematic collection and analysis of data on networks of one form or another goes back at least to the 1930s in certain select areas of science, and in fact has subtle roots reaching back centuries further. However, during the decade surrounding the turn of the 21st century, network-centric analysis, as a general approach to scientific inquiry, has reached entirely new levels of prevalence and sophistication, with practitioners in fields now ranging from the physical and mathematical sciences to the social sciences and humanities. In this chapter we present a ‘bird’s-eye’ view of the area that is gradually coming to be known as ‘network science,’ starting with some background, continuing with a mosaic of examples, and finishing with a discussion of the organization and philosophy of this book.
Article
Despite some notable successes, cancer remains, for the most part, a seemingly intractable problem. There is, however, a growing appreciation that targeting the tumor epithelium in isolation is not sufficient, as there is an intricate, mutually sustaining synergy between the tumor epithelial cells and their surrounding stroma. As the details of this dialogue emerge, new therapeutic targets have been proposed. The FDA has already approved drugs targeting microenvironmental components such as VEGF and aromatase, and many more agents are in the pipeline. In this article, we describe some of the 'druggable' targets and processes within the tumor microenvironment and review the approaches being taken to disrupt these interactions.
Article
Flows are at the heart of the form and function of many networks, and understanding their behavior is often a goal of primary interest. Here we consider problems of statistical estimation and prediction arising in connection with various types of measurements relating to network flows.