Tools for protein-protein interaction network analysis in cancer research


As cancer is a complex disease, representing a malignant cell as a protein-protein interaction network (PPIN) and analysing that network can provide insight into the behaviour of cancer cells and lead to the discovery of new biomarkers. The aim of this review is to help life-science researchers without previous computer programming skills to extract meaningful biological information from such networks, taking advantage of easy-to-use, public bioinformatics tools. It is structured in four parts. The first section describes the pipeline of consecutive steps from network construction to biological hypothesis generation. The second part provides a repository of public, user-friendly tools for network construction, visualisation and analysis. Two different and complementary approaches to network analysis are presented: the topological approach studies the network as a whole by means of structural graph theory, whereas the modular approach divides the PPIN into sub-graphs, or modules. The third section describes some concepts and tools for integrating heterogeneous molecular data through a PPIN. Finally, the fourth part is an example of how to extract meaningful biological information from a colorectal cancer PPIN using some of the described tools.
Cancer is a complex disease in which many proteins, genes and molecular processes are implicated [1]. Genes and proteins do not work independently, but are organised into co-regulated units that perform a common biological function. It is the alteration of these functional elements that leads to the development of a particular cancer phenotype (e.g., drug response or disease outcome) and, consequently, their study cannot be tackled with the classical one-gene approach. A systems biology approach, that is, the analysis of the molecular relationships between the implicated genes and proteins as a whole, is required to understand the disease phenotype [2–4].
In this scenario, cancer systems medicine emerges as a translational extension of systems biology that brings together clinical information and the -omics disciplines for the classification and diagnosis of cancer subtypes, the prognosis of patient outcomes, the prediction of treatment responses and the identification of perturbation targets for drug development [5, 6].
Proteins interact with each other within a cell, and those interactions can be represented by a network, defined as an abstract representation of nodes or vertices (i.e., proteins) where some pairs of nodes are connected by edges representing interactions [7]. With the recent advances in high-throughput experimental technologies, increasing numbers of large-scale biological networks are being defined [8, 9]. Network knowledge can help to understand the biological function and dynamic behaviour of cellular systems, generating biological hypotheses about putative biomarkers, therapeutic targets or deregulated pathways in cancer [10–14].
Rebeca Sanz-Pamplona · Antoni Berenguer · Xavier Sole · David Cordero · Marta Crous-Bou · Jordi Serra-Musach · Elisabet Guinó · Miguel Ángel Pujana · Víctor Moreno
Received: 14 July 2011 / Accepted: 20 August 2011
R. Sanz-Pamplona · A. Berenguer · X. Sole · D. Cordero · M. Crous-Bou · J. Serra-Musach · E. Guinó · M.A. Pujana · V. Moreno
Unit of Biomarkers and Susceptibility
Catalan Institute of Oncology (ICO)
Bellvitge Institute for Biomedical Research (IDIBELL)
Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP)
Av. Gran Vía, 199
ES-08908 L’Hospitalet de Llobregat, Barcelona, Spain
V. Moreno
Department of Clinical Sciences
Faculty of Medicine
University of Barcelona
Barcelona, Spain
Cancer-related proteins have a higher ratio of promiscuous structural domains, making them more prone to interact with other proteins. In fact, they have a large number of interacting proteins and occupy a central position in the networks [15]. Proteins interacting with cancer-related proteins have a higher probability of being related to the cancer process than non-interacting proteins. Hence, the study of those proteins may be an efficient way to discover novel cancer genes and cancer biomarkers [16–18].
Since understanding the complex networks that represent a cancer cell is one of the main challenges of today’s biology, this review attempts to help life-science researchers without previous computer programming skills to extract meaningful biological information from such networks, taking advantage of easy-to-use, public bioinformatics tools. Although different types of biological networks exist, such as regulatory networks, signal transduction networks or metabolic networks, only protein–protein interaction networks (PPINs) will be covered here. In addition, due to the complexity of directed networks (those whose edges carry directional information) and dynamic networks (those including changes over time), only undirected and static PPINs will be discussed [19].
This review is structured into four sections. The first section briefly describes the workflow, enumerating the consecutive steps from network construction to hypothesis generation. The second section details suitable tools to carry out each step of the analysis. Some concepts about data integration are described in the third section. Finally, a fourth section presents an illustrative example of how to use these tools, using colorectal cancer data. This is not a compendium of all existing network-management tools but a tutorial on how to construct and analyse protein interaction networks in a simple manner. It should be noted, however, that no single method suits every case and each particular network may have characteristics requiring specific software. For example, software suitable for dealing with huge graphs may not be helpful for analysing small networks, and vice versa. Also, although graph theory is beyond the scope of this review, basic ideas to start dealing with interaction networks will be provided, and the references will be helpful for a more in-depth study of this topic.
Workflow: from network assembly to hypothesis
Figure 1 summarises five sequential steps required to
generate a biological hypothesis on cancer cell behaviour
through PPIN construction and analysis.
The starting point is to decide which proteins define the input, hereafter seed proteins (first step). These should be the molecules of major interest and will be the skeleton of the PPIN. Typical choices are differentially expressed molecules observed in a given experiment (transcriptomic or proteomic) or molecules known to be involved in cancer. The final hypothesis derived from the network analysis will be directly related to these seed proteins.
Fig. 1 Pipeline. The process of PPIN construction and analysis follows these consecutive steps. First, a list of molecules of interest (seed proteins) is defined. Next, their interactions are searched in a specialised database and represented in a PPIN. Then, the network is analysed and consequently a biological hypothesis is generated. In this diagram (steps 1–3), six seed proteins (1–6) are represented in red whereas their interacting proteins are represented in green (“a” and “b”). Protein “a” interacts with seeds “1”, “2” and “6”; protein “b” interacts with seeds “3” and “4”; and seeds “1” and “6” interact with each other. Seed “5” has no interacting partners. Step 4 of the chart shows two complementary network analysis approaches: first, the topological methods, which look for essential nodes in the architecture of the network. In this example, interacting protein “a” acts as a hub because of its higher degree. Second, modular methods divide the network into sub-graphs grouping proteins that share a common property. Some of the public tools useful in each step of the analysis are also represented in the figure
The second step is the retrieval of binary interactions. Interacting partners of seed proteins need to be identified from curated databases. Several publicly available databases exist: HPRD [20], STRING [21], DIP [22] and others [23]. A description of the experimental and computational procedures to obtain these protein-protein interaction data is beyond the scope of this review (see Refs. [24] and [25] for more information).
The third step is network construction and visualisation. From the set of protein–protein interactions, the construction of a graph consists of assigning vertices to proteins and edges to interactions between proteins. Then, several algorithms allow the creation of a visual representation of the network [26].
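The construction step can be illustrated with a minimal Python sketch. It assumes nothing beyond the toy network described in the caption of Fig. 1 (seeds “1” to “6” and interacting proteins “a” and “b”); the function name `build_network` is hypothetical, and the tools reviewed here perform this step through their own interfaces.

```python
# Binary interactions of the toy network in Fig. 1.
interactions = [
    ("a", "1"), ("a", "2"), ("a", "6"),
    ("b", "3"), ("b", "4"),
    ("1", "6"),
]
seeds = ["1", "2", "3", "4", "5", "6"]


def build_network(pairs, extra_nodes=()):
    """Return an undirected graph as {protein: set of interacting proteins}."""
    graph = {}
    for u, v in pairs:
        if u == v:
            continue  # self-interactions are ignored in this sketch
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    for node in extra_nodes:
        graph.setdefault(node, set())  # keep seeds without partners, e.g. "5"
    return graph


network = build_network(interactions, extra_nodes=seeds)

# Degree = number of interacting partners of each protein.
degree = {protein: len(partners) for protein, partners in network.items()}
for protein in sorted(degree, key=lambda p: (-degree[p], p)):
    print(protein, degree[protein])
```

Storing the graph as a dictionary of neighbour sets makes vertices (proteins) and edges (interactions) explicit, and the degree of each protein follows directly from the size of its neighbour set.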
The fourth step is network analysis, in which meaningful biological information is extracted using bioinformatics methods. Two complementary approaches to network analysis exist: topological (study of the whole graph) and modular (division of the network into modules of related proteins) [27].
The fifth and last step is hypothesis generation: the outcome of network construction and analysis should be a hypothesis regarding the initial data. Typically, a topological network analysis identifies proteins that are candidate biomarkers or therapeutic targets, whereas a modular approach gives information about deregulated functions or pathways.
Public tools for network management
Multiple public network management tools exist. An exhaustive review published a few years ago identified no fewer than 35, and the number continues to grow exponentially [28]. In this review only some of the best known and/or easiest to use will be discussed, but readers are strongly encouraged to explore other tools, some of which might be useful for specific topics.
Table 1 Construction tools
Kind of interactions Included databases Input Distinctive features Webpage
determined and
determined and
Only experimentally
determined and
String, Intact,
DIP, Degg, IPI,
SCOP, UniProt,
Reactome, MINT,
cog and psi_mi
BioGRID and
HPRD, IntAct,
IntAct, BioGRID,
BIANA [31]
Poinet [32]
SNOW [33]
UniHI [34]
Accepts a variety of identifiers (UniProt, Ensembl,
NCBI or UniProt identifiers
Gene, transcript or
Entrez Gene, UniProt, NCBI, Ensembl, RefSeq,
BioGrid, HPRD,
On-line interface
Flexibility: it is possible to choose the network level and the relation types, restrict interactions by method and add interologs
The output can be downloaded and visualised in Cytoscape
By default, only experimentally determined interactions are retrieved
It is possible to filter interactions based on the number of shared GO terms between the two interacting proteins
PPIs can be filtered with tissue-specific expression data from public resources
The output can be visualised directly with POINET or downloaded and visualised in Cytoscape
Constructs a minimal connected network (MCN), a graph containing only seed and linker proteins
Maps seed proteins onto a reference interactome, calculating the network parameters degree, clustering coefficient and betweenness
Uses Human Gene Atlas data to construct tissue-specific interaction networks
Annotates networks with pathway information from the KEGG database
Only accepts a maximum of 50 proteins as input
Network management software typically specialises in construction, visualisation or analysis steps. However, this is an artificial classification and overlapping is common: some tools are useful for several or all steps.
Construction tools
Once the list of seed proteins is ready, the first step consists of joining them together through linkers, proteins that bind two or more seed proteins and so work as bridges. The number of linker proteins inserted between two seed proteins determines the network distance or network level [29]. A distance of one is recommended, since distance two usually retrieves an undesirable “ball of yarn” network. Moreover, at this point of the analysis it is crucial to decide the nature of the interactions that will be included: experimentally and/or computationally determined. Literature-based interactions are more reliable but biased towards the better studied proteins, and are therefore less likely to reveal new interesting interactions. Computationally inferred interactions from high-throughput experiments do not have this bias, but include a higher rate of false interactions, so a more careful interpretation of the data and subsequent experimental validation are desirable [30].
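As an illustration of the linker concept, the following Python sketch selects, from a list of binary interactions, the non-seed proteins that bind at least two seeds, and then keeps only seed-seed and seed-linker edges (network distance one). The protein names, the function names and the `min_seeds` parameter are hypothetical; the construction tools discussed in this review apply their own rules.

```python
def find_linkers(pairs, seeds, min_seeds=2):
    """Return {linker: seeds it binds} for non-seed proteins binding >= min_seeds seeds."""
    seeds = set(seeds)
    seed_partners = {}
    for u, v in pairs:
        for protein, other in ((u, v), (v, u)):
            if protein not in seeds and other in seeds:
                seed_partners.setdefault(protein, set()).add(other)
    return {p: s for p, s in seed_partners.items() if len(s) >= min_seeds}


def distance_one_network(pairs, seeds, min_seeds=2):
    """Keep only seed-seed and seed-linker interactions."""
    seeds = set(seeds)
    keep = seeds | set(find_linkers(pairs, seeds, min_seeds))
    return [
        (u, v) for u, v in pairs
        if u in keep and v in keep and (u in seeds or v in seeds)
    ]


# Hypothetical interaction list: S1-S3 are seeds, X, Y, Z and W are other proteins.
toy_pairs = [
    ("S1", "X"), ("S2", "X"), ("S3", "Y"),
    ("S1", "S2"), ("X", "Y"), ("S3", "Z"), ("Z", "W"),
]
toy_seeds = ["S1", "S2", "S3"]

linkers = find_linkers(toy_pairs, toy_seeds)
reduced = distance_one_network(toy_pairs, toy_seeds)
print(linkers)
print(reduced)
```

In this example only X qualifies as a linker, because Y and Z each bind a single seed; requiring two seed partners is what keeps the network from growing into the “ball of yarn” mentioned above.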
Tools exist that look for binary interactions in specialised databases and automatically retrieve a PPIN. Some of the more popular ones are summarised in Table 1.
Visualisation tools
Given a complex system under study, one natural goal is to create a graphical representation of the system as a network in which nodes represent proteins and edges represent interactions between proteins [7]. Creating this representation is not trivial, and the chosen representation can sometimes drive the interpretation of the system and the hypothesis derived from it. Diverse layouts exist to represent a network, such as circular, hierarchical or force-directed [28]. These can be drawn with network visualisation tools (see Ref. [35] for a review). Three of them are summarised in Table 2.
Table 2 Visualisation tools
Usage Input Distinctive features Webpage
Easy-to-download and install Java application (Windows, Mac or
The software can be downloaded or directly run from the web page
Java application
Table of interactions (.xls or .txt). Multiple file types (.xml, .rdf, .owl, .gml, .xgmml, .sif, .sbml)
List of interactions in .txt
List of interactions retrieved from STRING
Cytoscape [36, 37]
Arena3D [38]
The most popular visualisation tool
Allows a variety of graph customisation
Useful to integrate biomolecular networks into a unified framework
Cytoscape functionality can be expanded using the collection of plugins developed by Cytoscape’s community of users
3D view of the network
A graphics card with hardware-accelerated 3D graphics and at least 256 MB of graphical memory is recommended
It was specially designed and optimised for accessing protein interaction data from the STRING database
Analytical tools
In order to extract the underlying biological information from the PPIN, it is necessary to analyse it using graph-theoretic tools. Two different and complementary approaches, named topological and modular, have been developed for the study of a complex network. The topological approach studies the network as a whole by analysing the structural parameters of the graph. The modular approach instead divides the PPIN into modules that group nodes based on a common characteristic, such as sharing the same function or belonging to the same pathway. Afterwards, each module is studied separately [27].
Topological approaches: centrality measures and network
The description of the structural characteristics of a network is often the first step in the analysis of network data [40]. Biological networks, including PPINs, are usually scale-free, meaning that a few nodes are highly connected (“hubs”) and the majority of nodes are linked to only one or a few neighbours [41, 42]. According to the lethality and centrality rule, the nodes with the largest number of connections are those that play the most important role in the architecture of the PPIN and tend to be biologically relevant in the studied system [43]. In other words, highly connected proteins tend to be essential to organism viability [44]. It has also been demonstrated that genes traditionally associated with cancer are implicated in multiple cellular processes and signalling pathways, so they often work as protein hubs inside an interaction network [45].
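The notion of a few highly connected hubs among many poorly connected nodes can be explored with a short Python sketch that computes the degree distribution P(k) and lists the most connected proteins. The toy graph is the one in Fig. 1; the 10% cut-off used to call a node a hub is an arbitrary assumption made only for illustration.

```python
from collections import Counter

# Toy network of Fig. 1 as {protein: set of interacting proteins}.
toy = {
    "a": {"1", "2", "6"}, "b": {"3", "4"},
    "1": {"a", "6"}, "2": {"a"}, "3": {"b"},
    "4": {"b"}, "5": set(), "6": {"a", "1"},
}


def degree_distribution(graph):
    """Map each degree k to the fraction of nodes with that degree, P(k)."""
    counts = Counter(len(partners) for partners in graph.values())
    total = len(graph)
    return {k: counts[k] / total for k in sorted(counts)}


def top_hubs(graph, fraction=0.1):
    """Return the most connected nodes (top `fraction`, at least one node)."""
    ranked = sorted(graph, key=lambda node: (-len(graph[node]), node))
    size = max(1, int(len(ranked) * fraction))
    return ranked[:size]


distribution = degree_distribution(toy)
hubs = top_hubs(toy)
print(distribution)
print(hubs)
```

In a scale-free network P(k) decays roughly as a power law, so most of the probability mass sits at low degrees; a network as small as this toy example can only hint at that shape.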
Identifying essential hubs in the PPIN is a way to
decipher the critical players inside the complex network.
Network centrality measures can be used to rank the nodes
of a given network and find the most important nodes,
hypothetically useful as biomarkers or therapeutic targets
[46]. The identification of central elements in biological
networks may also provide new hypotheses that lead
to more rational approaches in experimental design [47].
Several centrality measures exist that should be considered
within an exploratory process. The most important ones are
degree, betweenness, closeness and eigenvector centrality.
See Refs. [48] and [49] for a more in-depth explanation of
these concepts.
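To make two of these measures concrete, the sketch below computes closeness and betweenness centrality for an unweighted, undirected graph using breadth-first search. It is a minimal illustration, not the implementation used by any of the tools reviewed here. Two simplifying assumptions are made: closeness is computed within the connected component of each node, and betweenness is left unnormalised (each pair of nodes is counted once).

```python
from collections import deque

# Toy network of Fig. 1 as {protein: set of interacting proteins}.
toy = {
    "a": {"1", "2", "6"}, "b": {"3", "4"},
    "1": {"a", "6"}, "2": {"a"}, "3": {"b"},
    "4": {"b"}, "5": set(), "6": {"a", "1"},
}


def closeness(graph, source):
    """(reachable nodes) / (sum of shortest-path lengths); 0 for isolated nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):           # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # every pair (s, t) was visited from both ends in an undirected graph
    return {v: value / 2 for v, value in score.items()}


bc = betweenness(toy)
cc = {node: closeness(toy, node) for node in toy}
print(sorted(bc.items(), key=lambda item: -item[1])[:3])
print(cc["a"], cc["2"], cc["5"])
```

Protein “a” lies on the only shortest paths between “2” and the seeds “1” and “6”, so it receives the highest betweenness. Note that “b” obtains a closeness of 1.0 only because its component is tiny, which is why component-wise closeness values should be compared with care in disconnected networks.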
Table 3 Topological analysis tools
Computed parameters Input Distinctive features Webpage
Degree, bottleneck, edge percolated component, subgraph centrality, maximum neighbourhood component and density of maximum neighbourhood
Degree, eccentricity, closeness, radiality, centroid value, stress, S.P. Betweenness, C.-F. Closeness, C.-F. Betweenness, Katz Status, Eigenvector, Hubbell index, Bargaining, PageRank, HITS-Hubs, HITS-Authorities and
Number of nodes and edges, self-loops, connected components, average number of neighbours, network diameter, radius, density, centralisation, heterogeneity, clustering coefficient, number of shortest paths and the characteristic path length
List of interactions in .txt
Network data in .net, .tab, .mat or .xml format
Network loaded in Cytoscape environment
List of interactions in .txt
Hubba [47]
Centibin [48]
MAVisto [53]
Web-based tool
The appropriate tool to just rank proteins in a network by centrality measures
Free installable Windows application
Useful for a detailed centrality study because it offers more algorithms than the other tools
Java plugin for Cytoscape
Displays a comprehensive set of topological parameters
It is possible to visualise different parameters in the same network by changing node features (e.g., “degree” in colour and “closeness centrality” in size)
Motifs are detected by comparing the frequency of all occurrences of a motif in the studied network to the frequency values of this motif in randomisations of the same
MAVisto offers several presentations of its results: a motif table (with p-value and z-score), a motif view, a motif fingerprint and a visualisation of motif matches in the
Computationally time consuming
Motifs are detected and grouped into motif classes. Then, an algorithm determines which motif classes are displayed at much higher frequency than in random graphs
Faster than MAVisto
Network motif distribution is another useful measure. A motif is a basic building block of complex graphs, defined as a sub-network or connectivity pattern that appears in a PPIN at a significantly higher frequency than would be expected for a random network [50]. The distribution of motifs characterises the local structure of networks and has also been shown to be functionally relevant [51]. Despite the high complexity involved in the detection of network motifs, in practice the search can be executed in reasonable time using available software. Typical motifs that repeatedly appear in regulatory networks are autoregulatory or feed-forward motifs. Tools to calculate topological network parameters are presented in Table 3.
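The idea of over-representation with respect to random networks can be sketched in a few lines of Python for the simplest undirected motif, the triangle. This is a deliberately simplified illustration: the null model used here draws random graphs with the same numbers of nodes and edges, whereas dedicated motif tools use their own randomisation schemes, and the number of randomisations and the random seed are arbitrary assumptions.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def count_triangles(edges):
    """Count triangles in an undirected graph given as unique (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    # each triangle is found once from each of its three edges
    return sum(len(graph[u] & graph[v]) for u, v in edges) // 3


def random_edges(nodes, n_edges, rng):
    """Random graph with the same numbers of nodes and edges (degrees not preserved)."""
    return rng.sample(list(combinations(sorted(nodes), 2)), n_edges)


def triangle_z_score(nodes, edges, n_random=200, seed=0):
    """Return (observed count, z-score against the random ensemble)."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    null = [
        count_triangles(random_edges(nodes, len(edges), rng))
        for _ in range(n_random)
    ]
    spread = pstdev(null)
    z = (observed - mean(null)) / spread if spread else float("nan")
    return observed, z


# Toy network of Fig. 1.
toy_nodes = ["1", "2", "3", "4", "5", "6", "a", "b"]
toy_edges = [("a", "1"), ("a", "2"), ("a", "6"), ("b", "3"), ("b", "4"), ("1", "6")]

observed, z = triangle_z_score(toy_nodes, toy_edges)
print(observed, z)
```

A large positive z-score indicates that the pattern occurs more often than in the random ensemble. In a network as small as this example the score carries no statistical weight, and it is shown only to make the calculation explicit.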
Modular approach
Based on the idea that biological systems are composed of modules containing interacting components [55], a way to achieve a better understanding of a complex network is to break it down into simpler units called modules. A module is often understood as a subset of vertices that are densely connected to one another [56].
Commonly, in addition to closeness between nodes, functional criteria are used to divide a network into modules. Similar proteins tend to be connected in molecular networks, so distinct sets of proteins and their corresponding interactions constitute different blocks underlying common functions [57]. Therefore, the study of modules could be equivalent to the study of functional units of the malignant cell [58]. Table 4 presents some modular-based tools helpful for managing a complex PPIN.
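The coarsest way of splitting a network into units is to separate its connected components, which can be sketched in Python as follows. This is only a starting point under a simplifying assumption: modular tools usually apply stricter criteria (dense connectivity, shared annotation) within each component. The toy graph is the one in Fig. 1.

```python
# Toy network of Fig. 1 as {protein: set of interacting proteins}.
toy = {
    "a": {"1", "2", "6"}, "b": {"3", "4"},
    "1": {"a", "6"}, "2": {"a"}, "3": {"b"},
    "4": {"b"}, "5": set(), "6": {"a", "1"},
}


def connected_components(graph):
    """Return the connected components as sets of nodes, largest first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


components = connected_components(toy)
giant = components[0]
print([sorted(c) for c in components])
```

The largest set returned is the giant component of the network; isolated seeds such as “5” appear as components of size one.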
Table 4 Modular analysis tools
Computed parameters Input Distinctive features Webpage
Connected components, neighbourhood modules, hub-based modules, cliques and cluster
Turn a network into an
interacting clusters
GO terms overrepresentation in biological
List of interactions
Network loaded in Cytoscape’s environment
Network loaded in Cytoscape’s environment
Network loaded in Cytoscape’s environment
Tab-delimited, GML, VisML, DOT and adjacency matrix
Network loaded in Cytoscape’s environment
GraphWeb [59]
GenePRO [60]
MCODE [61]
BiNGO [62]
NEAT [63]
NEMO [64]
Performs a functional profiling of discovered modules based on GO annotations
Ref. [58] provides an accurate description of algorithms underlying each clustering
Break down a network into functional modules extracting them as independent
Cytoscape plugin
Displays a view of the clusters as individual but interconnected nodes, maintaining the whole-network picture
A previous hand-made definition of clusters is necessary
Cytoscape plugin
Detects densely connected regions in a
Specifically oriented to the discovery of molecular complexes
A set of nodes must be manually selected from a network and BiNGO retrieves GO terms associated with this set of proteins
Tests the statistical significance of the enrichment and controls the false discovery rate
Divides the network into non-overlapping
Retrieves KEGG or MetaCyc pathways in which proteins are implicated
Identifies network communities based on the premise that densely connected nodes correspond to functional modules
Data integration
Taking into account that cancer is a multi-factorial disease involving diverse anomalies, the analysis of biological networks integrating different types of molecular data can lead to the discovery of robust, specific and useful biomarkers of disease, and also shed light on the mechanisms and aetiology of the studied tumour [65–68].
The representation of data derived from heterogeneous sources in a single network is a way to integrate diverse and massive data sets. PPINs can integrate diverse molecular data to give a more complete model of the biological system (Fig. 2). It has been postulated that proteins with high connectivity within a network could be very important to the studied disease, despite not being differentially expressed. Thus, genes with a role in tumorigenesis that are not detected in a high-throughput experiment could be identified by a network-based approach. For example, if an important protein is activated by phosphorylation, its gene expression may not be altered, but the kinase that phosphorylates it may be up-regulated. So, even though no changes in expression are observed for the protein itself, its connection to the altered kinase means that the network will reveal its importance. The same occurs with mutated genes that have a role in tumour progression: they are not detected by differential expression experiments, but usually occupy a central position in networks [69].
Fig. 2 Data integration into a network to obtain a more informative PPIN. Red and green circles represent seed and linker proteins respectively. Complementary molecular information: over-expression at the mRNA level is indicated as a purple circle and proteins with mutations at the DNA level are represented as a half-moon shape; e.g., proteins “a” and “d” are overexpressed and connected by a protein that is not deregulated at the mRNA level, but is mutated. Some of the public tools useful for data integration appear in the purple box
Table 5 Integration tools
Kind of integrated data Input Distinctive features Webpage
Subcellular location
Expression values
-omics experiments results: expression microarrays, aCGH, MS/MS proteomics, GWAS data, ChIP-chip experiments, DNA methylation assays or high-throughput
Experimental data
Network loaded in Cytoscape’s environment and subcellular location
Network loaded in Cytoscape’s environment and expression data
A network and high-throughput results
A network and a biochemical dataset
Cerebral [70]
Dynamic Expression Plugin [37]
EGAN [71]
Vanted [72]
Cytoscape plugin
It generates an intuitive view of the network in which proteins appear separated into layers according to the context of cell
Cerebral does not automatically search for cellular location: these data must be provided to Cerebral as a Cytoscape attribute
Cytoscape plugin
It colours the nodes in a range according to their level of expression: from blue (minimum expression) to red (maximum
Useful to easily identify down- or up-regulated areas of the network
An expression data file must be loaded in
Java application
It allows combining interaction and molecular data in the context of network modules, e.g., for expression data: divide the network into topological modules (motifs) and then look for co-expression patterns in each module, divide the network into functional modules and then look for co-expression patterns, or use expression information to divide the network into co-expression modules
EGAN allows selecting nodes by crossing different data, e.g., select all genes with up-regulated expression and amplified copy number
Easy to download and install Java application (Windows, Mac or UNIX)
A tool specially designed to help scientists with the interpretation of related experimental data
Usually, a network contains false positive interactions or interactions that do not take place in the studied tissue. Expression data can be used as a filter, assuming that if a gene is not expressed in that tissue, its corresponding protein will not be present either. Consequently, interactions involving non-expressed genes are considered not to be real interactions in that tissue. Several tools for integrating diverse data into a network are presented in Table 5.
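The expression-based filter described in this section amounts to removing every interaction in which at least one partner is not expressed. A minimal Python sketch follows; the protein names, expression values and detection cut-off are all hypothetical, and in practice the threshold depends on the platform and the normalisation used.

```python
# Hypothetical expression values (arbitrary units) for the studied tissue.
expression = {"P1": 8.2, "P2": 0.1, "P3": 5.4, "P4": 6.0}
CUTOFF = 1.0  # assumed detection threshold, chosen only for illustration

# Hypothetical binary interactions; P5 has no expression measurement.
interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P3", "P4"), ("P2", "P4"), ("P1", "P5"),
]


def expressed_proteins(values, cutoff):
    """Proteins whose gene expression reaches the cut-off."""
    return {gene for gene, level in values.items() if level >= cutoff}


def filter_by_expression(pairs, expressed):
    """Keep interactions whose two partners are both expressed.

    Proteins without a measurement are treated as not expressed here,
    which is a conservative assumption.
    """
    return [(u, v) for u, v in pairs if u in expressed and v in expressed]


expressed = expressed_proteins(expression, CUTOFF)
filtered = filter_by_expression(interactions, expressed)
print(filtered)
```

How to treat proteins without expression data is a design choice: discarding them, as done here, reduces false positives at the cost of losing interactions that may be real.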
An example using genes classically related to colorectal
Figure 3 shows an example of how to use some of the
previously described tools to extract biological information
from the following 15 colorectal cancer (CRC) genes:
and KRAS. These seed proteins, classical key molecules
driving colon carcinogenesis, are a mix of chromosomal
instability (CIN) genes, microsatellite instability (MSI) genes
and CpG island methylation phenotype (CIMP) genes [73].
BIANA software was used to retrieve and export a file containing experimentally determined interactions of the seed proteins. Next, a visual representation of the resulting network was produced using Cytoscape software (Fig. 3A). The PPIN showed two components: one called the giant component, because it contained the highest number of nodes, and a smaller independent network. The giant component grouped all seed proteins except WNT and its interacting partners. APC appeared central, directly interacting with seed proteins AURKA, MAD2L1, CTNNB1, BUB1 and AXIN2, and indirectly, through linker proteins, with the remaining seeds except MLH1 (the representative MSI gene). KRAS and BRAF, both chosen as CIMP-related genes, directly interacted with each other. The protocol in Fig. 3B was followed to analyse this PPIN, including a topological approach, a clustering or modular approach and a data integration step.
First, a topological exploration of the PPIN was made: centrality measures of hub proteins were calculated using Hubba and NetworkAnalyzer software. Protein ranks differed slightly depending on the algorithm used for the analysis, but in all cases AURKA, EGFR and TP53 appeared as the most central proteins in the network, indicating their biological relevance in the pathogenesis of CRC. Interestingly, BRAF took the second position when centrality was measured in terms of maximum neighbourhood component (MNC), but dropped to the fourth position in degree and the sixth in betweenness. This means that although BRAF does not have many interacting partners and is not located on all paths crossing the PPIN, when the network is divided into clusters of densely connected elements, it appears in more clusters than other proteins such as EGFR or TP53 (Fig. 3C). A network motif analysis was also done with MAVisto software, revealing some repeated structures of the network. Due to the computational requirements of this complex task, this analysis was done on a small version of the network (extracted with POINET software instead of BIANA). As an example, this application revealed as an important association the interaction between TP53 and the less studied protein RASA1 through the two linkers AURKA and CDKN2A (Fig. 3D). A search in PubMed revealed that decreased expression of RASA1 is associated with abnormal expression of TP53 in advanced colorectal tumours [74]. However, motif results must be carefully interpreted. This analysis is more suitable for directed networks (usually regulatory networks), in which the directionality of the interactions is represented.
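The difference between degree and MNC can be made explicit with a small Python sketch. It assumes the definition commonly given for this measure, namely the size of the largest connected component of the sub-network formed by the direct neighbours of a node; whether this matches the exact implementation in Hubba should be checked against its documentation. The toy graph is the one in Fig. 1.

```python
# Toy network of Fig. 1 as {protein: set of interacting proteins}.
toy = {
    "a": {"1", "2", "6"}, "b": {"3", "4"},
    "1": {"a", "6"}, "2": {"a"}, "3": {"b"},
    "4": {"b"}, "5": set(), "6": {"a", "1"},
}


def mnc(graph, node):
    """Size of the largest connected component among the neighbours of `node`."""
    neighbours = set(graph[node])
    best, seen = 0, set()
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            # only follow edges that stay inside the neighbourhood
            stack.extend((graph[current] & neighbours) - component)
        seen |= component
        best = max(best, len(component))
    return best


for protein in ("a", "b", "1", "5"):
    print(protein, "degree:", len(toy[protein]), "MNC:", mnc(toy, protein))
```

Protein “b” has degree 2 but an MNC of only 1, because its two partners do not interact with each other, whereas protein “1” reaches an MNC of 2 with the same degree. Under this definition a node ranks high in MNC when its partners are themselves interconnected, which is why the MNC and degree rankings can differ.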
Second, a clustering analysis was performed to look for both functional modules and molecular complexes with biological meaning. BiNGO software highlighted that “DNA-repair” (p = 4.1 × 10^-8) and “response to DNA damage” (p = 1.1 × 10^-7) were the most representative GO terms in the cluster grouping MLH1-interacting proteins. Also, “transmembrane receptor protein serine/threonine kinase signalling pathway” (p = 8.4 × 10^-13) and “small GTPase mediated signal transduction” (p = 2.0 × 10^-11) were the most representative functions of SMAD4-interacting proteins (Fig. 3E). A betweenness centrality clustering analysis with GraphWeb software effectively separated CIN and MSI genes, and was also useful to discover biological pathways inside the network: MLH1 and its interacting proteins formed a module with statistically significant enrichment in the KEGG pathway “mismatch repair” (concordant with the BiNGO results). BUB1, CDK1 and TGFBR2 defined a module of interacting proteins enriched in “transforming growth factor receptor signalling pathway”. The GO term “Wnt receptor signalling pathway” included APC, CTNNB1 and AXIN2 (Fig. 3F). So, although WNT1 intriguingly did not appear to interact with these proteins, this approach was able to capture the classical Wnt/beta-catenin pathway in CIN CRC [75]. As an alternative approach, MCODE was used to search for putative molecular complexes. Four complexes were retrieved. The first included AURKA, MAD2L1 and their interacting proteins. The second contained BRAF, EGFR and their linker proteins RIN1, PKP2, RAPGEF1 and CRK. TP53, BUB1, HDAC5 and PRKCA formed another complex. Lastly, a four-node complex included the direct interaction between seed proteins TGFBR2 and SMAD4, and their linker proteins SMAD3 and SMAD7 (Fig. 3G).
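Enrichment p-values of the kind reported above are commonly obtained with a hypergeometric test followed by a Benjamini and Hochberg correction. The Python sketch below shows both calculations under that assumption; it is not a reproduction of the analysis in this example, and all the counts and p-values used are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for a module of n proteins containing k annotated ones,
    when K of the N background proteins carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 2 of 4 module proteins carry a term that annotates
# 3 of the 10 proteins in the background.
p = hypergeom_pvalue(k=2, n=4, K=3, N=10)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
print(p, adjusted)
```

The choice of background (whole annotated proteome or only the proteins present in the network) changes N and K and can alter the p-values considerably, so it should be stated when enrichment results are reported.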
Finally, data integration was performed. Using Cytoscape software, nodes from the PPIN were merged with a list of 202 genes differentially expressed between cancerous and noncancerous colon tissues, extracted from Bertucci et al. [76]. As a result, 37 proteins were found to be deregulated at the mRNA level (Fig. 3H). These included some previously identified as important hubs, such as TP53 (over-expressed), reinforcing their critical role in colorectal tumorigenesis. Among interacting proteins, this approach allowed us to focus our attention on parts of the network containing deregulated proteins such as TGFB3 (under-expressed) or CDK2 (up-regulated) [77]. Moreover, a more detailed analysis revealed that although crucial CRC proteins such as APC did not appear differentially expressed (probably because APC is a mutated but not differentially expressed gene), some of its interacting proteins, like PTK2 or SFN, were up-regulated. In particular, YWHAZ emerged as an important protein at the crossroads of BRAF, EGFR and
Fig. 3 Example of PPIN construction and analysis. A Visual representation (force-directed layout) of the network using Cytoscape software. BIANA software was chosen to construct a PPIN with only experimentally determined interactions, which resulted in 1466 nodes and 2176 edges. The bottom right insert shows a reduction to MNC of the same PPIN. B Protocol followed to analyse the network: topological exploration, clustering and data integration. C Centrality measures of the PPIN using Hubba (degree and MNC) and NetworkAnalyzer (betweenness). Both applications output a ranking of the proteins but differ in the graphical representation. Hubba uses a colour code to highlight the most central proteins in the PPIN (from red to blue). In NetworkAnalyzer the larger nodes represent the most central proteins. D Output of MAVisto software. On the right, the description of all discovered motifs. On the left, black and white PPIN with network motifs represented in colour.
Cerebral software was used to merge the network with
subcellular location information. A reduced PPIN that only
included seeds and linkers was used to obtain a clearer
picture. Proteins were placed into layers of predefined
locations: extracellular region, plasma membrane, cytoplasm,
peroxisome, proteasome complex, mitochondrion, Golgi,
endoplasmic reticulum and nucleus. MLH1, located in the
nucleus, mainly interacted with proteins in the nucleus,
with the exception of TRIM29 and AP2B1, which are
cytoplasmic proteins. APC interacted with both nuclear and
cytoplasmic proteins. This is probably due to the reported
nuclear-cytoplasmic shuttling of APC [78] (Fig. 3I).
Fig. 3 (continued) E Functional analysis using the BINGO plugin for Cytoscape. In orange, nodes selected for analysis (in this example MLH1 and
SMAD4 connected proteins). Companion table shows the output including, for each GO term, its ID, a description, a p-value, a corrected p-value
(Benjamini and Hochberg multiple testing correction), the cluster frequency, the total frequency and the genes included in that GO process. F
Division of the network into functional modules (GO terms and KEGG pathways) using GraphWeb. G Output of the MCODE tool, which looks for
molecular complexes. H Gene expression integration using Cytoscape. Over-expression represented in red and under-expression in green. Light
purple nodes show non-differential expression. On the right, the list of deregulated proteins. I Subcellular location classification of the MNC
PPIN using the Cerebral plugin. MLH1 and APC are highlighted and individually represented
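The two integration steps described in the text above (overlaying an expression list on the network nodes, and arranging proteins into subcellular layers) were performed with Cytoscape and Cerebral. As a rough illustration of the underlying logic only, the same operations can be sketched in a few lines of Python. The protein names, expression states and locations are those mentioned in the text; `NOT_IN_NETWORK` and `NUCLEAR_PARTNER` are hypothetical placeholders, and the function names are our own.

```python
# Layer order as listed in the text for the Cerebral view.
LAYERS = [
    "extracellular region", "plasma membrane", "cytoplasm",
    "peroxisome", "proteasome complex", "mitochondrion",
    "Golgi", "endoplasmic reticulum", "nucleus",
]


def overlay_expression(nodes, expression):
    """Label each network node; nodes absent from the list are 'unchanged'."""
    return {node: expression.get(node, "unchanged") for node in nodes}


def group_by_layer(location):
    """Place each protein in its subcellular layer, keeping the layer order."""
    layers = {layer: [] for layer in LAYERS}
    for protein, layer in sorted(location.items()):
        layers[layer].append(protein)
    return layers


def cross_layer_edges(edges, location):
    """Return interactions whose two partners sit in different layers."""
    return [(u, v) for u, v in edges if location[u] != location[v]]


# Expression overlay: states as reported in the text.
network_nodes = ["TP53", "TGFB3", "CDK2", "APC", "MLH1"]
expression = {
    "TP53": "over",
    "TGFB3": "under",
    "CDK2": "over",
    "NOT_IN_NETWORK": "over",  # hypothetical gene absent from the PPIN
}
status = overlay_expression(network_nodes, expression)

# Subcellular layers: locations as reported in the text.
location = {
    "MLH1": "nucleus",
    "TRIM29": "cytoplasm",
    "AP2B1": "cytoplasm",
    "NUCLEAR_PARTNER": "nucleus",  # hypothetical placeholder
}
edges = [
    ("MLH1", "TRIM29"),
    ("MLH1", "AP2B1"),
    ("MLH1", "NUCLEAR_PARTNER"),
]
layers = group_by_layer(location)
crossing = cross_layer_edges(edges, location)

print(status)
print(crossing)
```

Listing the cross-layer interactions separately is one simple way to spot cases, such as MLH1 with TRIM29 and AP2B1, that may deserve a closer look.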
Note of caution
An increasing number of specialised tools are appearing in
each area of network construction and analysis. We strongly
encourage researchers to search and explore other tools
beyond those described here.
It is also important not to forget that a network is just a
representation of the studied system, not the real world
itself. Although network analysis is valuable for hypothesis
generation, biological validation of the hypotheses derived
from it is desirable. It is necessary to keep in mind that, despite
the huge efforts made in this area, the human interactome is not
complete. Well studied proteins have a higher probability
of being included in such a network, resulting in some
selection bias with respect to less studied proteins. Moreover,
it is well known that the human interactome contains false
positive interactions, so a careful interpretation of results
is required [79]. Lack of spatial-temporal information is
another obstacle to consider in the network elucidation
process: we assume that two proteins are always interacting
when actually they only work together in a certain tissue or
even organelle, or at a certain stage of the cell cycle [80].
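One pragmatic way to act on these warnings is to keep only interactions that are supported by several independent reports and whose partners share at least one biological context (tissue or compartment). The Python sketch below illustrates the idea under stated assumptions: the `n_reports` field, the `min_reports` threshold and the placeholder proteins P1 to P4 are all hypothetical, and such a filter reduces, but does not remove, false positives and study bias.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    a: str
    b: str
    n_reports: int  # hypothetical: number of independent supporting reports


def filter_interactions(interactions, context, min_reports=2):
    """Keep well-supported interactions whose partners share a context."""
    kept = []
    for interaction in interactions:
        shared = context.get(interaction.a, set()) & context.get(interaction.b, set())
        if interaction.n_reports >= min_reports and shared:
            kept.append(interaction)
    return kept


# Toy data (hypothetical placeholders, not real proteins).
interactions = [
    Interaction("P1", "P2", 3),  # well supported, same tissue
    Interaction("P1", "P3", 1),  # single report only
    Interaction("P2", "P4", 4),  # well supported, no shared tissue
]
context = {
    "P1": {"colon"},
    "P2": {"colon", "liver"},
    "P3": {"colon"},
    "P4": {"brain"},
}

reliable = filter_interactions(interactions, context)
print(reliable)
```

The threshold is arbitrary and should be chosen with the biological question in mind; a stricter filter removes more false positives at the cost of discarding poorly studied, but possibly real, interactions.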
Furthermore, a network-centric approach remains incomplete
because of the intrinsic complexity of cancer:
complex cross-talk among cancer cells [81] and with the
surrounding microenvironment [82] is not captured in PPINs,
which only represent interactions inside a single cancer
In the field of cancer research, the combination of classical
techniques with systems biology and network tools can be
useful to generate more accurate biological hypotheses
regarding therapy, prognosis or tumour classification,
bringing us closer to personalised medicine.
However, despite the invaluable help of these techniques,
no software yet exists that is comparable to the human brain.
A medical and biological point of view is needed for the
interpretation of complex networks.
Conflict of interest The authors declare that they have no conflict of
interest relating to the publication of this manuscript.
Acknowledgements This study was supported by the Catalan Institute
of Oncology, the Private Foundation of the Biomedical Research
Institute of Bellvitge (IDIBELL), the Instituto de Salud Carlos III
(grants FIS PI08/1635, FIS PI08/1359, FIS 06/0545 and FIS 05/1006,
PI081359, PI08-1635, PI09-01037), CIBERESP CB07/02/2005, the
Spanish Association Against Cancer (AECC) Scientific Foundation,
the Catalan Government DURSI grant 2009SGR1489, and the European
Commission grants FOOD-CT-2006-036224-HIWATE and
FP7-COOP-Health-2007-B HiPerDART.
