Abstract As cancer is a complex disease, the representation of a malignant cell as a protein-protein interaction network (PPIN) and its subsequent analysis can provide insight into the behaviour of cancer cells and lead to the discovery of new biomarkers. The aim of this review is to help life-science researchers without previous computer programming skills to extract meaningful biological information from such networks, taking advantage of easy-to-use, public bioinformatics tools. It is structured in four parts. The first section describes the pipeline of consecutive steps from network construction to biological hypothesis generation. The second part provides a repository of public, user-friendly tools for network construction, visualisation and analysis. Two different and complementary approaches to network analysis are presented: the topological approach studies the network as a whole by means of structural graph theory, whereas the modular approach divides the PPIN into sub-graphs, or modules. In section three, some concepts and tools regarding heterogeneous molecular data integration through a PPIN are described. Finally, the fourth part is an example of how to extract meaningful biological information from a colorectal cancer PPIN using some of the described tools.
Keywords Cancer · Systems biology · Protein-protein
interaction network · Public bioinformatics tools ·
Biomarker discovery
Introduction
Cancer is a complex disease in which many proteins, genes
and molecular processes are implicated [1]. Genes and
proteins do not work independently, but are organised into
co-regulated units that perform a common biological func-
tion. It is the alteration of these functional elements that
leads to the development of a particular cancer phenotype
(i.e., drug response or disease outcome) and, consequently,
their study cannot be tackled from the classical one-gene
approach. A systems biology approach, the analysis of the
molecular relationship between the implicated genes and
proteins as a whole, is required to understand the disease
phenotype [2–4].
In this scenario, cancer systems medicine emerges as a translational extension of systems biology that brings together clinical information and the -omics disciplines for the classification and diagnosis of cancer subtypes, the prognosis of patient outcomes, the prediction of treatment responses and the identification of perturbation targets for drug development [5, 6].
Proteins interact with each other within a cell, and those interactions can be represented by a network, defined as an abstract representation of nodes or vertices (i.e., proteins) where some pairs of nodes are connected by edges representing interactions [7]. With the recent advances in high-throughput experimental technologies, increasing numbers of large-scale biological networks are being defined [8, 9]. Network knowledge can help in understanding the biological function and dynamic behaviour of cellular systems, generating biological hypotheses about putative biomarkers, therapeutic targets or deregulated pathways in cancer [10–14].
R. Sanz-Pamplona · A. Berenguer · X. Sole · D. Cordero · M. Crous-Bou · J. Serra-Musach · E. Guinó · M.A. Pujana · V. Moreno (✉)
Unit of Biomarkers and Susceptibility
Catalan Institute of Oncology (ICO)
Bellvitge Institute for Biomedical Research (IDIBELL)
Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP)
Av. Gran Vía, 199
ES-08908 L’Hospitalet de Llobregat, Barcelona, Spain
e-mail: v.moreno@iconcologia.net
V. Moreno
Department of Clinical Sciences
Faculty of Medicine
University of Barcelona
Barcelona, Spain
Clin Transl Oncol (2012) 14:3-14
DOI 10.1007/s12094-012-0755-9
EDUCATIONAL SERIES Blue Series
Tools for protein-protein interaction network analysis in cancer research
Rebeca Sanz-Pamplona · Antoni Berenguer · Xavier Sole · David Cordero · Marta Crous-Bou · Jordi Serra-Musach · Elisabet Guinó · Miguel Ángel Pujana · Víctor Moreno
Received: 14 July 2011 / Accepted: 20 August 2011
ADVANCES IN TRANSLATIONAL ONCOLOGY
Cancer-related proteins have a higher ratio of promiscuous structural domains, making them more prone to interact with other proteins. In fact, they have a large number of interacting proteins and occupy a central position in the networks [15]. Proteins interacting with cancer-related proteins have a higher probability of being related to the cancer process than non-interacting proteins. Hence, the study of those proteins may be an efficient way to discover novel cancer genes and cancer biomarkers [16–18].
Since understanding complex networks representing a cancer cell is one of the main challenges of today’s biology, this review attempts to help life-science researchers without previous computer programming skills to extract meaningful biological information from such networks, taking advantage of easy-to-use, public bioinformatics tools. Though different types of biological networks exist, such as regulatory networks, signal transduction networks or metabolic networks, only protein–protein interaction networks (PPINs) will be covered here. In addition, due to the complexity of directed networks (those whose edges carry directional information) and dynamic networks (those including changes over time), only undirected and static PPINs will be discussed [19].
This review is structured into four sections. The first section briefly describes the workflow, enumerating the consecutive steps from network construction to hypothesis generation. The second section details suitable tools to carry out each step of the analysis. Some concepts about data integration are described in the third section. Finally, a fourth section presents an illustrative example of how to use these tools, using colorectal cancer data. This is not a compendium of all existing network-management tools but a tutorial on constructing and analysing protein interaction networks in a simple manner. It should be noted, however, that no single method fits every case, and each particular network may have characteristics requiring specific software. For example, software suitable for dealing with huge graphs may not be helpful for analysing small networks, and vice versa. Also, although graph theory is beyond the scope of this review, basic ideas to start dealing with interaction networks are provided and the references will be helpful for a more in-depth study of this topic.
Workflow: from network assembly to hypothesis generation
Figure 1 summarises five sequential steps required to
generate a biological hypothesis on cancer cell behaviour
through PPIN construction and analysis.
The starting point is to decide which proteins define the input, hereafter seed proteins (first step). These should be the molecules of major interest and will be the skeleton of the PPIN. Typical choices are differentially expressed molecules observed in a given experiment (transcriptomic or proteomic) or molecules known to be involved in cancer. The final hypothesis derived from the network analysis will be directly related to these seed proteins.
Fig. 1 Pipeline. The process of PPIN construction and analysis follows these consecutive steps. First, a list of molecules of interest (seed proteins) is defined. Next, their interactions are searched in a specialised database and represented in a PPIN. Then, the network is analysed and consequently a biological hypothesis is generated. In this diagram (steps 1–3), six seed proteins (1–6) are represented in red whereas their interacting proteins are represented in green (“a” and “b”). Protein “a” interacts with seeds “1”, “2” and “6”; protein “b” interacts with seeds “3” and “4”; and seeds “1” and “6” interact with each other. Seed “5” has no interacting partners. Step 4 of the chart shows two complementary network analysis approaches: first, the topological methods, which look for essential nodes in the architecture of the network. In this example interacting protein “a” acts as a hub because of its higher degree. Second, modular methods divide the network into sub-graphs grouping proteins that share a common property. Some of the public tools useful in each step of the analysis are also represented in the figure
The second step is the retrieval of binary interactions. Interacting partners of seed proteins need to be identified from curated databases. Several publicly available databases exist: HPRD [20], String [21], DIP [22] and others [23]. A description of the experimental and computational procedures used to obtain these protein-protein interaction data is beyond the scope of this review (see Refs. [24] and [25] for more information).
The third step is network construction and visu-
alisation. From the set of protein–protein interactions, the
construction of a graph consists of assigning vertices to
proteins and edges to interactions between proteins. Then,
several algorithms allow the creation of a visual represen-
tation of the network [26].
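The assignment of vertices to proteins and edges to interactions can be sketched in a few lines of Python. This is only an illustration, not part of any tool discussed in this review: the function name `build_network` is hypothetical, and the interaction list simply reproduces the toy network described in the caption of Fig. 1.

```python
from collections import defaultdict

# Binary interactions of the toy network in Fig. 1
# (seeds "1" to "6", interacting proteins "a" and "b").
interactions = [("a", "1"), ("a", "2"), ("a", "6"),
                ("b", "3"), ("b", "4"), ("1", "6")]
seeds = ["1", "2", "3", "4", "5", "6"]


def build_network(pairs, nodes=()):
    """Return an undirected graph as a dict: protein -> set of partners."""
    graph = defaultdict(set)
    for node in nodes:
        graph[node]  # touching the key keeps proteins without partners (seed "5")
    for u, v in pairs:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


network = build_network(interactions, seeds)

# The degree of a protein is its number of interacting partners.
degree = {protein: len(partners) for protein, partners in network.items()}
hub = max(degree, key=degree.get)
print(hub, degree[hub])
```

In this toy network protein “a” has the highest degree, which agrees with its description as a hub in Fig. 1, while seed “5” remains as an isolated vertex.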
The fourth step is network analysis, when meaningful
biological information extraction is done using bioinfor-
matics methods. Two complementary approaches in the
area of network analysis exist: topological (study of the
whole graph) and modular (division of networks into mod-
ules of related proteins) [27].
As a result of network construction and analysis, a hypothesis regarding the initial data should be generated (step 5). Ideally, a topological network analysis identifies proteins that are candidate biomarkers or therapeutic targets, whereas a modular approach gives information about deregulated functions or pathways.
Public tools for network management
Multiple public network management tools exist. An exhaustive review published a few years ago identified no fewer than 35, and the number continues to grow exponentially [28]. In this review only some of the best known and/or easiest to use will be discussed, but it is strongly recommended to explore other tools, some of which might be useful for specific topics.
Table 1 Construction tools

BIANA [31]
Kind of interactions: Experimentally determined and predictions
Included databases: String, Intact, DIP, Degg, IPI, SCOP, UniProt, Reactome, MINT, cog and psi_mi
Input: Accepts a variety of identifiers (UniProt, Ensembl, GeneSymbol…)
Distinctive features:
• On-line interface
• Flexibility: it is possible to choose the network level and the relation types, restrict interactions by method and add interologs
• The output can be downloaded and visualised in Cytoscape
Webpage: http://sbi.imim.es/web/BIANA.php

Poinet [32]
Kind of interactions: Experimentally determined and predictions
Included databases: DIP, MINT, BIND, HPRD, MIPS, CYGD, BioGRID and NCBI
Input: NCBI or UniProt identifiers
Distinctive features:
• By default, only experimentally determined interactions are retrieved
• It is possible to filter interactions based on the number of shared GO terms between the two interacting proteins
• PPIs can be filtered with tissue-specific expression data from public resources
• The output can be directly visualised with POINET or downloaded and visualised in Cytoscape
Webpage: http://poinet.bioinformatics.tw

SNOW [33]
Kind of interactions: Only experimentally determined
Included databases: HPRD, IntAct, BIND, DIP and MINT
Input: Gene, transcript or protein
Distinctive features:
• Constructs a minimal connected network (MCN), a graph containing only seed and linker proteins
• Maps seed proteins onto a reference interactome, calculating the network parameters degree, clustering coefficient and betweenness
Webpage: http://snow.bioinfo.cipf.es/cgi-bin/snow.cgi

UniHI [34]
Kind of interactions: Experimentally determined and predictions
Included databases: MDC_Y2H, CCSB, HPRD, DIP, BIND, IntAct, BioGRID, COCIT, REACTOME, ORTHO, HOMOMINT and OPHID
Input: Entrez Gene, GeneSymbol, UniProt, NCBI, Ensembl, RefSeq, BioGrid, HPRD, OMIM
Distinctive features:
• Uses Human Gene Atlas data to construct tissue-specific interaction networks
• Annotates networks with pathway information from the KEGG database
• Only accepts a maximum of 50 proteins as input
Webpage: http://theoderich.fb3.mdc-berlin.de:8080/unihi/home.jsp
Network management software typically specialises in the construction, visualisation or analysis step. However, this is an artificial classification and overlap is common: some tools are useful for several or all steps.
Construction tools
Once the list of seed proteins is ready, the first step consists in joining them together through linkers, proteins that bind two or more seed proteins and so work as bridges. The number of linker proteins inserted between two seed proteins determines the network distance or network level [29]. A distance of one is recommended, since distance two usually retrieves an undesirable “ball of yarn” network. Moreover, at this point of the analysis it is crucial to decide the nature of the interactions that will be included in the analysis: experimentally and/or computationally determined. Literature-based interactions are more reliable but biased towards networks of the better studied proteins and less likely to reveal new interesting interactions. Computationally inferred interactions from high-throughput experiments do not have this bias, but include a higher rate of false interactions, so a more careful interpretation of the data and subsequent experimental validation are desirable [30].
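To make the notion of a linker concrete, the following minimal sketch selects, from a list of binary interactions, the non-seed proteins that bind at least two different seeds (network distance one). The function name `linker_proteins`, the protein labels and the interaction list are all hypothetical; the construction tools in Table 1 perform this search against real databases and may apply additional criteria.

```python
def linker_proteins(interactions, seeds):
    """Return non-seed proteins that interact with at least two different seeds.

    The result maps each linker to the set of seeds it bridges.
    """
    seeds = set(seeds)
    seed_partners = {}
    for u, v in interactions:
        # Look at the pair in both orientations because edges are undirected.
        for protein, partner in ((u, v), (v, u)):
            if protein not in seeds and partner in seeds:
                seed_partners.setdefault(protein, set()).add(partner)
    return {p: s for p, s in seed_partners.items() if len(s) >= 2}


# Hypothetical database hits for three seeds (s1, s2, s3).
# Protein "x" touches a single seed, so it is not a linker.
hits = [("a", "s1"), ("a", "s2"), ("x", "s3"),
        ("b", "s2"), ("b", "s3"), ("s1", "s2")]
linkers = linker_proteins(hits, ["s1", "s2", "s3"])
print(linkers)
```

Keeping only seeds and linkers is what limits the network to distance one; admitting proteins such as “x”, and then their own partners, is the route to the “ball of yarn” mentioned above.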
Tools exist that look for binary interactions in special-
ised databases and automatically retrieve a PPIN. Some of
the more popular ones are summarised in Table 1.
Visualisation tools
Given a complex system under study, one natural goal is
to create a graphical representation of the system as a net-
work in which nodes represent proteins and edges interac-
tions between proteins [7]. Creating this representation is
not trivial work and sometimes it drives the interpretation
of the system and the hypothesis derived. Diverse layouts
exist to represent a network such as circular, hierarchical or
force-directed [28]. These can be drawn with network visu-
alisation tools (see Ref. [35] for a review). Three of them
have been summarised in Table 2.
Analytical tools
In order to extract underlying biological information from
the PPIN, it is necessary to analyse it using graph-theo-
retic tools. Two different and complementary approaches,
named topological and modular, have been developed for
the study of a complex network. The topological approach
studies the network as a whole by means of the analysis of
the structural parameters of the graph. Instead, the modular
approach divides the PPIN into modules that group nodes
based on a common characteristic such as sharing the same
function or belonging to the same pathway. Afterwards,
each module is studied separately [27].
Table 2 Visualisation tools

Cytoscape [36, 37]
Usage: Easy-to-download and install Java application (Windows, Mac or UNIX)
Input: Table of interactions (.xls or .txt). Multiple file types (.xml, .rdf, .owl, .gml, .xgmml, .sif, .sbml)
Distinctive features:
• The most popular visualisation tool
• Allows a variety of graph customisation
• Useful to integrate biomolecular networks into a unified framework
• Cytoscape functionality can be expanded using the collection of plugins developed by Cytoscape’s community of users
Webpage: http://www.cytoscape.org and http://www.cytoscape.org/plugins2.php

Arena3D [38]
Usage: The software can be downloaded or directly run from the web page
Input: List of interactions in .txt format
Distinctive features:
• 3D view of the network
• A graphics card with hardware-accelerated 3D graphics and at least 256 MB of graphical memory is recommended
Webpage: http://arena3d.org

MEDUSA [39]
Usage: Java application
Input: List of interactions retrieved from the STRING database
Distinctive features:
• Specially designed and optimised for accessing protein interaction data from the STRING database
Webpage: http://coot.embl.de/medusa

Topological approaches: centrality measures and network motifs
The description of the structural characteristics of a network is often the first step in the analysis of network data [40]. Biological networks, including PPINs, are usually scale-free, meaning that a few nodes are highly connected (“hubs”) and a majority of nodes are linked to only one or a few neighbours [41, 42]. According to the centrality-lethality rule, nodes that have a larger number of connections play a more important role in the architecture of the PPIN and tend to be biologically relevant in the studied system [43]. In other words, highly connected proteins tend to be essential to organism viability [44]. It has also been demonstrated that genes traditionally associated with cancer are implicated in multiple cellular processes and signalling pathways, so they often work as protein hubs inside an interaction network [45].
Identifying essential hubs in the PPIN is a way to
decipher the critical players inside the complex network.
Network centrality measures can be used to rank the nodes
of a given network and find the most important nodes,
hypothetically useful as biomarkers or therapeutic targets
[46]. The identification of central elements in biologi-
cal networks may also provide new hypotheses that lead
to more rational approaches in experimental design [47].
Several centrality measures exist that should be considered
within an exploratory process. The most important ones are
degree, betweenness, closeness and eigenvector centrality.
See Refs. [48] and [49] for a more in-depth explanation of
these concepts.
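As an illustration of how two of these measures rank nodes, the sketch below computes degree and closeness centrality for a small, hypothetical PPIN stored as an adjacency dictionary. Closeness is taken here as the number of reachable nodes divided by the sum of shortest-path distances to them, which is one common convention; tools such as Hubba or Centibin may normalise differently. Betweenness and eigenvector centrality are omitted for brevity.

```python
from collections import deque

# Hypothetical toy PPIN (undirected, unweighted) as protein -> set of partners.
ppin = {
    "a": {"1", "2", "6"},
    "1": {"a", "6"},
    "2": {"a", "3"},
    "6": {"a", "1"},
    "3": {"2"},
}


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def degree_centrality(graph):
    """Number of interacting partners of each protein."""
    return {node: len(partners) for node, partners in graph.items()}


def closeness_centrality(graph):
    """Reachable nodes divided by the total distance to them (0 if isolated)."""
    scores = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        total = sum(dist.values())
        scores[node] = (len(dist) - 1) / total if total else 0.0
    return scores


degree = degree_centrality(ppin)
closeness = closeness_centrality(ppin)
# Rank by degree first and use closeness to break ties.
ranking = sorted(ppin, key=lambda n: (degree[n], closeness[n]), reverse=True)
print(ranking)
```

Different measures can rank the same protein differently, which is why several of them should be inspected within an exploratory analysis rather than relying on a single score.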
Table 3 Topological analysis tools

Hubba [47]
Computed parameters: Degree, bottleneck, edge percolated component, subgraph centrality, maximum neighbourhood component and density of maximum neighbourhood component
Input: List of interactions in .txt format
Distinctive features:
• Web-based tool
• The appropriate tool to just rank proteins in a network by centrality measures
Webpage: http://hub.iis.sinica.edu.tw/Hubba

Centibin [48]
Computed parameters: Degree, eccentricity, closeness, radiality, centroid value, stress, S.P. Betweenness, C.-F. Closeness, C.-F. Betweenness, Katz Status, Eigenvector, Hubbell index, Bargaining, PageRank, HITS-Hubs, HITS-Authorities and Closeness-vitality
Input: Network data in .net, .tab, .mat or .xml format
Distinctive features:
• Free installable Windows application
• Useful for a detailed centrality study because it offers more algorithms than the other tools
Webpage: http://centibin.ipk-gatersleben.de/

NetworkAnalyzer [52]
Computed parameters: Number of nodes and edges, self-loops, connected components, average number of neighbours, network diameter, radius, density, centralisation, heterogeneity, clustering coefficient, number of shortest paths and the characteristic path length
Input: Network loaded in the Cytoscape environment
Distinctive features:
• Java plugin for Cytoscape
• Displays a comprehensive set of topological parameters
• It is possible to visualise different parameters in the same network by changing node features (i.e., “degree” in colour and “closeness centrality” in size)
Webpage: http://med.bioinf.mpi-inf.mpg.de/networkanalyzer/

MAVisto [53]
Computed parameters: Motifs
Input: List of interactions in .txt format
Distinctive features:
• Motifs are detected by comparing the frequency of all occurrences of a motif in the studied network to the frequency values of this motif in randomisations of the same network
• MAVisto offers several presentations of its results: a motif table (with p-value and z-score), a motif view, a motif fingerprint and a visualisation of motif matches in the network
• Computationally time consuming
Webpage: http://mavisto.ipk-gatersleben.de

FANMOD [54]
Computed parameters: Motifs
Input: Network
Distinctive features:
• Motifs are detected and grouped into motif classes. Then, an algorithm determines which motif classes are displayed at much higher frequency than in random graphs
• Faster than MAVisto
Webpage: http://www.minet.unijena.de/~wernicke/motifs

Network motif distribution is another useful measure. A motif is a basic building block of complex graphs, defined as a sub-network or connectivity pattern that appears in a PPIN at a significantly higher frequency than would be expected for a random network [50]. The distribution of motifs characterises the local structure of networks and has also been shown to be functionally relevant [51]. Despite the high complexity involved in the detection of network motifs, in practice the search can be executed in reasonable time using available software. Typical motifs that repeatedly appear in regulatory networks are autoregulatory or feed-forward motifs. Tools to calculate topological network parameters are presented in Table 3.
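The idea of comparing a motif frequency with its frequency in random networks can be illustrated with the simplest possible pattern, the triangle. The sketch below is a deliberate simplification and does not reproduce the algorithms of MAVisto or FANMOD: the random networks only preserve the number of nodes and edges (dedicated tools typically use more careful randomisations and search for many sub-graph shapes), and all names and the edge list are hypothetical.

```python
import itertools
import random
import statistics


def count_triangles(edges):
    """Count triangles in an undirected graph given as a list of unique edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Common neighbours of u and v close a triangle on edge (u, v);
    # every triangle is found once per edge, hence the division by 3.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def random_edges(nodes, n_edges, rng):
    """Random graph with the same nodes and the same number of edges."""
    return rng.sample(list(itertools.combinations(nodes, 2)), n_edges)


def triangle_z_score(edges, n_random=200, seed=0):
    """Observed triangle count and its z-score against random graphs."""
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    observed = count_triangles(edges)
    null = [count_triangles(random_edges(nodes, len(edges), rng))
            for _ in range(n_random)]
    sd = statistics.pstdev(null)
    z = (observed - statistics.mean(null)) / sd if sd else float("nan")
    return observed, z


# Hypothetical network containing two triangles (a-b-c and d-e-f).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"),
             ("e", "f"), ("d", "f"), ("f", "g"), ("g", "h")]
observed, z = triangle_z_score(toy_edges)
print(observed, z)
```

A large positive z-score would suggest that the pattern is over-represented with respect to the chosen random model; the conclusion therefore depends on how the random networks are generated.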
Modular approach
Based on the idea that biological systems are composed of
modules containing interacting components [55], a way to
achieve a better understanding of a complex network is to
break it down into simpler units called modules. A module
is often understood as a subset of vertices that are densely
connected among one another [56].
Commonly, in addition to closeness between nodes,
functional criteria are used to divide a network into mod-
ules. Similar proteins tend to be connected in molecular
networks, so distinct sets of proteins and their correspond-
ing interactions constitute different blocks underlying
common functions [57]. Therefore, the study of modules
could be equivalent to the study of functional units of the
malignant cell [58]. In Table 4, some modular-based tools
helpful to manage a complex PPIN are presented.
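The crudest way of breaking a network into units is to split it into its connected components, one of the module types listed for GraphWeb in Table 4. The following minimal sketch does this for the toy network of Fig. 1; the function name `connected_components` is illustrative, and the clustering methods implemented in the tools of Table 4 go well beyond it by looking for densely connected or functionally coherent groups inside each component.

```python
def connected_components(graph):
    """Split an undirected graph (protein -> set of partners) into components.

    Components are returned as sets of proteins, largest first.
    """
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy network of Fig. 1: seeds "1" to "6", interacting proteins "a" and "b".
fig1 = {
    "a": {"1", "2", "6"}, "b": {"3", "4"},
    "1": {"a", "6"}, "2": {"a"}, "3": {"b"},
    "4": {"b"}, "5": set(), "6": {"a", "1"},
}
components = connected_components(fig1)
print(components)
```

The first element of the result is the largest component, while the isolated seed “5” forms a component of its own.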
Table 4 Modular analysis tools

GraphWeb [59]
Computed parameters: Connected components, neighbourhood modules, hub-based modules, cliques and cluster modules
Input: List of interactions
Distinctive features:
• Performs a functional profiling of discovered modules based on GO annotations
• Ref. [58] provides an accurate description of the algorithms underlying each clustering method
• Breaks down a network into functional modules, extracting them as independent sub-networks
Webpage: http://biit.cs.ut.ee/graphweb/

GenePRO [60]
Computed parameters: Turns a network into a set of interacting clusters
Input: Network loaded in the Cytoscape environment
Distinctive features:
• Cytoscape plugin
• Displays a view of the clusters as individual but interconnected nodes, maintaining the whole-network picture
• A previous hand-made definition of clusters is necessary
Webpage: http://wodaklab.org/genepro/

MCODE [61]
Computed parameters: Clusters
Input: Network loaded in the Cytoscape environment
Distinctive features:
• Cytoscape plugin
• Detects densely connected regions in a network
• Specifically oriented to the discovery of molecular complexes
Webpage: http://baderlab.org/Software/MCODE

BiNGO [62]
Computed parameters: GO term overrepresentation in biological networks
Input: Network loaded in the Cytoscape environment
Distinctive features:
• A set of nodes must be manually selected from a network and BiNGO retrieves the GO terms associated with this set of proteins
• Tests the statistical significance of the enrichment and controls the false discovery rate
Webpage: http://www.psb.ugent.be/cbd/papers/BiNGO/Home.html

NEAT [63]
Computed parameters: Clusters
Input: Tab-delimited, GML, VisML, DOT and adjacency matrix format
Distinctive features:
• Divides the network into non-overlapping clusters
• Retrieves KEGG or MetaCyc pathways in which proteins are implicated
Webpage: http://rsat.ulb.ac.be/rsat/index_neat.html

NEMO [64]
Computed parameters: Modules
Input: Network loaded in the Cytoscape environment
Distinctive features:
• Identifies network communities based on the premise that densely connected nodes correspond to functional modules
Webpage: http://baderlab.bme.jhu.edu/baderlab/index.php/NeMo

Data integration
Taking into account that cancer is a multi-factorial disease involving diverse anomalies, the analysis of biological networks integrating different types of molecular data can lead to the discovery of robust, specific and useful biomarkers of disease, and can also shed light on the mechanisms and aetiology of the studied tumour [65–68].
The representation of data derived from heterogeneous sources in a single network is a way to integrate diverse and massive data sets. PPINs can integrate diverse molecular data to give a more complete model of the biological system (Fig. 2). It has been postulated that proteins with high connectivity within a network could be very important to the studied disease, despite not being differentially expressed. Thus, genes with a role in tumorigenesis that are not detected in a high-throughput experiment could be identified by a network-based approach. For example, if an important protein is activated by phosphorylation, its gene expression may not be altered, but the kinase that phosphorylates it may be up-regulated. So, even though no change in expression is observed for the protein itself, its connection to the altered kinase means that the network will reveal its importance. The same occurs with mutated genes that have a role in tumour progression: they are not detected by differential expression experiments, but usually occupy a central position in networks [69].
Fig. 2 Data integration into a network to obtain a more informative PPIN. Red and green circles represent seed and linker proteins respectively. Complementary molecular information: over-expression at the mRNA level is indicated as a purple circle and proteins with mutations at the DNA level are represented as a half-moon shape, i.e., proteins “a” and “d” are overexpressed and connected by a protein that is not deregulated at the mRNA level, but is mutated. Some of the public tools useful for data integration appear in the purple box

Table 5 Integration tools

Cerebral [70]
Kind of integrated data: Subcellular location
Input: Network loaded in the Cytoscape environment and subcellular location data
Distinctive features:
• Cytoscape plugin
• Generates an intuitive view of the network in which proteins appear separated into layers according to the context of cell organelles
• Cerebral does not automatically search for cellular location: this data must be provided to Cerebral as a Cytoscape attribute
Webpage: http://www.pathogenomics.ca/cerebral/

Dynamic Expression Plugin [37]
Kind of integrated data: Expression values
Input: Network loaded in the Cytoscape environment and expression data
Distinctive features:
• Cytoscape plugin
• Colours the nodes in a range according to their level of expression: from blue (minimum expression) to red (maximum expression)
• Useful to easily identify down- or up-regulated areas of the network
• An expression data file must be loaded in Cytoscape
Webpage: http://chianti.ucsd.edu/svn/csplugins/trunk/ucsf/scooter/dynamicXpr/

EGAN [71]
Kind of integrated data: -omics experiment results: expression microarrays, aCGH, MS/MS proteomics, GWAS data, ChIP-chip experiments, DNA methylation assays or high-throughput sequencing
Input: A network and high-throughput results
Distinctive features:
• Java application
• Allows combining interaction and molecular data in the context of network modules, i.e., with expression data: divide the network into topological modules (motifs) and then look for co-expression patterns in each module, divide the network into functional modules and then look for co-expression patterns, or use expression information to divide the network into co-expression modules
• EGAN allows selecting nodes by crossing different data: i.e., select all genes with up-regulated expression and amplified copy number
Webpage: http://akt.ucsf.edu/EGAN/

Vanted [72]
Kind of integrated data: Experimental data
Input: A network and a biochemical dataset
Distinctive features:
• Easy to download and install Java application (Windows, Mac or UNIX)
• A tool specially designed to help scientists with the interpretation of related experimental data
Webpage: http://vanted.ipk-gatersleben.de/

Usually, a network contains false positive interactions or interactions that do not occur in the studied tissue. Expression data can be used as a filter, assuming that if a gene is not expressed in that tissue, its corresponding protein will not be present either. Consequently, interactions involving non-expressed genes can be discarded as not occurring in that tissue. Several tools for integrating diverse data into a network are presented in Table 5.
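The expression filter described above can be sketched as follows. The function name `filter_by_expression`, the gene labels, the expression values and the threshold are all hypothetical, and what counts as “expressed” depends on the platform and normalisation used; genes missing from the expression table are treated here as not expressed, which is a design choice rather than a rule.

```python
def filter_by_expression(interactions, expression, threshold=0.0):
    """Keep interactions whose two partners are both expressed.

    A gene is considered expressed when its value is above `threshold`.
    Genes absent from `expression` are treated as not expressed.
    """
    expressed = {gene for gene, level in expression.items() if level > threshold}
    return [(u, v) for u, v in interactions if u in expressed and v in expressed]


# Hypothetical interactions and tissue expression values (arbitrary units).
ppi = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g1", "g5")]
tissue_expression = {"g1": 8.2, "g2": 5.1, "g3": 0.0, "g4": 6.7}

kept = filter_by_expression(ppi, tissue_expression)
print(kept)
```

In this example only the interaction between g1 and g2 survives, because g3 is not expressed and g5 has no measurement. Such a filter trades false positives for possible false negatives, so the threshold deserves some care.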
An example using genes classically related to colorectal
cancer
Figure 3 shows an example of how to use some of the previously described tools to extract biological information from the following 15 colorectal cancer (CRC) genes: APC, BUB1, MAD2L1, TP53, PIK3CA, EGFR, AURKA, CTNNB1, SMAD4, WNT1, AXIN2, TGFBR2, MLH1, BRAF and KRAS. These seed proteins, classical key molecules driving colon carcinogenesis, are a mix of chromosomal instability (CIN) genes, microsatellite instability (MSI) genes and CpG island methylator phenotype (CIMP) genes [73].
BIANA software was used to retrieve and export a file containing experimentally determined interactions of the seed proteins. Next, a visual representation of the resulting network was produced using Cytoscape software (Fig. 3A). The PPIN showed two components: one called the giant component, because it contained the largest number of nodes, and a smaller independent network. The giant component grouped all seed proteins except WNT1 and its interacting partners. APC appeared central, directly interacting with seed proteins AURKA, MAD2L1, CTNNB1, BUB1 and AXIN2, and indirectly, through linker proteins, with the remaining seeds except MLH1 (the representative MSI gene). KRAS and BRAF, both chosen as CIMP-related genes, directly interacted with each other. The protocol in Fig. 3B was followed to analyse this PPIN, including a topological approach, a clustering or modular approach and a data integration step.
First, a topological exploration of the PPIN was made: centrality measures of hub proteins were calculated using Hubba and NetworkAnalyzer software. Protein ranks differed slightly depending on the algorithm used for the analysis, but in all cases AURKA, EGFR and TP53 appeared as the most central proteins in the network, indicating their biological relevance in the pathogenesis of CRC. Interestingly, BRAF took the second position when centrality was measured in terms of maximum neighbourhood component (MNC) but descended to the fourth position in degree and sixth in betweenness. This means that though BRAF does not have many interacting partners and is not located in all paths crossing the PPIN, when the network is divided into clusters of densely connected elements, it appears in more clusters than other proteins such as EGFR or TP53 (Fig. 3C). A network motif analysis was also done with MAVisto software, revealing some repeated structures of the network. Due to the computational requirements of this complex task, this analysis was done on a small version of the network (extracted with POINET software instead of BIANA). As an example, this application revealed as an important association the interaction between TP53 and the less studied protein RASA1 through the two linkers AURKA and CDKN2A (Fig. 3D). A search in PubMed revealed that decreased expression of RASA1 is associated with abnormal expression of TP53 in advanced colorectal tumours [74]. However, motif results must be carefully interpreted. This analysis is more suitable for directed networks (usually regulatory networks), in which the directionality of the interactions is represented.
Second, a clustering analysis was performed to look
for both functional modules and molecular complexes
with biological meaning. BINGO software highlighted that
“DNA-repair” (p=4.110–8) and “response to DNA dam-
age” (p=1.110–7) were the most representative GO terms
in the cluster grouping MLH1-interacting proteins. Also
“transmembrane receptor protein serine/threonine kinase
signalling pathway” (p=8.410–13) and “small GTPase
mediated signal transduction” (p=2.010–11) were the most
representative functions of Smad4-interacting proteins
(Fig. 3E). A betweenness centrality clustering analysis with
GraphWeb software effectively separated CIN and MSI
genes, and was also useful to discover biological pathways
inside the network: MLH1 and its interacting proteins
formed a module with statistically significant enrichment
in the KEGG pathway “mismatch repair” (concordant with
BINGO results). BUB1, CDK1 and TGFBR2 defined a
module of interacting proteins enriched in “transforming
growth factor receptor signalling pathway”. The GO term
“Wnt receptor signalling pathway” included APC, CTNNB1
and AXIN2 (Fig. 3F). So, although WNT1 intriguingly did
not appear to interact with these proteins, this approach
was able to capture the classical Wnt/beta-catenin
pathway in CIN CRC [75]. As an alternative approach,
MCODE was used to search for putative molecular complexes.
Four complexes were retrieved: the first included
AURKA, MAD2L1 and their interacting proteins. The second
contained BRAF, EGFR and their linker proteins RIN1,
PKP2, RAPGEF1 and CRK. TP53, BUB1, HDAC5 and
PRKCA formed another complex. Lastly, a four-node complex
included the direct interaction between seed proteins
Clin Transl Oncol (2012) 14:3-14
TGFBR2 and SMAD4, and their linker proteins SMAD3 and
SMAD7 (Fig. 3G).
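Enrichment p-values such as those reported above for GO terms are commonly obtained with a hypergeometric test followed by the Benjamini and Hochberg correction. The following is a minimal sketch under that assumption, not the code of BINGO or GraphWeb; the function names are illustrative and the counts in the usage lines are hypothetical, not the values behind Fig. 3E.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """Probability of seeing at least k annotated genes in a cluster.

    k: annotated genes in the cluster    n: cluster size
    K: annotated genes in the background N: background size
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini and Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: 8 of 20 cluster proteins carry a GO term that
# annotates 100 of 10000 background proteins.
p = hypergeom_pvalue(k=8, n=20, K=100, N=10000)
# Hypothetical raw p-values for four GO terms tested on the same cluster.
corrected = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(p, corrected)
```

The corrected p-value matters because every cluster is tested against many GO terms at once, so some small raw p-values are expected by chance alone.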
Finally, data integration was performed. Using Cytoscape
software, nodes from the PPIN were readily merged
Fig. 3 Example of PPIN construction and analysis. A Visual representation (force directed layout) of the network using Cytoscape software. BIANA
software was chosen to construct a PPIN with only experimentally determined interactions, which resulted in 1466 nodes and 2176 edges.
The bottom right inset shows a reduction to MNC of the same PPIN. B Protocol followed to analyse the network: topological exploration,
clustering and data integration. C Centrality measures of the PPIN using Hubba (Degree and MNC) and NetworkAnalyzer (betweenness). Both
applications output a ranking of the proteins but differ in the graphical representation. Hubba uses a colour code to highlight the most central
proteins in the PPIN (from red to blue). In NetworkAnalyzer the larger nodes represent the most central proteins. D Output of MAVisto software.
On the right, the description of all discovered motifs. On the left, black and white PPIN with network motifs represented in colour.
with a list of 202 differentially expressed genes between
cancerous and noncancerous colon tissues, extracted from
Bertucci et al. [76]. As a result, 37 proteins were found to
be deregulated at mRNA level (Fig. 3H). These included
some previously identified as important hubs, such as TP53
(over-expressed), reinforcing their critical role in colorectal
tumorigenesis. Among interacting proteins, this approach
allowed us to focus our attention on parts of the network
containing deregulated proteins such as TGFB3 (under-expressed)
or CDK2 (up-regulated) [77]. Moreover, a more
detailed analysis revealed that although crucial CRC proteins
such as APC did not appear differentially expressed (probably
because APC is mutated rather than differentially expressed),
some of its interacting proteins, such as PTK2 or SFN,
were up-regulated. In particular, YWHAZ emerged as an
important protein at the crossroads of BRAF, EGFR and
TP53.
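The integration step described above amounts to joining the node list of the network with a table of differentially expressed genes and then inspecting the neighbourhood of proteins of interest. A minimal Python sketch of that idea follows; it does not reproduce the Cytoscape workflow, the node names and expression labels are hypothetical placeholders (not data from Bertucci et al.), and the function names are illustrative.

```python
def annotate_nodes(nodes, expression):
    """Label every network node as 'over', 'under' or 'unchanged'.

    `expression` maps gene name -> 'over' or 'under' and lists only
    the differentially expressed genes; other nodes are 'unchanged'.
    """
    return {n: expression.get(n, "unchanged") for n in nodes}

def deregulated_neighbours(adj, node, labels):
    """Interaction partners of `node` that are differentially expressed."""
    return sorted(p for p in adj.get(node, ())
                  if labels.get(p, "unchanged") != "unchanged")

# Hypothetical toy data: seed S1 interacts with P1, P2 and P3.
adj = {"S1": {"P1", "P2", "P3"},
       "P1": {"S1"}, "P2": {"S1"}, "P3": {"S1"}}
# X9 is differentially expressed but absent from the network.
expression = {"P1": "over", "P3": "under", "X9": "over"}

labels = annotate_nodes(adj, expression)
hits = deregulated_neighbours(adj, "S1", labels)
print(labels)
print(hits)
```

Here the seed S1 is itself unchanged but two of its partners are deregulated, which is the same kind of observation as the one made for APC and its interacting proteins.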
Cerebral software was used to merge the network with
subcellular location information. A reduced PPIN that only
included seeds and linkers was used to obtain a clearer
picture. Proteins were placed into layers of predefined
locations: extracellular region, plasma membrane, cytoplasm,
peroxisome, proteasome complex, mitochondrion, Golgi,
endoplasmic reticulum and nucleus. MLH1, located in the
nucleus, mainly interacted with proteins in the nucleus,
with the exception of TRIM29 and AP2B1, which are cytoplasmic
proteins. APC interacted with both nuclear and cytoplasmic
proteins, probably because of the reported
nuclear-cytoplasmic shuttling of APC [78] (Fig. 3I).
Fig. 3 (continuation) E Functional analysis using the BINGO plugin for Cytoscape. In orange, nodes selected for analysis (in this example MLH1 and
SMAD4 connected proteins). The companion table shows the output, including for each GO term its ID, a description, a p-value, a corrected p-value
(Benjamini and Hochberg multiple testing correction), the cluster frequency, the total frequency and the genes included in that GO process. F
Division of the network into functional modules (GO terms and KEGG pathways) using GraphWeb. G Output of the MCODE tool, which looks for
molecular complexes. H Gene expression integration using Cytoscape. Over-expression is represented in red and under-expression in green. Light
purple nodes show non-differential expression. On the right, the list of deregulated proteins. I Subcellular location classification of the MNC
PPIN using the Cerebral plugin. MLH1 and APC are highlighted and individually represented.
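The comparison of a protein's location with the locations of its partners, as described for MLH1 and APC, can be sketched in a few lines of Python. This is only an illustration of the idea and not the Cerebral layout algorithm; the node names, locations and function name are hypothetical.

```python
from collections import Counter

def partner_locations(adj, location, node):
    """Count the interaction partners of `node` per subcellular location."""
    return Counter(location.get(p, "unknown") for p in adj.get(node, ()))

# Hypothetical toy data: N1 is nuclear and has three partners.
adj = {"N1": {"N2", "N3", "C1"},
       "N2": {"N1"}, "N3": {"N1"}, "C1": {"N1"}}
location = {"N1": "nucleus", "N2": "nucleus",
            "N3": "nucleus", "C1": "cytoplasm"}

counts = partner_locations(adj, location, "N1")
print(counts)
```

A partner found in a different compartment, such as C1 here, is a candidate for closer inspection, since it may reflect shuttling of one of the proteins or an interaction that does not occur in vivo.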
Note of caution
An increasing number of specialised tools are appearing in
each area of network construction and analysis. We strong-
ly encourage researchers to search and explore other tools
beyond those described here.
It is also important to remember that a network is only a
representation of the studied system, not the system itself.
Although network analysis is valuable for hypothesis generation,
biological validation of the resulting hypotheses
is desirable. It should be kept in mind that, despite the
huge efforts made in this area, the human interactome is not
complete. Well studied proteins have a higher probability
of being included in such a network, resulting in some selection
bias with respect to less studied proteins. Moreover,
it is well known that the human interactome contains false
positive interactions, so a careful interpretation of results
is required [79]. Lack of spatial-temporal information is
another obstacle to consider in the network elucidation
process: we assume that two proteins are always interacting
when actually they may only work together in a certain tissue or
even organelle, or at a certain point in the cell cycle [80].
Furthermore, a network-centric approach remains incomplete
because of the intrinsic complexity of cancer:
complex cross-talk among cancer cells [81] and with the
surrounding microenvironment [82] is not depicted in a PPIN,
which only represents interactions inside a single cancer
cell.
Conclusions
In the field of cancer research, the combination of classical
techniques with systems biology and network tools can be
useful to generate more accurate biological hypotheses regarding
therapy, prognosis or tumour classification, bringing
us closer to personalised medicine.
However, despite the invaluable help of these techniques,
no software comparable to the human brain yet exists.
A medical and biological point of view is needed for the
interpretation of complex networks.
Conflict of interest The authors declare that they have no conflict of
interest relating to the publication of this manuscript.
Acknowledgements This study was supported by the Catalan Insti-
tute of Oncology, the Private Foundation of the Biomedical Research
Institute of Bellvitge (IDIBELL), the Instituto de Salud Carlos III
(grants FIS PI08/1635, FIS PI08/1359, FIS 06/0545 and FIS 05/1006,
PI081359, PI08-1635, PI09-01037), CIBERESP CB07/02/2005, the
Spanish Association Against Cancer (AECC) Scientific Foundation,
the Catalan Government DURSI grant 2009SGR1489, and the Euro-
pean Commission grants FOOD-CT-2006-036224-HIWATE and FP7-
COOP-Health-2007-B HiPerDART.
References
1. Hornberg JJ, Bruggeman FJ, Westerhoff HV, Lan-
kelma J (2006) Cancer: a Systems Biology dis-
ease. Biosystems 83:81–90
2. Kitano H (2002) Systems biology: a brief over-
view. Science 295:1662–1664
3. Kreeger PK, Lauffenburger DA (2010) Cancer
systems biology: a network modeling perspective.
Carcinogenesis 31:2–8
4. Wang E, Lenferink A, O’Connor-McCourt M
(2007) Cancer systems biology: exploring cancer-
associated genes on cellular networks. Cell Mol
Life Sci 64:1752–1762
5. Auffray C, Chen Z, Hood L (2009) Systems medi-
cine: the future of medical genomics and health-
care. Genome Med 1:2
6. Clermont G, Auffray C, Moreau Y et al (2009)
Bridging the gap between systems biology and
medicine. Genome Med 1:88
7. Alberghina L, Höfer T, Vanoni M (2009) Mo-
lecular networks and system-level properties. J
Biotechnol 144:224–233
8. Stelzl U, Worm U, Lalowski M et al (2005) A hu-
man protein–protein interaction network: a resource
for annotating the proteome. Cell 122:957–968
9. Ramani AK, Bunescu RC, Mooney RJ, Marcotte
EM (2005) Consolidating the set of known human
protein–protein interactions in preparation for
large-scale mapping of the human interactome.
Genome Biol 6:R40
10. Kann MG (2007) Protein interactions and disease:
computational approaches to uncover the etiology
of diseases. Brief Bioinform 8:333–346
11. Baudot A, Gómez-López G, Valencia A (2009)
Translational disease interpretation with molecu-
lar networks. Genome Biol 10:221. Review
12. Wu Z, Zhao X, Chen L (2009) Identifying respon-
sive functional modules from protein–protein in-
teraction network. Mol Cells 27:271–277. Review
13. Taylor IW, Linding R, Warde-Farley D et al
(2009) Dynamic modularity in protein interac-
tion networks predicts breast cancer outcome. Nat
Biotechnol 27:199–204
14. Wang YC, Chen BS (2011) A network-based bio-
marker approach for molecular investigation and
diagnosis of lung cancer. BMC Med Genom 4:2
15. Jonson PF, Bates PA (2006) Global topological
features of cancer proteins in the human interac-
tome. Bioinformatics 22:2291–2297
16. Xu J, Li Y (2006) Discovering disease-genes by
topological features in human protein–protein
networks. Bioinformatics 22:2800–2805
17. Sanz-Pamplona R, Aragüés R, Driouch K et al
(2011) Expression of endoplasmic reticulum stress
proteins is a candidate marker of brain metastasis
in both ErbB-2(+) and ErbB-2(–) primary breast
tumors. Am J Pathol 179:564–579
18. Pujana MA, Han JD, Starita LM et al (2007)
Network modeling links breast cancer suscep-
tibility and centrosome dysfunction. Nat Genet
39:1338–1349
19. Junker BH, Schreiber F (2007) Analysis of bio-
logical networks. Chapter 3: Graph theory. John
Wiley & Sons, Hoboken, NJ, USA
20. Keshava Prasad TS, Goel R, Kandasamy K et al
(2009) Human Protein Reference Database: 2009
update. Nucleic Acids Res 37:D767–772
21. von Mering C, Huynen M, Jaeggi D et al (2003)
STRING: a database of predicted functional as-
sociations between proteins. Nucleic Acids Res
31:258–261
22. Xenarios I, Salwínski L, Duan XJ et al (2002)
DIP, the Database of Interacting Proteins: a re-
search tool for studying cellular networks of pro-
tein interactions. Nucleic Acids Res 30:303–305
23. Lehne B, Schlitt T (2009) Protein–protein interac-
tion databases: keeping up with growing interac-
tomes. Hum Genomics 3:291–297
24. Berggård T, Linse S, James P (2007) Methods for
the detection and analysis of protein protein inter-
actions. Proteomics 7:2833–2842. Review
25. Shoemaker BA, Panchenko AR (2007) Decipher-
ing protein–protein interactions. Part II. Compu-
tational methods to predict protein and domain
interaction partners. PLoS Comput Biol 3:e43.
Review
26. Kolaczyk E (2009) Mapping networks. In: Statis-
tical analysis of network data. Springer
27. Huang S (2004) Back to the biology in systems
biology: what can we learn from biomolecular
networks? Brief Funct Genomic Proteomic 2:279–
297
28. Suderman M, Hallett M (2007) Tools for visually
exploring biological networks. Bioinformatics
23:2651–2659. Review
29. Dorogovtsev SN, Mendes JF, Samukhin AN
(2001) Size-dependent degree distribution of a
scale-free growing network. Phys Rev E Stat
Nonlin Soft Matter Phys 63:062101
30. Oti M, Snel B, Huynen MA, Brunner HG (2006)
Predicting disease genes using protein–protein
interactions. J Med Genet 43:691
31. Garcia-Garcia J, Guney E, Aragues R et al (2010)
Biana: a software framework for compiling bio-
logical interactions and analyzing networks. BMC
Bioinform 11:56
32. Lee SA, Chan CH, Chen TC et al (2009) POINeT:
protein interactome with sub-network analysis
and hub prioritization. BMC Bioinform 10:114
33. Minguez P, Götz S, Montaner D et al (2009)
SNOW, a web-based tool for the statistical analy-
sis of protein–protein interaction networks. Nucle-
ic Acids Res 37:W109–114
34. Chaurasia G, Iqbal Y, Hänig C et al (2007) UniHI:
an entry gate to the human protein interactome.
Nucleic Acids Res 35:D590–594
35. Pavlopoulos GA, Wegener AL, Schneider R
(2008) A survey of visualization tools for biologi-
cal network analysis. BioData Min 1:12
36. Shannon P, Markiel A, Ozier O et al (2003) Cy-
toscape: a software environment for integrated
models of biomolecular interaction networks.
Genome Res 13:2498–2504
37. Killcoyne S, Carter GW, Smith J, Boyle J (2009)
Cytoscape: a community-based framework for net-
work modeling. Methods Mol Biol 563:219–239
38. Pavlopoulos GA, O’Donoghue SI, Satagopam VP
et al (2008) Arena3D: visualization of biological
networks in 3D. BMC Syst Biol 2:104
39. Hooper SD, Bork P (2005) Medusa: a simple tool
for interaction graph analysis. Bioinformatics
21:4432–4433
40. Assenov Y, Ramírez F, Schelhorn SE et al (2008)
Computing topological parameters of biological
networks. Bioinformatics 24:282–284
41. Barabasi AL, Albert R (1999) Emergence of scal-
ing in random networks. Science 286:509–512
42. Barabási AL, Bonabeau E (2003) Scale-free net-
works. Sci Am 288:60–69
43. Goh KI, Cusick ME, Valle D et al (2007) The
human disease network. Proc Natl Acad Sci U S
A 104:8685–8690
44. He X, Zhang J (2006) Why do hubs tend to be es-
sential in protein networks? PLoS Genet 2:e88
45. Kar G, Gursoy A, Keskin O (2009) Human cancer
protein–protein interaction network: a structural
perspective. PLoS Comput Biol 5:e1000601
46. Chen J, Aronow BJ, Jegga AG (2009) Disease candidate
gene identification and prioritization using
protein interaction networks. BMC Bioinform 10:73
47. Lin CY, Chin CH, Wu HH et al (2008) Hubba:
hub objects analyzer–a framework of interactome
hubs identification for network biology. Nucleic
Acids Res 36:W438–443
48. Junker BH, Koschützki D, Schreiber F (2006) Ex-
ploration of biological network centralities with
CentiBiN. BMC Bioinform 7:219
49. Junker BH, Schreiber F (2007) Network centrali-
ties. In: Analysis of biological networks. John Wi-
ley & Sons, Hoboken, NJ, USA
50. Milo R, Shen-Orr S, Itzkovitz S et al (2002) Net-
work motifs: simple building blocks of complex
networks. Science 298:824–827
51. Moon HS, Bhak J, Lee KH, Lee D (2005) Ar-
chitecture of basic building blocks in protein and
domain structural interaction networks. Bioinfor-
matics 21:1479–1486
52. Assenov Y, Ramírez F, Schelhorn SE et al (2008)
Computing topological parameters of biological
networks. Bioinformatics 24:282–284
53. Schreiber F, Schwöbbermeyer H (2005) MAVisto:
a tool for the exploration of network motifs. Bio-
informatics 21:3572–3574
54. Wernicke S, Rasche F (2006) FANMOD: a tool
for fast network motif detection. Bioinformatics
22:1152–1153
55. Barabási AL, Oltvai ZN (2004) Network biology:
understanding the cell’s functional organization.
Nat Rev Genet 5:101–113. Review
56. Balasundaram B, Butenko S (2007) Network
clustering. In: Analysis of biological networks.
John Wiley & Sons, Hoboken, NJ, USA
57. Luo F, Yang Y, Chen C-F et al (2007) Modular
organization of protein interaction networks. Bio-
informatics 23:207–214
58. Hartwell LH, Hopfield JJ, Leibler S, Murray AW
(1999) From molecular to modular cell biology.
Nature 402[6761 Suppl]:C47–52
59. Reimand J, Tooming L, Peterson H et al (2008)
GraphWeb: mining heterogeneous biological networks
for gene modules with functional significance.
Nucleic Acids Res 36:W452–459
60. Vlasblom J, Wu S, Pu S et al (2006) GenePro: a
Cytoscape plug-in for advanced visualization and
analysis of interaction networks. Bioinformatics
22:2178–2179
61. Bader GD, Hogue CW (2003) An automated
method for finding molecular complexes in large
protein interaction networks. BMC Bioinform 4:2
62. Maere S, Heymans K, Kuiper M (2005) BiNGO:
a Cytoscape plugin to assess overrepresentation of
gene ontology categories in biological networks.
Bioinformatics 21:3448–3449
63. Brohée S, Faust K, Lima-Mendez G et al (2008)
NeAT: a toolbox for the analysis of biological
networks, clusters, classes and pathways. Nucleic
Acids Res 36:W444–451
64. Rivera CG, Vakil R, Bader JS (2010) NeMo: Network
Module identification in Cytoscape. BMC
Bioinform 11[Suppl 1]:S61
65. Ma’ayan A (2008) Network integration and graph
analysis in mammalian molecular systems biol-
ogy. IET Syst Biol 2:206–221. Review
66. Liu ET (2005) Systems biology, integrative biolo-
gy, predictive biology. Cell 121:505–506. Review
67. McDermott JE, Costa M, Janszen D et al (2010)
Separating the drivers from the driven: integrative
network and pathway approaches aid identification
of disease biomarkers from high-throughput
data. Dis Markers 28:253–266. Review
68. Mathew JP, Taylor BS, Bader GD et al (2007)
From bytes to bedside: data integration and com-
putational biology for translational cancer re-
search. PLoS Comput Biol 3:e12
69. Camargo A, Azuaje F (2007) Linking gene ex-
pression and functional network data in human
heart failure. PLoS One 2:e1347
70. Barsky A, Gardy JL, Hancock RE, Munzner T
(2007) Cerebral: a Cytoscape plugin for layout
of and interaction with biological networks using
subcellular localization annotation. Bioinformat-
ics 23:1040–1042
71. Paquette J, Tokuyasu T (2010) EGAN: explor-
atory gene association networks. Bioinformatics
26:285–286
72. Junker BH, Klukas C, Schreiber F (2006) VANT-
ED: a system for advanced data analysis and vi-
sualization in the context of biological networks.
BMC Bioinform 7:109
73. Markowitz SD, Bertagnolli MM (2009) Molecu-
lar origins of cancer: molecular basis of colorectal
cancer. N Engl J Med 361:2449–2460. Review
74. Ohta M, Seto M, Ijichi H et al (2009) Decreased
expression of the RAS-GTPase activating protein
RASAL1 is associated with colorectal tumor pro-
gression. Gastroenterology 136:206–216
75. Moon RT (2005) Wnt/beta-catenin pathway. Sci
STKE 2005:cm1. Review
76. Bertucci F, Salas S, Eysteries S et al (2004) Gene
expression profiling of colon cancer by DNA
microarrays and correlation with histoclinical
parameters. Oncogene 23:1377–1391
77. Minguez P, Dopazo J (2011) Assessing the biological
significance of gene expression signatures
and co-expression modules by studying their network
properties. PLoS One 6:e17474
78. Henderson BR (2000) Nuclear-cytoplasmic shut-
tling of APC regulates beta-catenin subcellular lo-
calization and turnover. Nat Cell Biol 2:653–660
79. Chua HN, Wong L (2008) Increasing the reliabil-
ity of protein interactomes. Drug Discov Today
13:652–658
80. Strogatz SH (2001) Exploring complex networks.
Nature 410:268–276. Review
81. Hanahan D, Weinberg RA (2011) Hallmarks of
cancer: the next generation. Cell 144:646–674
82. Kenny PA, Lee GY, Bissell MJ (2007) Target-
ing the tumor microenvironment. Front Biosci
12:3468–3474