Tools for protein-protein interaction network analysis in cancer research

January 2012
Clinical and Translational Oncology 14(1):3-14

DOI:10.1007/s12094-012-0755-9

Authors:

Rebeca Sanz-Pamplona

IDIBELL Bellvitge Biomedical Research Institute

Antoni Berenguer Llergo

Parc Taulí Research and Innovation Institute (I3PT)

Xavier Solé

Catalan Institute of Oncology

David Cordero

Catalan Institute of Oncology

Abstract As cancer is a complex disease, the representation of a malignant cell as a protein-protein interaction network (PPIN) and its subsequent analysis can provide insight into the behaviour of cancer cells and lead to the discovery of new biomarkers. The aim of this review is to help life-science researchers without previous computer programming skills to extract meaningful biological information from such networks, taking advantage of easy-to-use, public bioinformatics tools. It is structured in four parts: the first section describes the pipeline of consecutive steps from network construction to biological hypothesis generation. The second part provides a repository of public, user-friendly tools for network construction, visualisation and analysis. Two different and complementary approaches to network analysis are presented: the topological approach studies the network as a whole by means of structural graph theory, whereas the modular approach divides the PPIN into sub-graphs, or modules. In section three, some concepts and tools regarding heterogeneous molecular data integration through a PPIN are described. Finally, the fourth part is an example of how to extract meaningful biological information from a colorectal cancer PPIN using some of the described tools.

Keywords Cancer · Systems biology · Protein-protein interaction network · Public bioinformatics tools · Biomarker discovery

Introduction

Cancer is a complex disease in which many proteins, genes and molecular processes are implicated [1]. Genes and proteins do not work independently, but are organised into co-regulated units that perform a common biological function. It is the alteration of these functional elements that leads to the development of a particular cancer phenotype (e.g., drug response or disease outcome) and, consequently, their study cannot be tackled from the classical one-gene approach. A systems biology approach, the analysis of the molecular relationship between the implicated genes and proteins as a whole, is required to understand the disease phenotype [2–4].

In this scenario, cancer systems medicine emerges as a translational extension of systems biology that brings together clinical information and the -omics disciplines for the classification and diagnosis of cancer subtypes, the prognosis of patient outcomes, the prediction of treatment responses and the identification of perturbation targets for drug development [5, 6].

Proteins interact with each other within a cell, and those interactions can be represented by a network, defined as an abstract representation of nodes or vertices (i.e., proteins) where some pairs of nodes are connected by edges representing interactions [7]. With the recent advances in high-throughput experimental technologies, increasing numbers of large-scale biological networks are being defined [8, 9]. Network knowledge can help in understanding the biological function and dynamic behaviour of cellular systems, generating biological hypotheses about putative biomarkers, therapeutic targets or deregulated pathways in cancer [10–14].

EDUCATIONAL SERIES Blue Series
ADVANCES IN TRANSLATIONAL ONCOLOGY
Clin Transl Oncol (2012) 14:3-14

Rebeca Sanz-Pamplona · Antoni Berenguer · Xavier Solé · David Cordero · Marta Crous-Bou · Jordi Serra-Musach · Elisabet Guinó · Miguel Ángel Pujana · Víctor Moreno
Received: 14 July 2011 / Accepted: 20 August 2011

R. Sanz-Pamplona · A. Berenguer · X. Solé · D. Cordero · M. Crous-Bou · J. Serra-Musach · E. Guinó · M.A. Pujana · V. Moreno (✉)
Unit of Biomarkers and Susceptibility
Catalan Institute of Oncology (ICO)
Bellvitge Institute for Biomedical Research (IDIBELL)
Biomedical Research Centre Network for Epidemiology and Public Health (CIBERESP)
Av. Gran Vía, 199
ES-08908 L’Hospitalet de Llobregat, Barcelona, Spain
e-mail: v.moreno@iconcologia.net

V. Moreno
Department of Clinical Sciences
Faculty of Medicine
University of Barcelona
Barcelona, Spain

Cancer-related proteins have a higher ratio of promiscuous structural domains, making them more prone to interact with other proteins. In fact, they have a large number of interacting proteins and occupy a central position in the networks [15]. Proteins interacting with cancer-related proteins have a higher probability of being related to the cancer process than non-interacting proteins. Hence, the study of those proteins may be an efficient way to discover novel cancer genes and cancer biomarkers [16–18].

Since understanding complex networks representing a cancer cell is one of the main challenges of today’s biology, this review attempts to help life-science researchers without previous computer programming skills to extract meaningful biological information from such networks, taking advantage of easy-to-use, public bioinformatics tools. Though different types of biological networks exist, such as regulatory networks, signal transduction networks or metabolic networks, only protein–protein interaction networks (PPINs) will be covered here. In addition, due to the complexity of directed networks (those whose edges carry directional information) and dynamic networks (those including changes over time), only undirected and static PPINs will be discussed [19].

This review is structured into four sections. The first section briefly describes the workflow, enumerating consecutive steps from network construction to hypothesis generation. The second section details suitable tools to carry out each step of the analysis. Some concepts about data integration are described in the third section. Finally, a fourth section presents an illustrative example of how to use these tools, using colorectal cancer data. This is not a compendium of all existing network-management tools but a tutorial on constructing and analysing protein interaction networks in a simple manner. It should be noted, however, that no single method exists and each particular network may have characteristics requiring specific software. For example, software suitable for dealing with huge graphs may not be helpful for analysing small networks, and vice versa. Also, although graph theory is beyond the scope of this review, basic ideas to start dealing with interaction networks will be provided, and the references should be helpful for a more in-depth study of this topic.

Workflow: from network assembly to hypothesis generation

Figure 1 summarises five sequential steps required to generate a biological hypothesis on cancer cell behaviour through PPIN construction and analysis.

The starting point is to decide which proteins define the input, hereafter seed proteins (first step). These should be the molecules of major interest and will be the skeleton of the PPIN. Typical choices are differentially expressed molecules observed in a given experiment (transcriptomic or proteomic) or molecules known to be involved in cancer. The final hypothesis derived from the network analysis will be directly related to these seed proteins.

Fig. 1 Pipeline. The process of PPIN construction and analysis follows these consecutive steps. First, a list of molecules of interest (seed proteins) is defined. Next, their interactions are searched in a specialised database and represented in a PPIN. Then, the network is analysed and consequently a biological hypothesis is generated. In this diagram (steps 1–3), six seed proteins (1–6) are represented in red whereas their interacting proteins are represented in green (“a” and “b”). Protein “a” interacts with seeds “1”, “2” and “6”; protein “b” interacts with seeds “3” and “4”; and seeds “1” and “6” interact with each other. Seed “5” has no interacting partners. Step 4 of the chart shows two complementary network analysis approaches: first, the topological methods, which look for essential nodes in the architecture of the network. In this example interacting protein “a” acts as a hub because of its higher degree. Second, modular methods divide the network into sub-graphs grouping proteins sharing a common property. Some of the public tools useful in each step of the analysis are also represented in the figure

The second step is the retrieval of binary interactions. Interacting partners of seed proteins need to be identified from curated databases. Several publicly available databases exist: HPRD [20], STRING [21], DIP [22] and others [23]. A description of the experimental and computational procedures used to obtain these protein-protein interaction data is beyond the scope of this review (see Refs. [24] and [25] for more information).

The third step is network construction and visualisation. From the set of protein–protein interactions, the construction of a graph consists of assigning vertices to proteins and edges to interactions between proteins. Then, several algorithms allow the creation of a visual representation of the network [26].
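As a rough illustration of this step, the toy network described in the Fig. 1 caption can be written as an undirected graph in a few lines of Python. This is only a sketch: the function name `build_ppin` and the dictionary-of-sets representation are choices made here for illustration and are not taken from any of the tools discussed in this review.

```python
# Toy interactions copied from the Fig. 1 caption: seeds "1"-"6" and partners "a", "b".
SEEDS = ["1", "2", "3", "4", "5", "6"]
INTERACTIONS = [("a", "1"), ("a", "2"), ("a", "6"),
                ("b", "3"), ("b", "4"), ("1", "6")]


def build_ppin(nodes, interactions):
    """Return an undirected graph as {protein: set of interacting proteins}."""
    # Start from the seed list so that seeds without partners (such as "5") are kept.
    graph = {node: set() for node in nodes}
    for u, v in interactions:
        # Each binary interaction becomes one undirected edge.
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


ppin = build_ppin(SEEDS, INTERACTIONS)

# Degree = number of interacting partners of each protein.
degree = {protein: len(partners) for protein, partners in ppin.items()}
hub = max(degree, key=degree.get)
print(hub, degree[hub])  # prints: a 3
```

In this toy graph protein “a” has the highest degree, which matches the caption's description of “a” as a hub, and seed “5” remains as an isolated node.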

The fourth step is network analysis, in which meaningful biological information is extracted using bioinformatics methods. Two complementary approaches to network analysis exist: topological (study of the whole graph) and modular (division of networks into modules of related proteins) [27].

The fifth step is hypothesis generation: network construction and analysis should yield a hypothesis regarding the initial data. Typically, a topological network analysis identifies proteins that are candidate biomarkers or therapeutic targets, whereas a modular approach gives information about deregulated functions or pathways.

Public tools for network management

Multiple public network management tools exist. An exhaustive review published a few years ago identified no fewer than 35, and the number continues to grow exponentially [28]. In this review only some of the best known and/or easiest to use will be discussed, but readers are strongly encouraged to explore other tools, some of which might be useful for specific topics.

Table 1 Construction tools

BIANA [31]
Kind of interactions: Experimentally determined and predictions
Included databases: STRING, IntAct, DIP, Degg, IPI, SCOP, UniProt, Reactome, MINT, cog and psi_mi
Input: Accepts a variety of identifiers (UniProt, Ensembl, GeneSymbol…)
Distinctive features:
• On-line interface
• Flexibility: it is possible to choose the network level and the relation types, restrict interactions by method and add interologs
• The output can be downloaded and visualised in Cytoscape
Webpage: http://sbi.imim.es/web/BIANA.php

Poinet [32]
Kind of interactions: Experimentally determined and predictions
Included databases: DIP, MINT, BIND, HPRD, MIPS, CYGD, BioGRID and NCBI
Input: NCBI or UniProt identifiers
Distinctive features:
• By default, only experimentally determined interactions are retrieved
• It is possible to filter interactions based on the number of shared GO terms between the two interacting proteins
• PPIs can be filtered with tissue-specific expression data from public resources
• The output can be directly visualised with POINET or be downloaded and visualised in Cytoscape
Webpage: http://poinet.bioinformatics.tw

SNOW [33]
Kind of interactions: Only experimentally determined
Included databases: HPRD, IntAct, BIND, DIP and MINT
Input: Gene, transcript or protein
Distinctive features:
• Constructs a minimal connected network (MCN), a graph containing only seed and linker proteins
• Maps seed proteins onto an interactome of reference, calculating the network parameters degree, clustering coefficient and betweenness
Webpage: http://snow.bioinfo.cipf.es/cgi-bin/snow.cgi

UniHI [34]
Kind of interactions: Experimentally determined and predictions
Included databases: MDC_Y2H, CCSB, HPRD, DIP, BIND, IntAct, BioGRID, COCIT, REACTOME, ORTHO, HOMOMINT and OPHID
Input: Entrez Gene, GeneSymbol, UniProt, NCBI, Ensembl, RefSeq, BioGrid, HPRD, OMIM
Distinctive features:
• Uses Human Gene Atlas data to construct tissue-specific interaction networks
• Annotates networks with pathway information from the KEGG database
• Only accepts a maximum of 50 proteins as input
Webpage: http://theoderich.fb3.mdc-berlin.de:8080/unihi/home.jsp

Network management software typically specialises in construction, visualisation or analysis steps. However, this is an artificial classification and overlapping is common: some tools are useful for several or all steps.

Construction tools

Once the list of seed proteins is ready, the first step consists of joining them together through linkers, proteins that bind two or more seed proteins and so work as bridges. The number of linker proteins inserted between two seed proteins determines the network distance or network level [29]. A distance of one is recommended, since distance two usually retrieves an undesirable “ball of yarn” network. Moreover, at this point of the analysis it is crucial to decide the nature of the interactions that will be included: experimentally and/or computationally determined. Literature-based interactions are more reliable but biased towards networks of the better studied proteins and less likely to reveal new interesting interactions. Interactions computationally inferred from high-throughput experiments do not have this bias, but result in a higher rate of false interactions being included, so a more careful interpretation of the data and a subsequent experimental validation are desirable [30].

Tools exist that look for binary interactions in specialised databases and automatically retrieve a PPIN. Some of the more popular ones are summarised in Table 1.
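The linker idea described above can be sketched in Python for the recommended distance of one. This is a minimal illustration under stated assumptions: the protein names (`S1`, `X`, …), the function names and the in-memory list standing in for a reference interactome are all hypothetical, and the construction tools in Table 1 query real databases instead.

```python
def find_linkers(seeds, interactions, min_seeds=2):
    """Non-seed proteins that interact with at least `min_seeds` seed proteins."""
    seeds = set(seeds)
    seed_partners = {}
    for u, v in interactions:
        # Look at the edge in both directions: (candidate linker, seed partner).
        for protein, partner in ((u, v), (v, u)):
            if protein not in seeds and partner in seeds:
                seed_partners.setdefault(protein, set()).add(partner)
    return {p for p, bound in seed_partners.items() if len(bound) >= min_seeds}


def distance_one_network(seeds, interactions):
    """Edges among seeds and linkers only (one linker between two seeds)."""
    keep = set(seeds) | find_linkers(seeds, interactions)
    edges = {tuple(sorted((u, v))) for u, v in interactions
             if u in keep and v in keep and u != v}
    return sorted(edges)


# Hypothetical reference interactome: X bridges S1 and S2; Y touches only one seed.
seeds = ["S1", "S2", "S3"]
interactome = [("S1", "X"), ("X", "S2"), ("S3", "Y"), ("Y", "Z"), ("S1", "S2")]

print(find_linkers(seeds, interactome))          # prints: {'X'}
print(distance_one_network(seeds, interactome))
# prints: [('S1', 'S2'), ('S1', 'X'), ('S2', 'X')]
```

Proteins such as `Y`, which touch a single seed, are discarded here; keeping them, or allowing two linkers between seeds, quickly enlarges the network, which is the “ball of yarn” effect mentioned in the text.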

Visualisation tools

Given a complex system under study, one natural goal is to create a graphical representation of the system as a network in which nodes represent proteins and edges represent interactions between proteins [7]. Creating this representation is not trivial, and sometimes it drives the interpretation of the system and the hypothesis derived. Diverse layouts exist to represent a network, such as circular, hierarchical or force-directed [28]. These can be drawn with network visualisation tools (see Ref. [35] for a review). Three of them are summarised in Table 2.

Analytical tools

In order to extract underlying biological information from the PPIN, it is necessary to analyse it using graph-theoretic tools. Two different and complementary approaches, named topological and modular, have been developed for the study of a complex network. The topological approach studies the network as a whole by analysing the structural parameters of the graph. In contrast, the modular approach divides the PPIN into modules that group nodes based on a common characteristic, such as sharing the same function or belonging to the same pathway. Afterwards, each module is studied separately [27].

Topological approaches: centrality measures and network motifs

The description of the structural characteristics of a network is often the first step in the analysis of network data [40]. Biological networks, including PPINs, are usually scale-free, meaning that a few nodes are highly connected (“hubs”) and the majority of nodes are linked to only one or a few neighbours [41, 42]. According to the lethality and centrality rule, nodes that have a larger number of connections play a more important role in the architecture of the PPIN and tend to be biologically relevant in the studied system [43]. In other words, highly connected proteins are essential to organism viability [44]. It has also been demonstrated that genes traditionally associated with cancer are implicated in multiple cellular processes and signalling pathways, so they often work as protein hubs inside an interaction network [45].

Table 2 Visualisation tools

Cytoscape [36, 37]
Usage: Easy-to-download and install Java application (Windows, Mac or UNIX)
Input: Table of interactions (.xls or .txt); multiple file types (.xml, .rdf, .owl, .gml, .xgmml, .sif, .sbml)
Distinctive features:
• The most popular visualisation tool
• Allows a variety of graph customisations
• Useful to integrate biomolecular networks into a unified framework
• Cytoscape functionality can be expanded using the collection of plugins developed by Cytoscape’s community of users
Webpage: http://www.cytoscape.org; http://www.cytoscape.org/plugins2.php

Arena3D [38]
Usage: The software can be downloaded or directly run from the web page
Input: List of interactions in .txt format
Distinctive features:
• 3D view of the network
• A graphics card with hardware-accelerated 3D graphics and at least 256 MB of graphical memory is recommended
Webpage: http://arena3d.org

MEDUSA [39]
Usage: Java application
Input: List of interactions retrieved from the STRING database
Distinctive features:
• Specially designed and optimised for accessing protein interaction data from the STRING database
Webpage: http://coot.embl.de/medusa

Identifying essential hubs in the PPIN is a way to decipher the critical players inside the complex network. Network centrality measures can be used to rank the nodes of a given network and find the most important nodes, hypothetically useful as biomarkers or therapeutic targets [46]. The identification of central elements in biological networks may also provide new hypotheses that lead to more rational approaches in experimental design [47]. Several centrality measures exist that should be considered within an exploratory process. The most important ones are degree, betweenness, closeness and eigenvector centrality. See Refs. [48] and [49] for a more in-depth explanation of these concepts.
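To make three of these measures concrete, the sketch below computes degree, closeness and shortest-path betweenness for the connected part of the Fig. 1 toy network (proteins “a”, “1”, “2” and “6”). It is an illustration only, not the implementation used by Hubba, Centibin or NetworkAnalyzer: closeness is taken here as (reachable nodes − 1) divided by the sum of shortest-path distances, betweenness follows the standard Brandes accumulation scheme for unweighted graphs and is left unnormalised, and all function names are chosen for this example.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(graph):
    """(number of reachable nodes - 1) / sum of distances; 0.0 for isolated nodes."""
    result = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness(graph):
    """Unnormalised shortest-path betweenness (Brandes scheme, undirected graph)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths from s
        sigma = dict.fromkeys(graph, 0)     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = dict.fromkeys(graph, 0.0)
        for v in reversed(order):           # accumulate from the farthest nodes back
            for u in preds[v]:
                delta[u] += sigma[u] / sigma[v] * (1 + delta[v])
            if v != s:
                score[v] += delta[v]
    # Every unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


# Connected part of the Fig. 1 toy network.
graph = {"a": {"1", "2", "6"}, "1": {"a", "6"}, "2": {"a"}, "6": {"a", "1"}}

degree = {node: len(neighbours) for node, neighbours in graph.items()}
close = closeness(graph)
between = betweenness(graph)
for node in sorted(graph, key=degree.get, reverse=True):
    print(node, degree[node], round(close[node], 2), between[node])
```

Protein “a” ranks first on all three measures in this toy graph; in real PPINs the rankings often differ between measures, which is why several of them are usually examined together.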

Table 3 Topological analysis tools

Hubba [47]
Computed parameters: Degree, bottleneck, edge percolated component, subgraph centrality, maximum neighbourhood component and density of maximum neighbourhood component
Input: List of interactions in .txt format
Distinctive features:
• Web-based tool
• The appropriate tool to simply rank proteins in a network by centrality measures
Webpage: http://hub.iis.sinica.edu.tw/Hubba

Centibin [48]
Computed parameters: Degree, eccentricity, closeness, radiality, centroid value, stress, S.P. Betweenness, C.-F. Closeness, C.-F. Betweenness, Katz Status, Eigenvector, Hubbell index, Bargaining, PageRank, HITS-Hubs, HITS-Authorities and Closeness-vitality
Input: Network data in .net, .tab, .mat or .xml format
Distinctive features:
• Free installable Windows application
• Useful for a detailed centrality study because it offers more algorithms than the other tools
Webpage: http://centibin.ipk-gatersleben.de/

NetworkAnalyzer [52]
Computed parameters: Number of nodes and edges, self-loops, connected components, average number of neighbours, network diameter, radius, density, centralisation, heterogeneity, clustering coefficient, number of shortest paths and the characteristic path length
Input: Network loaded in the Cytoscape environment
Distinctive features:
• Java plugin for Cytoscape
• Displays a comprehensive set of topological parameters
• It is possible to visualise different parameters in the same network by changing node features (e.g., “degree” in colour and “closeness centrality” in size)
Webpage: http://med.bioinf.mpi-inf.mpg.de/networkanalyzer/

MAVisto [53]
Computed parameters: Motifs
Input: List of interactions in .txt format
Distinctive features:
• Motifs are detected by comparing the frequency of all occurrences of a motif in the studied network to the frequency values of this motif in randomisations of the same network
• MAVisto offers several presentations of its results: a motif table (with p-value and z-score), a motif view, a motif fingerprint and a visualisation of motif matches in the network
• Computationally time consuming
Webpage: http://mavisto.ipk-gatersleben.de

FANMOD [54]
Computed parameters: Motifs
Input: Network
Distinctive features:
• Motifs are detected and grouped into motif classes. Then, an algorithm determines which motif classes are displayed at much higher frequency than in random graphs
• Faster than MAVisto
Webpage: http://www.minet.unijena.de/~wernicke/motifs

Network motif distribution is another useful measure. A motif is a basic building block of complex graphs, defined as a sub-network or connectivity pattern that appears in a PPIN at a significantly higher frequency than would be expected for a random network [50]. The distribution of motifs characterises the local structure of networks and has also been shown to be functionally relevant [51]. Despite the high computational complexity involved in the detection of network motifs, in practice the search can be executed in reasonable time using available software. Typical motifs that repeatedly appear in regulatory networks are autoregulatory or feed-forward motifs. Tools to calculate topological network parameters are presented in Table 3.
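The comparison against random networks that underlies motif detection can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of MAVisto or FANMOD: only one pattern (the triangle) is counted, the random networks are produced by degree-preserving edge swaps, which is one common choice assumed here, and the edge list is hypothetical.

```python
import random
from statistics import mean, pstdev


def count_triangles(edges):
    """Number of triangles (three mutually interacting proteins) in the network."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    unique_edges = {frozenset(edge) for edge in edges}
    # Each triangle is seen once from each of its three edges.
    shared = sum(len(adj[u] & adj[v]) for u, v in map(tuple, unique_edges))
    return shared // 3


def randomise(edges, n_swaps, rng):
    """Degree-preserving randomisation: swap partners between random edge pairs."""
    edges = [tuple(edge) for edge in edges]
    present = {frozenset(edge) for edge in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c                      # try both orientations of the swap
        if len({a, b, c, d}) < 4:
            continue                         # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue                         # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def triangle_z_score(edges, n_random=200, seed=0):
    """Observed triangle count and its z-score against randomised networks."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    null = [count_triangles(randomise(edges, 10 * len(edges), rng))
            for _ in range(n_random)]
    spread = pstdev(null)
    z = (observed - mean(null)) / spread if spread else float("nan")
    return observed, z


# Hypothetical network: two triangles joined by one bridge, plus a short tail.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6), (6, 7), (7, 8)]
observed, z = triangle_z_score(edges)
print(observed, z)
```

A large positive z-score suggests that the pattern is over-represented relative to networks with the same degree sequence; dedicated tools additionally enumerate all patterns of a given size and report p-values.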

Modular approach

Based on the idea that biological systems are composed of modules containing interacting components [55], a way to achieve a better understanding of a complex network is to break it down into simpler units called modules. A module is often understood as a subset of vertices that are densely connected among one another [56].

Commonly, in addition to closeness between nodes, functional criteria are used to divide a network into modules. Similar proteins tend to be connected in molecular networks, so distinct sets of proteins and their corresponding interactions constitute different blocks underlying common functions [57]. Therefore, the study of modules could be equivalent to the study of functional units of the malignant cell [58]. In Table 4, some modular-based tools helpful to manage a complex PPIN are presented.
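Functional annotation of a module is usually assessed with an over-representation test. The sketch below uses a one-sided hypergeometric test, which is one common choice and is assumed here for illustration; the counts are hypothetical, and tools such as BiNGO additionally correct for multiple testing across all GO terms.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of seeing at least k annotated proteins in a module
    of n proteins drawn from a background of N proteins, K of which are annotated."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical counts: a 20-protein module in which 8 proteins carry a given GO term,
# against a background of 1000 proteins, 50 of which carry that term.
p = hypergeom_pvalue(k=8, n=20, K=50, N=1000)
print(f"{p:.2e}")
```

A small p-value indicates that the term appears in the module more often than expected by chance given the background annotation frequency.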

Data integration

Taking into account that cancer is a multi-factorial disease involving diverse anomalies, the analysis of biological networks integrating different types of molecular data can lead to the discovery of robust, specific and useful biomarkers of disease, and also shed light on the mechanisms and aetiology of the studied tumour [65–68].

Table 4 Modular analysis tools

GraphWeb [59]
Computed parameters: Connected components, neighbourhood modules, hub-based modules, cliques and cluster modules
Input: List of interactions
Distinctive features:
• Performs a functional profiling of discovered modules based on GO annotations
• Ref. [58] provides an accurate description of the algorithms underlying each clustering method
• Breaks down a network into functional modules, extracting them as independent sub-networks
Webpage: http://biit.cs.ut.ee/graphweb/

GenePRO [60]
Computed parameters: Turns a network into interacting clusters
Input: Network loaded in the Cytoscape environment
Distinctive features:
• Cytoscape plugin
• Displays a view of the clusters as individual but interconnected nodes, maintaining the whole-network picture
• A previous manual definition of clusters is necessary
Webpage: http://wodaklab.org/genepro/

MCODE [61]
Computed parameters: Clusters
Input: Network loaded in the Cytoscape environment
Distinctive features:
• Cytoscape plugin
• Detects densely connected regions in a network
• Specifically oriented to the discovery of molecular complexes
Webpage: http://baderlab.org/Software/MCODE

BiNGO [62]
Computed parameters: GO term overrepresentation in biological networks
Input: Network loaded in the Cytoscape environment
Distinctive features:
• A set of nodes must be manually selected from a network, and BiNGO retrieves the GO terms associated with this set of proteins
• Tests the statistical significance of the enrichment and controls the false discovery rate
Webpage: http://www.psb.ugent.be/cbd/papers/BiNGO/Home.html

NEAT [63]
Computed parameters: Clusters
Input: Tab-delimited, GML, VisML, DOT and adjacency matrix format
Distinctive features:
• Divides the network into non-overlapping clusters
• Retrieves KEGG or MetaCyc pathways in which proteins are implicated
Webpage: http://rsat.ulb.ac.be/rsat/index_neat.html

NEMO [64]
Computed parameters: Modules
Input: Network loaded in the Cytoscape environment
Distinctive features:
• Identifies network communities based on the premise that densely connected nodes correspond to functional modules
Webpage: http://baderlab.bme.jhu.edu/baderlab/index.php/NeMo

The representation of data derived from heterogeneous sources in a single network is a way to integrate diverse and massive data sets. PPINs can integrate diverse molecular data to get a more complete model of the biological system (Fig. 2). It has been postulated that proteins with high connectivity within a network could be very important to the studied disease, despite not being differentially expressed. Thus, genes with a role in tumorigenesis not detected in a high-throughput experiment could be identified by a network-based approach. For example, if an important protein is activated by phosphorylation, its gene expression may not be altered, but the kinase that phosphorylates it will be up-regulated. So, even though no changes in expression are observed when measuring the protein, since that protein is connected to its altered kinase, the network will reveal its importance. The same occurs with mutated genes that play a role in tumour progression: they are not detected by differential expression experiments, but usually take up a central position in networks [69].

Fig. 2 Data integration into a network to obtain a more informative PPIN. Red and green circles represent seed and linker proteins respectively. Complementary molecular information: over-expression at the mRNA level is indicated as a purple circle and proteins with mutations at the DNA level are represented as a half-moon shape, e.g., proteins “a” and “d” are overexpressed and connected by a protein not deregulated at mRNA level, but mutated. Some of the public tools useful for data integration appear in the purple box

Table 5 Integration tools

Cerebral [70]
Kind of integrated data: Subcellular location
Input: Network loaded in the Cytoscape environment and subcellular location data
Distinctive features:
• Cytoscape plugin
• It generates an intuitive view of the network in which proteins appear separated into layers according to the context of cell organelles
• Cerebral does not automatically search for cellular location: this data must be provided to Cerebral as a Cytoscape attribute
Webpage: http://www.pathogenomics.ca/cerebral/

Dynamic Expression Plugin [37]
Kind of integrated data: Expression values
Input: Network loaded in the Cytoscape environment and expression data
Distinctive features:
• Cytoscape plugin
• It colours the nodes in a range according to their level of expression: from blue (minimum expression) to red (maximum expression)
• Useful to easily identify down- or up-regulated areas of the network
• An expression data file must be loaded in Cytoscape
Webpage: http://chianti.ucsd.edu/svn/csplugins/trunk/ucsf/scooter/dynamicXpr/

EGAN [71]
Kind of integrated data: -omics experiment results: expression microarrays, aCGH, MS/MS proteomics, GWAS data, ChIP-chip experiments, DNA methylation assays or high-throughput sequencing
Input: A network and high-throughput results
Distinctive features:
• Java application
• It allows combining interaction and molecular data in the context of network modules, e.g., with expression data: divide the network into topological modules (motifs) and then look for co-expression patterns in each module, divide the network into functional modules and then look for co-expression patterns, or use expression information to divide the network into co-expression modules
• EGAN allows selecting nodes by crossing different data, e.g., select all genes with up-regulated expression and amplified copy number
Webpage: http://akt.ucsf.edu/EGAN/

Vanted [72]
Kind of integrated data: Experimental data
Input: A network and a biochemical dataset
Distinctive features:
• Easy-to-download and install Java application (Windows, Mac or UNIX)
• A tool specially designed to help scientists with the interpretation of related experimental data
Webpage: http://vanted.ipk-gatersleben.de/

Usually, a network contains false positive interactions or interactions that do not take place in the studied tissue. Expression data can be used as a filter, assuming that if a gene is not expressed in a given tissue, its corresponding protein will not be present either. Consequently, interactions involving non-expressed genes are treated as not real in that tissue. Several tools for integrating diverse data into a network are presented in Table 5.
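The expression filter described above amounts to discarding every interaction in which at least one partner is not expressed. The following sketch shows the idea; the protein names, expression values and cut-off are hypothetical, and in practice the threshold depends on the platform and normalisation used.

```python
def expressed_genes(expression, cutoff):
    """Genes whose expression level reaches the chosen cut-off."""
    return {gene for gene, level in expression.items() if level >= cutoff}


def filter_by_expression(interactions, expressed):
    """Keep only interactions in which both partners are expressed."""
    return [(u, v) for u, v in interactions if u in expressed and v in expressed]


# Hypothetical expression values for the studied tissue (arbitrary units).
expression = {"P1": 8.2, "P2": 0.1, "P3": 5.4, "P4": 6.0}
interactions = [("P1", "P2"), ("P1", "P3"), ("P3", "P4"), ("P2", "P4")]

expressed = expressed_genes(expression, cutoff=1.0)
kept = filter_by_expression(interactions, expressed)
print(kept)  # prints: [('P1', 'P3'), ('P3', 'P4')]
```

Proteins missing from the expression table are treated as not expressed in this sketch; whether that is appropriate depends on the coverage of the experiment.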

An example using genes classically related to colorectal cancer

Figure 3 shows an example of how to use some of the previously described tools to extract biological information from the following 15 colorectal cancer (CRC) genes: APC, BUB1, MAD2L1, TP53, PIK3CA, EGFR, AURKA, CTNNB1, SMAD4, WNT1, AXIN2, TGFBR2, MLH1, BRAF and KRAS. These seed proteins, classical key molecules driving colon carcinogenesis, are a mix of chromosomal instability (CIN) genes, microsatellite instability (MSI) genes and CpG island methylator phenotype (CIMP) genes [73].

BIANA software was used to retrieve and export a ﬁ le

containing experimentally determined interactions of the

seed proteins. Next, a visual representation of the resulting

network was performed using Cytoscape software (Fig.

3A). The PPIN showed two components, one called the

giant component, because it contained the higher number

of nodes, and a smaller independent network. The giant

component grouped all seed proteins except WNT and its

interacting partners. APC appeared central, directly inter-

acting with seed proteins AURKA, MAD2L1, CTNNB1,

BUB1 and AXIN2, and indirectly, through linker proteins,

with the remaining seeds except MLH1 (MSI representative

gen). KRAS and BRAF directly interacted with each other

since both are chosen as CIMP-related genes. The protocol

in Fig. 3B was followed to analyse this PPIN including a

topological approach, a clustering or modular approach

and a data integration step.
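The split of the PPIN into a giant component and a smaller independent network can be reproduced on any edge list with a breadth-first search. The sketch below is a minimal illustration and not the way BIANA or Cytoscape compute components; the function name `connected_components` and the partners `PARTNER_1` and `PARTNER_2` are hypothetical placeholders, while the APC edges are taken from the direct interactions mentioned in the text.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy network: APC edges from the text, WNT1 partners are placeholders.
edges = [
    ("APC", "CTNNB1"), ("APC", "AXIN2"), ("APC", "BUB1"),
    ("WNT1", "PARTNER_1"), ("WNT1", "PARTNER_2"),
]
components = connected_components(edges)
giant = components[0]
print(len(components), sorted(giant))
```

The first element of the returned list plays the role of the giant component; any seed protein missing from it, such as WNT1 here, belongs to a separate sub-network.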

First, a topological exploration of the PPIN was made: centrality measures of hub proteins were calculated using Hubba and NetworkAnalyzer software. Protein ranks differed slightly depending on the algorithm used for the analysis, but in all cases AURKA, EGFR and TP53 appeared as the most central proteins in the network, indicating their biological relevance in the pathogenesis of CRC. Interestingly, BRAF took the second position when centrality was measured in terms of maximum neighbourhood component (MNC) but dropped to fourth position in degree and sixth in betweenness. This means that although BRAF does not have many interacting partners and is not located on many of the paths crossing the PPIN, when the network is divided into clusters of densely connected elements it appears in more clusters than other proteins such as EGFR or TP53 (Fig. 3C). A network motif analysis was also done with MAVisto software, revealing some repeated structures in the network. Because of the computational requirements of this complex task, the analysis was done on a smaller version of the network (extracted with POINeT software instead of BIANA). As an example, this application highlighted the association between TP53 and the less studied protein RASA1 through the two linkers AURKA and CDKN2A (Fig. 3D). A search in PubMed revealed that decreased expression of RASA1 is associated with abnormal expression of TP53 in advanced colorectal tumours [74]. However, motif results must be interpreted carefully: this analysis is more suitable for directed networks (usually regulatory networks), in which the directionality of the interactions is represented.
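To see why a protein can rank differently under degree and MNC, the two measures can be computed on a toy graph. The sketch below assumes that the MNC of a node is the size of the largest connected component of the subgraph induced by its neighbours; this definition, the function names and the node labels are assumptions made for illustration and may not match the Hubba implementation in every detail.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets, ignoring self-interactions."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree(adjacency, node):
    """Number of direct interaction partners."""
    return len(adjacency[node])


def mnc(adjacency, node):
    """Size of the largest connected component among the node's neighbours."""
    neighbours = adjacency[node]
    seen, best = set(), 0
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first search restricted to the neighbourhood of `node`.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            current = stack.pop()
            size += 1
            for other in adjacency[current]:
                if other in neighbours and other not in seen:
                    seen.add(other)
                    stack.append(other)
        best = max(best, size)
    return best


# Hypothetical graph: H has many unconnected partners, X has few but linked ones.
edges = [
    ("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"),
    ("X", "P"), ("X", "Q"), ("X", "R"), ("P", "Q"), ("Q", "R"),
]
adjacency = build_adjacency(edges)
for protein in ("H", "X"):
    print(protein, degree(adjacency, protein), mnc(adjacency, protein))
```

In this toy case H outranks X by degree, while X outranks H by MNC, which mirrors the kind of rank change reported for BRAF.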

Second, a clustering analysis was performed to look for both functional modules and molecular complexes with biological meaning. BiNGO software highlighted that “DNA repair” (p=4.1×10⁻⁸) and “response to DNA damage” (p=1.1×10⁻⁷) were the most representative GO terms in the cluster grouping MLH1-interacting proteins. Also, “transmembrane receptor protein serine/threonine kinase signalling pathway” (p=8.4×10⁻¹³) and “small GTPase mediated signal transduction” (p=2.0×10⁻¹¹) were the most representative functions of SMAD4-interacting proteins (Fig. 3E). A betweenness centrality clustering analysis with GraphWeb software effectively separated CIN and MSI genes, and was also useful for discovering biological pathways inside the network: MLH1 and its interacting proteins formed a module with statistically significant enrichment in the KEGG pathway “mismatch repair” (concordant with BiNGO results). BUB1, CDK1 and TGFBR2 defined a module of interacting proteins enriched in “transforming growth factor receptor signalling pathway”. The GO term “Wnt receptor signalling pathway” included APC, CTNNB1 and AXIN2 (Fig. 3F). So, although WNT1 intriguingly did not appear to interact with these proteins, this approach was able to capture the classical Wnt/beta-catenin pathway in CIN CRC [75]. As an alternative approach, MCODE was used to search for putative molecular complexes. Four complexes were retrieved: the first included AURKA, MAD2L1 and their interacting proteins. The second contained BRAF, EGFR and their linker proteins RIN1, PKP2, RAPGEF1 and CRK. TP53, BUB1, HDAC5 and PRKCA formed another complex. Lastly, a four-node complex included the direct interaction between the seed proteins TGFBR2 and SMAD4, and their linker proteins SMAD3 and SMAD7 (Fig. 3G).
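The p-values reported for GO terms come from an over-representation test followed by a multiple testing correction. The sketch below shows one common way to do this, a one-sided hypergeometric test with Benjamini and Hochberg adjustment; that it matches the exact settings used in this example is an assumption, and the function names and all numbers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a cluster
    of n genes, when K of the N genes in the network carry the annotation."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini and Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical numbers: 8 of 20 cluster genes annotated, 40 of 1000 overall.
p = hypergeom_pvalue(8, 20, 40, 1000)
print(p, benjamini_hochberg([p, 0.04, 0.03]))
```

A small p-value means the annotation appears in the cluster more often than expected from its frequency in the whole network.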

Fig. 3 Example of PPIN construction and analysis. A Visual representation (force-directed layout) of the network using Cytoscape software. BIANA software was chosen to construct a PPIN with only experimentally determined interactions, which resulted in 1466 nodes and 2176 edges. The bottom right inset shows a reduction to MNC of the same PPIN. B Protocol followed to analyse the network: topological exploration, clustering and data integration. C Centrality measures of the PPIN using Hubba (degree and MNC) and NetworkAnalyzer (betweenness). Both applications output a ranking of the proteins but differ in the graphical representation. Hubba uses a colour code to highlight the most central proteins in the PPIN (from red to blue). In NetworkAnalyzer the larger nodes represent the most central proteins. D Output of MAVisto software. On the right, the description of all discovered motifs. On the left, black and white PPIN with network motifs represented in colour.

Finally, data integration was performed. Using Cytoscape software, nodes from the PPIN were easily merged with a list of 202 genes differentially expressed between cancerous and noncancerous colon tissues, extracted from Bertucci et al. [76]. As a result, 37 proteins were found to be deregulated at the mRNA level (Fig. 3H). These included some previously identified as important hubs, such as TP53 (over-expressed), reinforcing their critical role in colorectal tumorigenesis. Among interacting proteins, this approach allowed us to focus our attention on parts of the network containing deregulated proteins such as TGFB3 (under-expressed) or CDK2 (up-regulated) [77]. Moreover, a more detailed analysis revealed that although crucial CRC proteins such as APC did not appear differentially expressed (probably because APC is mutated but not differentially expressed), some of its interacting proteins, like PTK2 or SFN, were up-regulated. In particular, YWHAZ emerged as an important protein at the crossroads of BRAF, EGFR and TP53.
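Merging an expression list with network nodes amounts to a lookup per node, after which deregulated partners of a protein of interest can be listed. The sketch below is a hypothetical illustration and not the Cytoscape import procedure: the function names are invented, the expression labels are those given in the text, and CTNNB1 is left unannotated purely for illustration because the text does not state its status.

```python
def overlay_expression(nodes, expression):
    """Label each node 'up', 'down' or 'unchanged' from an expression table."""
    return {node: expression.get(node, "unchanged") for node in nodes}


def deregulated_neighbours(edges, protein, expression):
    """Direct partners of `protein` that appear in the expression table."""
    result = {}
    for a, b in edges:
        if protein not in (a, b) or a == b:
            continue
        partner = b if a == protein else a
        if partner in expression:
            result[partner] = expression[partner]
    return result


# Expression labels as reported in the text; the edge list is a toy subset.
expression = {"TP53": "up", "TGFB3": "down", "CDK2": "up", "PTK2": "up", "SFN": "up"}
edges = [("APC", "PTK2"), ("APC", "SFN"), ("APC", "CTNNB1")]
nodes = {name for edge in edges for name in edge}

labels = overlay_expression(nodes, expression)
print(labels["APC"], deregulated_neighbours(edges, "APC", expression))
```

This reproduces the observation that a protein can be unchanged itself while sitting next to up-regulated partners.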

Fig. 3 (continuation) E Functional analysis using the BiNGO plugin for Cytoscape. In orange, nodes selected for analysis (in this example, MLH1- and SMAD4-connected proteins). The companion table shows the output, including for each GO term its ID, a description, a p-value, a corrected p-value (Benjamini and Hochberg multiple testing correction), the cluster frequency, the total frequency and the genes included in that GO process. F Division of the network into functional modules (GO terms and KEGG pathways) using GraphWeb. G Output of the MCODE tool, which looks for molecular complexes. H Gene expression integration using Cytoscape. Over-expression is represented in red and under-expression in green. Light purple nodes show non-differential expression. On the right, the list of deregulated proteins. I Subcellular location classification of the MNC PPIN using the Cerebral plugin. MLH1 and APC are highlighted and individually represented

Cerebral software was used to merge the network with subcellular location information. A reduced PPIN that only included seeds and linkers was used to obtain a clearer picture. Proteins were placed into layers of predefined locations: extracellular region, plasma membrane, cytoplasm, peroxisome, proteasome complex, mitochondrion, Golgi, endoplasmic reticulum and nucleus. MLH1, located in the nucleus, mainly interacted with proteins in the nucleus, with the exception of TRIM29 and AP2B1, which are cytoplasmic proteins. APC interacted with both nuclear and cytoplasmic proteins. This is probably due to the reported nuclear-cytoplasmic shuttling of APC [78] (Fig. 3I).
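The layered view can be summarised numerically by counting the partners of a protein per compartment. The sketch below is a hypothetical illustration and does not use the Cerebral plugin: the function name is invented, the two nuclear partners are placeholders, and only the locations of MLH1, TRIM29 and AP2B1 are taken from the text.

```python
from collections import Counter


def partners_by_location(edges, protein, location):
    """Count the direct partners of `protein` in each subcellular compartment."""
    counts = Counter()
    for a, b in edges:
        if protein in (a, b) and a != b:
            partner = b if a == protein else a
            counts[location.get(partner, "unknown")] += 1
    return counts


# Locations of TRIM29 and AP2B1 follow the text; NUCLEAR_* are placeholders.
location = {
    "MLH1": "nucleus", "TRIM29": "cytoplasm", "AP2B1": "cytoplasm",
    "NUCLEAR_1": "nucleus", "NUCLEAR_2": "nucleus",
}
edges = [
    ("MLH1", "TRIM29"), ("MLH1", "AP2B1"),
    ("MLH1", "NUCLEAR_1"), ("NUCLEAR_2", "MLH1"),
]
counts = partners_by_location(edges, "MLH1", location)
print(dict(counts))
```

A protein whose partners are spread over several compartments, as reported for APC, would show several non-zero counts.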

Note of caution

An increasing number of specialised tools are appearing in each area of network construction and analysis. We strongly encourage researchers to search and explore other tools beyond those described here.

It is also important not to forget that a network is just a representation of the studied system, not the real world. Although network analysis is valuable for hypothesis generation, biological validation of the hypotheses derived from it is desirable. It is necessary to keep in mind that, despite huge efforts made in this area, the human interactome is not complete. Well studied proteins have a higher probability of being included in such a network, resulting in some selection bias with respect to less studied proteins. Moreover, it is well known that the human interactome contains false positive interactions, so a careful interpretation of results is required [79]. Lack of spatio-temporal information is another obstacle to consider in the network elucidation process: we assume that two proteins are always interacting when actually they only work together in a certain tissue or even organelle, or at a certain point in the cell cycle [80].

Furthermore, a network-centric approach remains incomplete because of the intrinsic complexity of cancer: complex cross-talk among cancer cells [81] and with the surrounding microenvironment [82] is not captured in a PPIN, which only represents interactions inside a single cancer cell.

Conclusions

In the field of cancer research, the combination of classical techniques with systems biology and network tools can be useful to generate more accurate biological hypotheses regarding therapy, prognosis or tumour classification, bringing us closer to personalised medicine.

However, despite the invaluable help of these techniques, no software yet exists that is comparable to the human brain. A medical and biological point of view is needed for the interpretation of complex networks.

Conflict of interest The authors declare that they have no conflict of interest relating to the publication of this manuscript.

Acknowledgements This study was supported by the Catalan Institute of Oncology, the Private Foundation of the Biomedical Research Institute of Bellvitge (IDIBELL), the Instituto de Salud Carlos III (grants FIS PI08/1635, FIS PI08/1359, FIS 06/0545 and FIS 05/1006, PI081359, PI08-1635, PI09-01037), CIBERESP CB07/02/2005, the Spanish Association Against Cancer (AECC) Scientific Foundation, the Catalan Government DURSI grant 2009SGR1489, and the European Commission grants FOOD-CT-2006-036224-HIWATE and FP7-COOP-Health-2007-B HiPerDART.

References

1. Hornberg JJ, Bruggeman FJ, Westerhoff HV, Lankelma J (2006) Cancer: a Systems Biology disease. Biosystems 83:81–90

2. Kitano H (2002) Systems biology: a brief overview. Science 295:1662–1664

3. Kreeger PK, Lauffenburger DA (2010) Cancer systems biology: a network modeling perspective. Carcinogenesis 31:2–8

4. Wang E, Lenferink A, O’Connor-McCourt M (2007) Cancer systems biology: exploring cancer-associated genes on cellular networks. Cell Mol Life Sci 64:1752–1762

5. Auffray C, Chen Z, Hood L (2009) Systems medicine: the future of medical genomics and healthcare. Genome Med 1:2

6. Clermont G, Auffray C, Moreau Y et al (2009) Bridging the gap between systems biology and medicine. Genome Med 1:88

7. Alberghina L, Höfer T, Vanoni M (2009) Molecular networks and system-level properties. J Biotechnol 144:224–233

8. Stelzl U, Worm U, Lalowski M et al (2005) A human protein–protein interaction network: a resource for annotating the proteome. Cell 122:957–968

9. Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM (2005) Consolidating the set of known human protein–protein interactions in preparation for large-scale mapping of the human interactome. Genome Biol 6:R40

10. Kann MG (2007) Protein interactions and disease: computational approaches to uncover the etiology of diseases. Brief Bioinform 8:333–346

11. Baudot A, Gómez-López G, Valencia A (2009) Translational disease interpretation with molecular networks. Genome Biol 10:221. Review

12. Wu Z, Zhao X, Chen L (2009) Identifying responsive functional modules from protein–protein interaction network. Mol Cells 27:271–277. Review

13. Taylor IW, Linding R, Warde-Farley D et al (2009) Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat Biotechnol 27:199–204

14. Wang YC, Chen BS (2011) A network-based biomarker approach for molecular investigation and diagnosis of lung cancer. BMC Med Genom 4:2

15. Jonsson PF, Bates PA (2006) Global topological features of cancer proteins in the human interactome. Bioinformatics 22:2291–2297

16. Xu J, Li Y (2006) Discovering disease-genes by topological features in human protein–protein networks. Bioinformatics 22:2800–2805

17. Sanz-Pamplona R, Aragüés R, Driouch K et al (2011) Expression of endoplasmic reticulum stress proteins is a candidate marker of brain metastasis in both ErbB-2(+) and ErbB-2(–) primary breast tumors. Am J Pathol 179:564–579

18. Pujana MA, Han JD, Starita LM et al (2007) Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat Genet 39:1338–1349

19. Junker BH, Schreiber F (2007) Analysis of biological networks. Chapter 3: Graph theory. John Wiley & Sons, Hoboken, NJ, USA

20. Keshava Prasad TS, Goel R, Kandasamy K et al (2009) Human Protein Reference Database: 2009 update. Nucleic Acids Res 37:D767–772

21. von Mering C, Huynen M, Jaeggi D et al (2003) STRING: a database of predicted functional associations between proteins. Nucleic Acids Res 31:258–261

22. Xenarios I, Salwínski L, Duan XJ et al (2002) DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 30:303–305

23. Lehne B, Schlitt T (2009) Protein–protein interaction databases: keeping up with growing interactomes. Hum Genomics 3:291–297

24. Berggård T, Linse S, James P (2007) Methods for the detection and analysis of protein–protein interactions. Proteomics 7:2833–2842. Review

25. Shoemaker BA, Panchenko AR (2007) Deciphering protein–protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Comput Biol 3:e43. Review

26. Kolaczyk E (2009) Mapping networks. In: Statistical analysis of network data. Springer

27. Huang S (2004) Back to the biology in systems biology: what can we learn from biomolecular networks? Brief Funct Genomic Proteomic 2:279–297

28. Suderman M, Hallett M (2007) Tools for visually exploring biological networks. Bioinformatics 23:2651–2659. Review

29. Dorogovtsev SN, Mendes JF, Samukhin AN (2001) Size-dependent degree distribution of a scale-free growing network. Phys Rev E Stat Nonlin Soft Matter Phys 63:062101

30. Oti M, Snel B, Huynen MA, Brunner HG (2006) Predicting disease genes using protein–protein interactions. J Med Genet 43:691

31. Garcia-Garcia J, Guney E, Aragues R et al (2010) Biana: a software framework for compiling biological interactions and analyzing networks. BMC Bioinform 11:56

32. Lee SA, Chan CH, Chen TC et al (2009) POINeT: protein interactome with sub-network analysis and hub prioritization. BMC Bioinform 10:114

33. Minguez P, Götz S, Montaner D et al (2009) SNOW, a web-based tool for the statistical analysis of protein–protein interaction networks. Nucleic Acids Res 37:W109–114

34. Chaurasia G, Iqbal Y, Hänig C et al (2007) UniHI: an entry gate to the human protein interactome. Nucleic Acids Res 35:D590–594

35. Pavlopoulos GA, Wegener AL, Schneider R (2008) A survey of visualization tools for biological network analysis. BioData Min 1:12

36. Shannon P, Markiel A, Ozier O et al (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13:2498–2504

37. Killcoyne S, Carter GW, Smith J, Boyle J (2009) Cytoscape: a community-based framework for network modeling. Methods Mol Biol 563:219–239

38. Pavlopoulos GA, O’Donoghue SI, Satagopam VP et al (2008) Arena3D: visualization of biological networks in 3D. BMC Syst Biol 2:104

39. Hooper SD, Bork P (2005) Medusa: a simple tool for interaction graph analysis. Bioinformatics 21:4432–4433

40. Assenov Y, Ramírez F, Schelhorn SE et al (2008) Computing topological parameters of biological networks. Bioinformatics 24:282–284

41. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512

42. Barabási AL, Bonabeau E (2003) Scale-free networks. Sci Am 288:60–69

43. Goh KI, Cusick ME, Valle D et al (2007) The human disease network. Proc Natl Acad Sci U S A 104:8685–8690

44. He X, Zhang J (2006) Why do hubs tend to be essential in protein networks? PLoS Genet 2:e88

45. Kar G, Gursoy A, Keskin O (2009) Human cancer protein–protein interaction network: a structural perspective. PLoS Comput Biol 5:e1000601

46. Chen J, Aronow BJ, Jegga AG (2009) Disease candidate gene identification and prioritization using protein interaction networks. BMC Bioinform 10:73

47. Lin CY, Chin CH, Wu HH et al (2008) Hubba: hub objects analyzer–a framework of interactome hubs identification for network biology. Nucleic Acids Res 36:W438–443

48. Junker BH, Koschützki D, Schreiber F (2006) Exploration of biological network centralities with CentiBiN. BMC Bioinform 7:219

49. Junker BH, Schreiber F (2007) Network centralities. In: Analysis of biological networks. John Wiley & Sons, Hoboken, NJ, USA

50. Milo R, Shen-Orr S, Itzkovitz S et al (2002) Network motifs: simple building blocks of complex networks. Science 298:824–827

51. Moon HS, Bhak J, Lee KH, Lee D (2005) Architecture of basic building blocks in protein and domain structural interaction networks. Bioinformatics 21:1479–1486

52. Assenov Y, Ramírez F, Schelhorn SE et al (2008) Computing topological parameters of biological networks. Bioinformatics 24:282–284

53. Schreiber F, Schwöbbermeyer H (2005) MAVisto: a tool for the exploration of network motifs. Bioinformatics 21:3572–3574

54. Wernicke S, Rasche F (2006) FANMOD: a tool for fast network motif detection. Bioinformatics 22:1152–1153

55. Barabási AL, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5:101–113. Review

56. Balasundaram B, Butenko S (2007) Network clustering. In: Analysis of biological networks. John Wiley & Sons, Hoboken, NJ, USA

57. Luo F, Yang Y, Chen C-F et al (2007) Modular organization of protein interaction networks. Bioinformatics 23:207–214

58. Hartwell LH, Hopfield JJ, Leibler S, Murray AW (1999) From molecular to modular cell biology. Nature 402[6761 Suppl]:C47–52

59. Reimand J, Tooming L, Peterson H et al (2008) GraphWeb: mining heterogeneous biological networks for gene modules with functional significance. Nucleic Acids Res 36:W452–459

60. Vlasblom J, Wu S, Pu S et al (2006) GenePro: a Cytoscape plug-in for advanced visualization and analysis of interaction networks. Bioinformatics 22:2178–2179

61. Bader GD, Hogue CW (2003) An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform 4:2

62. Maere S, Heymans K, Kuiper M (2005) BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21:3448–3449

63. Brohée S, Faust K, Lima-Mendez G et al (2008) NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways. Nucleic Acids Res 36:W444–451

64. Rivera CG, Vakil R, Bader JS (2010) NeMo: Network Module identification in Cytoscape. BMC Bioinform 11[Suppl 1]:S61

65. Ma’ayan A (2008) Network integration and graph analysis in mammalian molecular systems biology. IET Syst Biol 2:206–221. Review

66. Liu ET (2005) Systems biology, integrative biology, predictive biology. Cell 121:505–506. Review

67. McDermott JE, Costa M, Janszen D et al (2010) Separating the drivers from the driven: integrative network and pathway approaches aid identification of disease biomarkers from high-throughput data. Dis Markers 28:253–266. Review

68. Mathew JP, Taylor BS, Bader GD et al (2007) From bytes to bedside: data integration and computational biology for translational cancer research. PLoS Comput Biol 3:e12

69. Camargo A, Azuaje F (2007) Linking gene expression and functional network data in human heart failure. PLoS One 2:e1347

70. Barsky A, Gardy JL, Hancock RE, Munzner T (2007) Cerebral: a Cytoscape plugin for layout of and interaction with biological networks using subcellular localization annotation. Bioinformatics 23:1040–1042

71. Paquette J, Tokuyasu T (2010) EGAN: exploratory gene association networks. Bioinformatics 26:285–286

72. Junker BH, Klukas C, Schreiber F (2006) VANTED: a system for advanced data analysis and visualization in the context of biological networks. BMC Bioinform 7:109

73. Markowitz SD, Bertagnolli MM (2009) Molecular origins of cancer: molecular basis of colorectal cancer. N Engl J Med 361:2449–2460. Review

74. Ohta M, Seto M, Ijichi H et al (2009) Decreased expression of the RAS-GTPase activating protein RASAL1 is associated with colorectal tumor progression. Gastroenterology 136:206–216

75. Moon RT (2005) Wnt/beta-catenin pathway. Sci STKE 2005:cm1. Review

76. Bertucci F, Salas S, Eysteries S et al (2004) Gene expression profiling of colon cancer by DNA microarrays and correlation with histoclinical parameters. Oncogene 23:1377–1391

77. Minguez P, Dopazo J (2011) Assessing the biological significance of gene expression signatures and co-expression modules by studying their network properties. PLoS One 6:e17474

78. Henderson BR (2000) Nuclear-cytoplasmic shuttling of APC regulates beta-catenin subcellular localization and turnover. Nat Cell Biol 2:653–660

79. Chua HN, Wong L (2008) Increasing the reliability of protein interactomes. Drug Discov Today 13:652–658

80. Strogatz SH (2001) Exploring complex networks. Nature 410:268–276. Review

81. Hanahan D, Weinberg RA (2011) Hallmarks of cancer: the next generation. Cell 144:646–674

82. Kenny PA, Lee GY, Bissell MJ (2007) Targeting the tumor microenvironment. Front Biosci 12:3468–3474

Overview of methods for characterization and visualization of a protein–protein interaction network in a multi-omics integration context

Article

Full-text available

Sep 2022

At the heart of the cellular machinery through the regulation of cellular functions, protein–protein interactions (PPIs) have a significant role. PPIs can be analyzed with network approaches. Construction of a PPI network requires prediction of the interactions. All PPIs form a network. Different biases such as lack of data, recurrence of information, and false interactions make the network unstable. Integrated strategies allow solving these different challenges. These approaches have shown encouraging results for the understanding of molecular mechanisms, drug action mechanisms, and identification of target genes. In order to give more importance to an interaction, it is evaluated by different confidence scores. These scores allow the filtration of the network and thus facilitate the representation of the network, essential steps to the identification and understanding of molecular mechanisms. In this review, we will discuss the main computational methods for predicting PPI, including ones confirming an interaction as well as the integration of PPIs into a network, and we will discuss visualization of these complex data.

Network analysis reveals essential proteins that regulate sodium-iodide symporter expression in anaplastic thyroid carcinoma

Article

Full-text available

Dec 2020

Anaplastic thyroid carcinoma (ATC) is the most rare and lethal form of thyroid cancer and requires effective treatment. Efforts have been made to restore sodium-iodide symporter (NIS) expression in ATC cells where it has been downregulated, yet without complete success. Systems biology approaches have been used to simplify complex biological networks. Here, we attempt to find more suitable targets in order to restore NIS expression in ATC cells. We have built a simplified protein interaction network including transcription factors and proteins involved in MAPK, TGFβ/SMAD, PI3K/AKT, and TSHR signaling pathways which regulate NIS expression, alongside proteins interacting with them. The network was analyzed, and proteins were ranked based on several centrality indices. Our results suggest that the protein interaction network of NIS expression regulation is modular, and distance-based and information-flow-based centrality indices may be better predictors of important proteins in such networks. We propose that the high-ranked proteins found in our analysis are expected to be more promising targets in attempts to restore NIS expression in ATC cells.

Proteomic-Based Discovery of Predictive Biomarkers for Drug Therapy Response and Personalized Medicine in Chronic Immune Thrombocytopenia

Article

Full-text available

Oct 2023
BMRI

Purpose ITP is the most prevalent autoimmune blood disorder. The lack of predictive biomarkers for therapeutic response is a major challenge for physicians caring of chronic ITP patients. This study is aimed at identifying predictive biomarkers for drug therapy responses. Methods 2D gel electrophoresis (2-DE) was performed to find differentially expressed proteins. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometer (MALDI-TOF MS) analysis was performed to identify protein spots. The Cytoscape software was employed to visualize and analyze the protein-protein interaction (PPI) network. Then, enzyme-linked immunosorbent assays (ELISA) were used to confirm the results of the proteins detected in the blood. The DAVID online software was used to explore the Gene Ontology and pathways involved in the disease. Results Three proteins, including APOA1, GC, and TF, were identified as hub-bottlenecks and confirmed by ELISA. Enrichment analysis results showed the importance of several biological processes and pathway, such as the PPAR signaling pathway, complement and coagulation cascades, platelet activation, vitamin digestion and absorption, fat digestion and absorption, cell adhesion molecule binding, and receptor binding. Conclusion and Clinical Relevance. Our results indicate that plasma proteins (APOA1, GC, and TF) can be suitable biomarkers for the prognosis of the response to drug therapy in ITP patients.

Protein–Protein Interaction (PPI) Network of Zebrafish Oestrogen Receptors: A Bioinformatics Workflow

Article

Full-text available

Apr 2022

Protein–protein interaction (PPI) is involved in every biological process that occurs within an organism. The understanding of PPI is essential for deciphering the cellular behaviours in a particular organism. The experimental data from PPI methods have been used in constructing the PPI network. PPI network has been widely applied in biomedical research to understand the pathobiology of human diseases. It has also been used to understand the plant physiology that relates to crop improvement. However, the application of the PPI network in aquaculture is limited as compared to humans and plants. This review aims to demonstrate the workflow and step-by-step instructions for constructing a PPI network using bioinformatics tools and PPI databases that can help to predict potential interaction between proteins. We used zebrafish proteins, the oestrogen receptors (ERs) to build and analyse the PPI network. Thus, serving as a guide for future steps in exploring potential mechanisms on the organismal physiology of interest that ultimately benefit aquaculture research.

A multidimensional systems biology analysis of cellular senescence in aging and disease

Article

Full-text available

Apr 2020
GENOME BIOL

Background: Cellular senescence, a permanent state of replicative arrest in otherwise proliferating cells, is a hallmark of aging and has been linked to aging-related diseases. Many genes play a role in cellular senescence, yet a comprehensive understanding of its pathways is still lacking. Results: We develop CellAge (http://genomics.senescence.info/cells), a manually curated database of 279 human genes driving cellular senescence, and perform various integrative analyses. Genes inducing cellular senescence tend to be overexpressed with age in human tissues and are significantly overrepresented in anti-longevity and tumor-suppressor genes, while genes inhibiting cellular senescence overlap with pro-longevity and oncogenes. Furthermore, cellular senescence genes are strongly conserved in mammals but not in invertebrates. We also build cellular senescence protein-protein interaction and co-expression networks. Clusters in the networks are enriched for cell cycle and immunological processes. Network topological parameters also reveal novel potential cellular senescence regulators. Using siRNAs, we observe that all 26 candidates tested induce at least one marker of senescence with 13 genes (C9orf40, CDC25A, CDCA4, CKAP2, GTF3C4, HAUS4, IMMT, MCM7, MTHFD2, MYBL2, NEK2, NIPA2, and TCEB3) decreasing cell number, activating p16/p21, and undergoing morphological changes that resemble cellular senescence. Conclusions: Overall, our work provides a benchmark resource for researchers to study cellular senescence, and our systems biology analyses reveal new insights and gene regulators of cellular senescence.

A systematic review of graph-based explorations of PPI networks: methods, resources, and best practices

Article

Full-text available

May 2024

This systematic review aims to provide a comprehensive overview of graph-based methodologies utilized in the analysis of protein–protein interaction (PPI) networks. The primary objective is to synthesize existing literature and identify key methodologies, resources, and best practices in the field, with a focus on their application in uncovering essential cancer proteins. A systematic literature search was conducted across various databases to identify relevant studies focusing on graph-based explorations of PPI networks. The selected articles were critically reviewed, and data were extracted regarding the methodologies employed, resources utilized, and best practices identified. The review proceeds to outline a workflow that illustrates the systematic process from the compilation of gene/protein datasets to the generation of essential cancer proteins. A case study on “uncovering essential cancer proteins in breast cancer” was included to exemplify the application of graph-based methodologies in a real-world scenario. The review revealed various graph-based methodologies utilized in PPI network analysis, including centrality measures, pathway enrichment analyses, and network visualization techniques. Essential resources such as databases, software tools, and repositories were identified, along with best practices for data preprocessing, network construction, and analysis. The synthesis of findings, complemented by the case study, provides researchers with a comprehensive understanding of the current landscape of graph-based PPI network analysis and its application in cancer research. This systematic review contributes to the field by offering a holistic overview of graph-based explorations in PPI network research, with a specific focus on cancer protein identification. 
By synthesizing existing knowledge and identifying essential resources and best practices, this review serves as a valuable resource for researchers, facilitating informed decision-making and enhancing research quality and reproducibility. The inclusion of the case study underscores the practical application of graph-based methodologies in uncovering essential cancer proteins.

Community detection in Epstein-Barr virus associated carcinomas and role of tyrosine kinase in etiological mechanisms for oncogenesis

Article

May 2023
MICROB PATHOGENESIS

Background: Epstein-Barr virus (EBV) affects more than 90% of global population. The role of the virus in causing infectious mononucleosis (IM) affecting B-cells and epithelial cells and in the development of EBV associated cancers is well documented. Investigating the associated interactions can pave way for the discovery of novel therapeutic targets for EBV associated lymphoproliferative (Burkitt's Lymphoma and Hodgkin's Lymphoma) and non-lymphoproliferative diseases (Gastric cancer and Nasopharyngeal cancer). Methods: Based on the DisGeNET (v7.0) data set, we constructed a disease-gene network to identify genes that are involved in various carcinomas, viz. Gastric cancer (GC), Nasopharyngeal cancer (NPC), Hodgkin's lymphoma (HL) and Burkitt's lymphoma (BL). We identified communities in the disease-gene network and performed functional enrichment using over-representation analysis to detect significant biological processes/pathways and the interactions between them. Result: We identified the modular communities to explore the relation of this common causative pathogen (EBV) with different carcinomas such as GC, NPC, HL and BL. Through network analysis we identified the top 10 genes linked with EBV associated carcinomas as CASP10, BRAF, NFKBIA, IFNA2, GSTP1, CSF3, GATA3, UBR5, AXIN2 and POLE. Further, the tyrosine-protein kinase (ABL1) gene was significantly over-represented in 3 out of 9 critical biological processes, viz. in regulatory pathways in cancer, the TP53 network and the Imatinib and chronic myeloid leukemia biological processes. Consequently, the EBV pathogen appears to target critical pathways involved in cellular growth arrest/apoptosis. We make our case for BCR-ABL1 tyrosine-kinase inhibitors (TKI) for further clinical investigations in the inhibition of BCR-mediated EBV activation in carcinomas for better prognostic and therapeutic outcomes.

Multi-SANA: Comparing Measures of Topological Similarity for Multiple Network Alignment

Article

Oct 2022
IEEE T EVOLUT COMPUT

All life on Earth is related, so some molecular interactions are common across almost all living cells, with the number of common interactions increasing as we look at more closely related species. In particular, we expect the protein–protein interaction (PPI) networks of closely related species to share high levels of similarity. This similarity may facilitate the transfer of functional knowledge between model species and humans. Multiple network alignment is the process of uncovering the connection similarity between three or more networks simultaneously. Existing algorithms for multiple network alignment rely on sequence similarities to help drive the alignments, and no comprehensive study has been done to determine the most effective ways to utilize network connectivity (network topology) to drive multiple network alignment. Here, we devise and empirically test the efficacy of several measures of topological similarity between three or more networks. To evolve the alignments toward the optimum, we use simulated annealing as the search algorithm since it is agnostic to the objective being optimized. We test the measures both on the partially synthetic and highly similar PPI networks from the Integrated Interaction Database, as well as on real PPI networks from a recent BioGRID release.

Melatonin reduces proliferation and promotes apoptosis of bladder cancer cells by suppressing O‐GlcNAcylation of CDK5

Article

Sep 2021

Melatonin helps maintain circadian rhythm, exerts anticancer activity, and plays key roles in the regulation of glucose homeostasis and energy metabolism. Glycosylation, a form of metabolic flux from glucose or other monosaccharides, is a common post‐translational modification. Dysregulated glycosylation, particularly O‐GlcNAcylation, is often a biomarker of cancer cells. In this study, the elevated O‐GlcNAc level in bladder cancer was inhibited by melatonin treatment. Melatonin treatment inhibited proliferation and migration and enhanced apoptosis of bladder cancer cells. Proteomic analysis revealed reduction of cyclin‐dependent‐like kinase 5 (CDK5) expression by melatonin. O‐GlcNAc modification determined the conformation of the critical T‐loop domain on CDK5, and further influenced CDK5 stability. The mechanism whereby melatonin suppressed the O‐GlcNAc level was based on decreased glucose uptake and metabolic flux from glucose to UDP‐GlcNAc, and consequent reduction of CDK5 expression. Melatonin treatment, inhibition of O‐GlcNAcylation by OSMI‐1, or mutation of a key O‐GlcNAc site strongly suppressed in vivo tumor growth. Our findings indicate that melatonin reduces proliferation and promotes apoptosis of bladder cancer cells by suppressing O‐GlcNAcylation of CDK5.

Three-Dimensional Interfacing of Cells with Hierarchical Silicon Nano/Microstructures for Midinfrared Interrogation of In Situ Captured Proteins

Article

Feb 2021
ACS APPL MATER INTER

Label-free optical detection of biomolecules is currently limited by a lack of specificity rather than sensitivity. To exploit the much more characteristic refractive index dispersion in the mid-infrared (IR) regime, we have engineered three-dimensional IR-resonant silicon micropillar arrays (Si-MPAs) for protein sensing. By exploiting the unique hierarchical nano- and microstructured design of these Si-MPAs attained by CMOS-compatible silicon-based microfabrication processes, we achieved an optimized interrogation of surface protein binding. Based on spatially resolved surface functionalization, we demonstrate controlled three-dimensional interfacing of mammalian cells with Si-MPAs. Spatially controlled surface functionalization for site-specific protein immobilization enabled efficient targeting of soluble and membrane proteins into sensing hotspots directly from cells cultured on Si-MPAs. Protein binding to Si-MPA hotspots at submonolayer level was unambiguously detected by conventional Fourier transform IR spectroscopy. The compatibility with cost-effective CMOS-based microfabrication techniques readily allows integration of this novel IR transducer into fully fledged bioanalytical microdevices for selective and sensitive protein sensing.

Supplemental Data Resource A Human Protein-Protein Interaction Network: A Resource for Annotating the Proteome

Article

Nov 2004
CELL

Human Protein Reference Database–2009 update

Article

Jan 2009

Wnt/beta-catenin pathway

Article

Jan 2005

R.T. Moon

Emergence of Scaling in Random Networks

Article

Jan 1999

Springer Series in Statistics

Chapter

Feb 2009

Eric D. Kolaczyk

The systematic collection and analysis of data on networks of one form or another goes back at least to the 1930s in certain select areas of science, and in fact has subtle roots reaching back centuries further. However, during the decade surrounding the turn of the 21st century, network-centric analysis, as a general approach to scientific inquiry, has reached entirely new levels of prevalence and sophistication, with practitioners in fields now ranging from the physical and mathematical sciences to the social sciences and humanities. In this chapter we present a ‘bird's-eye’ view of the area that is gradually coming to be known as ‘network science,’ starting with some background, continuing with a mosaic of examples, and finishing with a discussion of the organization and philosophy of this book.

Targeting the tumor microenvironment

Article

Jan 2007

Paraic A. Kenny

Despite some notable successes, cancer remains, for the most part, a seemingly intractable problem. There is, however, a growing appreciation that targeting the tumor epithelium in isolation is not sufficient, as there is an intricate, mutually sustaining synergy between the tumor epithelial cells and their surrounding stroma. As the details of this dialogue emerge, new therapeutic targets have been proposed. The FDA has already approved drugs targeting microenvironmental components such as VEGF and aromatase, and many more agents are in the pipeline. In this article, we describe some of the 'druggable' targets and processes within the tumor microenvironment and review the approaches being taken to disrupt these interactions.

Network Clustering

Article

Jan 2008

Computing topological parameters of biological networks

Article

Jan 2008

Assenov Y, Ramirez F, Schelhorn S-E, Lengauer T, Albrecht M

10.1093/bioinformatics/btm554

Analysis of Network Flow Data

Article

Feb 2009

Eric D. Kolaczyk

Flows are at the heart of the form and function of many networks, and understanding their behavior is often a goal of primary interest. Here we consider problems of statistical estimation and prediction arising in connection with various types of measurements relating to network flows.

From molecular to modular cell biology

Article

Jan 1999
NATURE

A.W. Murray
