Shivakumar Keerthikumar and Suresh Mathivanan (eds.), Proteome Bioinformatics, Methods in Molecular Biology,
vol. 1549, DOI 10.1007/978-1-4939-6740-7_12, © Springer Science+Business Media LLC 2017
Chapter 12
Bioinformatics Methods to Deduce Biological
Interpretation from Proteomics Data
Krishna Patel, Manika Singh, and Harsha Gowda
Abstract
High-throughput proteomics studies generate large amounts of data. Biological interpretation of these large-scale datasets is often challenging. Over the years, several computational tools have been developed to facilitate meaningful interpretation of large-scale proteomics data. In this chapter, we describe various analyses that can be performed, along with the bioinformatics tools and resources that enable users to carry them out. Many Web-based and stand-alone tools are relatively user-friendly and can be used by most biologists without significant assistance.
Key words Gene ontology, FunRich, Reactome, NetPath, Phosphoproteome, Pathways, Enrichment,
Post-translational modifications
1 Introduction
High-throughput proteomics studies result in the identification and quantitation of thousands of proteins in a biological specimen. These studies are often carried out to determine dynamic changes in proteins, including differential expression patterns between biological conditions, activation of specific signaling pathways, and changes in protein complexes. To achieve this, mass spectrometry-based methods are often employed to measure the relative abundance of proteins or post-translational modifications including phosphorylation, acetylation, glycosylation, and ubiquitination. Although such large-scale studies generate enormous amounts of data, biological interpretation of these data poses a significant challenge for biologists.
Several commercial and open source tools have been developed over the years to facilitate biological interpretation of proteomics data. These tools allow biologists to disentangle complexity in large datasets and identify meaningful patterns. Most biological processes are not driven by a single protein but by many proteins acting in concert. If any two biological conditions or cell phenotypes were compared using quantitative proteomics, one could expect a set of proteins that regulate these two distinct cell phenotypes or biological conditions to be differentially expressed. Tools developed to carry out gene set enrichment or overrepresentation analysis enable identification of such patterns from large-scale datasets. Such enrichment analysis can also facilitate functional annotation of orphan molecules based on their association with other well-characterized molecules. Here, we describe several tools that can be used for such analysis in mammalian systems, particularly those with well-annotated data, including human.
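As a minimal sketch of how a list of differentially expressed proteins might be assembled before enrichment analysis, the Python snippet below filters proteins by log2 fold change between two conditions. The function name, the example abundance values, and the twofold cutoff are illustrative assumptions and are not prescribed in this chapter; in practice a statistical test across replicates would normally accompany such a filter.

```python
import math

def differential_proteins(abundance, min_abs_log2_fc=1.0):
    """Return {protein: log2 fold change} for proteins passing the cutoff.

    abundance maps a gene symbol to (control, treated) abundance values.
    The default cutoff of 1.0 (a twofold change) is an illustrative assumption.
    """
    selected = {}
    for protein, (control, treated) in abundance.items():
        if control <= 0 or treated <= 0:
            continue  # the log ratio is undefined for missing or zero values
        log2_fc = math.log2(treated / control)
        if abs(log2_fc) >= min_abs_log2_fc:
            selected[protein] = log2_fc
    return selected

# Hypothetical abundance values, for illustration only.
example = {
    "EGFR": (100.0, 450.0),
    "GAPDH": (1000.0, 1050.0),
    "TP53": (80.0, 20.0),
}
hits = differential_proteins(example)
print(sorted(hits))
```

The resulting gene symbols could then serve as the input list for the enrichment tools described in the following sections.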
2 Materials
Several commercial as well as open source tools are available for carrying out bioinformatics analysis of high-throughput datasets. For each type of analysis, we provide a list of tools that can be used in the relevant section of the chapter. Step-by-step instructions are also provided for one tool in each section. A general outline of the workflow and the different kinds of analysis that can be carried out is provided in Fig. 1.
3 Methods
3.1 Gene Ontology Enrichment Analysis
The Gene Ontology (GO) consortium has developed a controlled vocabulary to represent biological functions, processes, and cellular localization information [1]. The terms are linked to corresponding genes based on our understanding of gene function and localization. These data are extensively used to carry out GO enrichment analysis, which provides insights into the biological functions and processes enriched in a large-scale proteomics dataset. Several tools have been developed that carry out enrichment analysis with a gene/protein list as input. FunRich [2] is a user-friendly stand-alone tool for GO enrichment analysis. The tool allows users to upload or paste gene symbols, gene IDs, UniProt IDs, and RefSeq protein IDs as input for the analysis. Results of the enrichment analysis are produced in various graphical formats such as bar graph, pie chart, Venn diagram, heat map, and doughnut chart. Multiple gene sets can be uploaded for comparative analysis of GO enrichment and pathway enrichment. The tool provides various graphical representation options for visualizing comparative results.
One of the widely used Web-based tools is DAVID (Database for Annotation, Visualization, and Integrated Discovery; https://david.ncifcrf.gov/) [3]. It provides a comprehensive set of functional annotation tools that can not only identify enriched biological themes, particularly GO terms, but also discover functionally related enriched gene groups based on popular pathway databases including KEGG [4] and BioCarta [5]. Here we describe a step-by-step guide for GO enrichment using DAVID.
Krishna Patel et al.
Fig. 1 A general framework and outline of the various bioinformatics analysis approaches that can be used for high-throughput proteomic data
High Throughput Proteomic Analysis
There are two major DAVID tools that can be used for functional annotation/classification of gene lists: Functional Annotation and Gene Functional Classification. The tools can be accessed by clicking the links in the top left corner of the home page.
1. To begin the analysis, click on “Functional annotation”.
2. The resulting Web page shows three tabs—Upload, List, and
Background.
3. In the “Upload” tab, either paste the gene list into the box or browse and upload a file containing a single column in which each row represents a single gene (see Note 1).
4. The ‘List’ tab in DAVID allows users to limit gene annotations to one or more species. The default parameter is Homo sapiens.
5. For enrichment analysis, the user has to choose a background using the ‘Background’ tab. The default background in DAVID is the Homo sapiens whole genome. The user can instead choose a custom background.
6. DAVID recognizes gene lists with various identifiers including official gene symbols and accession numbers. For proteomics datasets, it is best to use official gene symbols in gene lists and choose that as the identifier in step 2 of the ‘Upload’ tab.
7. In step 3 of the ‘Upload’ tab, choose whether the uploaded list should be used as a ‘Gene List’ or as a ‘Background’. For data from human samples, choose ‘Gene List’, as the Homo sapiens whole genome background is used by default.
8. Click ‘Submit List’ button. The results provided by DAVID
include ‘Functional Annotation Clustering’, ‘Functional
Annotation Chart’ and ‘Functional Annotation Table’. These
results provide a quick glance of major biological functions
enriched in the gene list.
9. For GO enrichment analysis, click on Gene_Ontology and select GOTERM_BP_ALL for biological process, GOTERM_CC_ALL for subcellular localization, and GOTERM_MF_ALL for molecular function as the annotation categories for the GO enrichment analysis. Click on “Functional annotation clustering” and DAVID will generate clusters of terms with similar biological meaning based on shared/similar gene members. The significance of the enrichment is calculated using a modified Fisher exact P-value.
10. The top panel of the result window is the parameter panel, which the user can modify as needed to rerun the analysis without submitting the input again. Higher stringency is recommended to obtain small, concise, and meaningful clusters rather than broad and vague clusters of proteins. The default setting is medium stringency; however, the user can change this option depending on the analysis. A higher enrichment score indicates that members of the annotation terms are overrepresented in the uploaded input.
11. The result table displays annotation categories, enriched functional annotations, the enrichment score of each cluster, the number of genes contributing to the clustering of similar GO terms, and the modified Fisher exact P-value.
12. To analyze the most enriched clusters, the user can filter for clusters with the highest enrichment scores and lowest P-values for biological process, molecular function, and subcellular localization (see Note 2).
13. The ‘G’ link on top of each cluster can be used to extract the defined set of proteins contributing to the enrichment of the given cluster. The matrix icon draws a heat map for the cluster and provides the GO term count matrix for each protein, which can be further used for plotting graphs.
14. The user can also perform pathway and functional domain enrichment analysis using DAVID by selecting “Pathway”, “Functional categories”, and “Protein domains” as the reference databases for functional annotation. However, a user-friendly graphical interface for pathway analysis is provided by the Web resource Reactome, which is explained in detail below.
Table 1 lists other widely used open source gene set enrichment analysis tools.
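The modified Fisher exact P-value mentioned in the steps above can be illustrated with a short, hedged Python sketch. It computes a one-sided hypergeometric overrepresentation P-value and a more conservative variant in which one hit is removed from the list before testing, which is the general idea behind a "modified" Fisher test. The function names and the example counts are assumptions made for illustration, and the exact contingency-table conventions used by DAVID may differ.

```python
from math import comb

def hypergeom_upper_tail(hits, list_size, pop_hits, pop_size):
    """P(X >= hits) when list_size proteins are drawn from a background of
    pop_size proteins, of which pop_hits carry the annotation term."""
    max_hits = min(list_size, pop_hits)
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, x) * comb(pop_size - pop_hits, list_size - x)
        for x in range(hits, max_hits + 1)
    )
    return tail / total

def conservative_score(hits, list_size, pop_hits, pop_size):
    """Conservative variant: remove one hit from the list before testing.

    This mimics the spirit of a modified Fisher exact test; it is a sketch,
    not a reimplementation of any particular tool.
    """
    if hits < 1:
        return 1.0
    return hypergeom_upper_tail(hits - 1, list_size - 1, pop_hits, pop_size)

# Hypothetical counts: 3 of 300 input proteins carry a term that annotates
# 40 of 30000 background genes.
print(hypergeom_upper_tail(3, 300, 40, 30000))
print(conservative_score(3, 300, 40, 30000))
```

Because one supporting hit is removed, the conservative variant penalizes terms supported by only a few proteins, which is why very small clusters rarely reach significance.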
Table 1
List of tools that can be used for gene ontology and gene set enrichment analysis

| Name | Description | Availability | Link | Reference |
|---|---|---|---|---|
| GSEA | Gene set enrichment analysis (GSEA) is an expression analytics tool. It compares gene set enrichment between conditions and provides enriched sets of genes with their statistical significance scores to interpret biological data | Stand-alone | http://www.broadinstitute.org/gsea/ | [6] |
| FunRich | FunRich is a downloadable tool for pathway and GO enrichment analysis of genes and proteins. It can process genes/proteins irrespective of the source of the sample, as the user can load a customized database along with the default background database | Stand-alone | http://funrich.org/ | [2] |
| GoMiner | GoMiner leverages Gene Ontology by providing a framework to visualize and integrate “omics” data. It clusters genes and their expression profiles, which can be analyzed for their biological significance. Each gene is linked to BioCarta, Entrez Genome, NCBI structures, PubMed, and MedMiner for greater clarity | Stand-alone, Web | http://discover.nci.nih.gov/gominer | [7] |
| GOstat | GOstat uses the GO terms database to find statistically overrepresented genes in the dataset. The results list significant sets of genes for biological interpretation | Web | http://gostat.wehi.edu.au | [8] |
| GOToolBox | GOToolBox is used for functional annotation of genes. GOToolBox is a Perl-based program that can be automated in any gene expression analysis pipeline. GOToolBox also has the GO-Diet and PRODISTIN frameworks, which can be used to study protein–protein interactions | Web | http://genome.crg.es/GOToolBox/ | [9] |
| GeneMerge | GeneMerge enables overrepresentation analysis of gene attributes in a given set of genes as compared to the genome background | Stand-alone, Web | http://www.genemerge.net/ | [10] |
| GO::TermFinder | GO::TermFinder is a tool that helps to find significant GO terms shared among a list of genes. It has GO::TermFinder libraries that enable visualization of results | Stand-alone | http://search.cpan.org/dist/GO-TermFinder/ | [11] |
| agriGO | agriGO is a specialized data analytics tool for the agricultural community. The database has 38 agricultural species comprising 274 data types | Web | http://bioinfo.cau.edu.cn/agriGO/ | [12] |
| FatiGO | FatiGO helps to find significant overrepresentation of functional annotations in one gene set compared to another | Web | http://babelomics.bioinfo.cipf.es | [13] |

3.2 Pathway Analysis
Proteins regulate most cellular processes. Several proteins work in concert to regulate these processes and are often grouped into specific pathways in which they carry out their functions. Over the years, pathways and processes that are regulated by specific proteins have been systematically annotated. Based on protein expression data, it is possible to arrive at pathways and processes that are active in a biological sample. In addition to expression, some of the most widely studied signaling mechanisms involve the dynamic interplay of kinases and phosphatases, which results in addition or removal of phosphate groups on proteins. Differential protein expression data or phosphoproteomics data can be utilized to carry out pathway enrichment analysis. If expression or phosphorylation levels of certain proteins change in a biological sample compared to an appropriate control, it is possible to predict potential pathways that are differentially regulated. Reactome [14] is a manually curated, open access, Web-based resource of biological pathways that allows users to browse, search, and map proteins onto pathways. It also provides a list of interactors for pathway nodes, acquired from the IntAct [15] molecular interaction database.
Here we describe Reactome, a Web-based tool that can be used for pathway analysis.
1. Reactome (http://www.reactome.org/) allows users to map a list of proteins onto pathways and carry out enrichment analysis to determine whether the input data contain an overrepresentation of proteins involved in certain pathways (see Note 3).
2. Click on “Analyze Data”. This is a three-step process that begins with pasting the protein list with an appropriate header on the Web page. The tool also accepts accession numbers and other identifiers as input. In the next step, it allows projection of the data onto human annotation if the data come from a different species, and inclusion of interactors from the IntAct molecular interaction database. After making the appropriate selections, click on Analyze.
3. The resulting page is divided into four panels. The ‘Hierarchy’ panel on the left of the Web page lists enriched pathways with the corresponding FDR, the ‘Viewport’ panel shows a graphical overview of these pathways with various options to navigate, the top panel provides configuration options, and the bottom panel provides details of objects selected in the pathway diagram. A detailed manual for understanding and navigating this pathway analysis tool can be found at http://wiki.reactome.org/index.php/Usersguide.
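Pathway enrichment results are typically reported with a false discovery rate (FDR) because many pathways are tested at once. As a hedged illustration of what such a correction does, the sketch below applies the Benjamini-Hochberg procedure to a list of pathway P-values. The example P-values are hypothetical, and the procedure is shown as one common way to control the FDR, not necessarily the exact computation behind the FDR column of any particular tool.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values never decrease as the raw P-values increase.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw P-values for four pathways.
raw = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(raw)
print([round(value, 4) for value in fdr])
```

Pathways would then be retained when their adjusted value falls below a chosen threshold, for example 0.05.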
There are also various commercial tools, such as QIAGEN Ingenuity Pathway Analysis (IPA) and Agilent Genomics GeneSpring, for functional and pathway enrichment analysis. Table 2 lists some of the widely used pathway resources and network analysis tools.
Table 2
List of pathway resources and network analysis tools

| Name | Description | Availability | Link | Reference |
|---|---|---|---|---|
| NetPath | NetPath is a manually curated resource of signal transduction pathways. Pathway data can be browsed, visualized, or downloaded in PSI-MI, BioPAX, and SBML formats. These standard formats enable visualization using external tools like Cytoscape | Web | www.netpath.org | [16] |
| PANTHER | Protein ANalysis THrough Evolutionary Relationships (PANTHER) is an analysis framework with multiple tools for evolutionary and functional classification of proteins. The PANTHER pathway resource allows visualization of protein expression data in the context of pathway diagrams | Web | http://www.pantherdb.org/pathway | [17] |
| KEGG | Kyoto Encyclopedia of Genes and Genomes (KEGG) is an integrated database resource. Pathway maps and annotations in KEGG are widely used for pathway enrichment analysis | Web | http://www.genome.jp/kegg/ | [4] |
| STRING | Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) is a database of protein–protein interactions | Web | http://string-db.org/ | [18] |
| FunRich | FunRich is a downloadable tool for pathway and GO enrichment analysis of genes and proteins. It can process genes/proteins irrespective of the source of the sample, as the user can load a customized database along with the default background database | Stand-alone | http://funrich.org/ | [2] |
| MINT | MINT (Molecular INTeraction) is a curated molecular interaction database | Web, stand-alone | http://mint.bio.uniroma2.it/mint/Welcome.do | [19] |
| NetworKIN | The NetworKIN database provides an interface to analyze cellular phosphorylation networks. It allows users to query precomputed kinase–substrate relations or obtain predictions on novel phosphoproteins | Web, stand-alone | http://networkin.info | [20] |

3.3 Post-translational Modification Analysis
Post-translational modifications (PTMs) play an important role in regulating various cellular processes. One of the most widely studied PTMs is phosphorylation. It acts as a switch for activation and deactivation of specific proteins and associated signaling pathways. This modification serves as a rapid and reversible means to modulate protein activity and transduce signals. The advent of mass spectrometry has revolutionized our ability to map PTMs. These studies have provided a comprehensive view of proteins that undergo modifications, along with the specific sites. Based on our understanding of enzyme–substrate relationships and the specific motifs that are targeted for post-translational modifications, a number of computational tools have been developed to predict PTMs. These tools can be utilized to evaluate the validity of sites identified in large-scale studies (based on known sites in the databases) or to predict potential modifications.
Human Protein Reference Database (HPRD) [21] is a repository of manually curated PTM sites. Phospho.ELM [22] is a resource of experimentally validated phosphorylation sites that are manually curated from the literature. The RESID [23] database provides PTM information with literature citations, protein feature tables, molecular models, structure diagrams, and Gene Ontology cross-references. PhosphoSitePlus [24] is a comprehensive repository of curated phosphosites containing reference and orthologous residues in other species. O-GLYCBASE [25] is a resource containing experimentally verified O-linked glycosylation sites. Unimod [26] is a comprehensive public domain database of protein modifications for mass spectrometry applications.
As noted above, phosphorylation is the most extensively studied PTM. Protein kinases add phosphate moieties to Tyr, Ser, or Thr residues. Mass spectrometry is extensively used to investigate protein phosphorylation in a high-throughput manner. Phosphorylation can either increase or decrease the activity of the target protein. Overlaying phosphoproteomic data on curated pathways can provide insights into activation or deactivation of a particular signaling pathway. PhosphoSitePlus [24] and PHOSIDA [27] are comprehensive repositories of curated phosphosites containing reference and orthologous residues in other species. Protein sequences can be analyzed for phosphosites using various prediction tools such as KinasePhos 2.0 [28], NetPhos 2.0 [29], and DISPHOS 1.3 [30].
Several computational approaches have been developed to predict acetylation sites. NetAcet [31] is a neural network-based N-terminal acetylation site prediction tool. N-Ace [32] predicts acetylation sites based on physicochemical properties of proteins together with accessible surface area. PSKAcePred [33] uses evolutionary similarity along with physicochemical properties to predict lysine acetylation sites. Species Specific Prediction of Lysine Acetylation (SSPKA) [34] is a computational framework that incorporates predicted secondary structure information and combines functional and sequence features to predict species-specific acetylation sites across six species: H. sapiens, R. norvegicus, M. musculus, E. coli, S. typhimurium, and S. cerevisiae.
Ubiquitination is one of the most difficult PTMs to identify owing to its low abundance, size, and dynamic regulation. Because ubiquitin is much larger than other PTMs, it is difficult to capture by mass spectrometry. However, several ubiquitination sites have been mapped in the last few years based on the diglycine-modified lysine tag, which can be identified by mass spectrometry. Several tools including UbPred [35], UbiPred [36], E3Miner [37], hCKSAAP_UbSite [38], and iUbiq-Lys [39] have been developed over the years for prediction of ubiquitination sites. hUbiquitome [40] is a comprehensive repository of experimentally verified human ubiquitination enzymes and substrates.
Small ubiquitin-like modifier (SUMO) attaches to various tar-
get proteins and modulates cellular processes such as DNA replica-
tion, transcription, cell division, nuclear trafficking, and DNA
damage response. SUMOylation affects half-life, localization of
targets or binding partners and is a crucial mechanism that allows
cells to adapt to stress stimuli. Identification of SUMO sites has
enabled us to identify strong dependency of SUMOylation events
on other PTMs [41]. SUMOsp [42] and GPS-SUMO [43] predict SUMO sites on proteins.
Glycosylation is a common PTM that plays a crucial role in
protein folding, cell–cell interaction, antigenicity, transport, and
half-life. There are four types of glycosylation: N-linked, O-linked,
C-mannosylation, and GPI anchor attachment. EnsembleGly [44]
predicts both O- and N-linked glycosylation sites, NetCGlyc [45]
predicts C-mannosylation, NetOGlyc [46] predicts O-glycosylation
sites, and NetNGlyc [47] predicts N-glycosylation sites; PredGPI
[48] and GPI-SOM [49] predict GPI anchor sites in a protein.
Scansite [50] is a tool to analyze protein sequence for phos-
phorylation motifs recognized by many kinases and Motif-X [51]
allows prediction of various PTM site motifs by identifying over-
represented residues in the flanking regions. ProMEX [52] is a
database of mass spectra of tryptic peptides from plant proteins and
phosphoproteins.
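The idea of finding overrepresented residues in the regions flanking a modification site can be sketched in a few lines of Python. The example below aligns equal-length sequence windows centered on the modified residue and reports residues that occur more often than a background frequency would suggest. It is a simplified illustration only and does not reproduce the statistics of Motif-X or any other tool; the windows, the uniform background, and the `min_ratio` and `min_count` thresholds are all assumptions.

```python
from collections import Counter

def position_enrichment(windows, background, min_ratio=2.0, min_count=3):
    """Report (offset, residue, ratio) for residues overrepresented at a
    flanking position.

    windows: equal-length peptides centered on the modified residue.
    background: dict mapping residue to its expected frequency (assumed).
    """
    width = len(windows[0])
    center = width // 2
    results = []
    for pos in range(width):
        if pos == center:
            continue  # the modified residue itself is fixed by design
        counts = Counter(window[pos] for window in windows)
        for residue, count in counts.items():
            expected = background.get(residue)
            if not expected or count < min_count:
                continue
            ratio = (count / len(windows)) / expected
            if ratio >= min_ratio:
                results.append((pos - center, residue, round(ratio, 2)))
    return sorted(results)

# Hypothetical 7-residue windows centered on a phosphoserine.
windows = ["RRASPLK", "KRLSPEE", "GRGSPAA", "ARTSPQQ"]
# Uniform background over the 20 amino acids, an illustrative simplification.
background = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}
motif_positions = position_enrichment(windows, background)
print(motif_positions)
```

In this toy input, arginine at offset -2 and proline at offset +1 are reported, which corresponds to a motif of the form R-x-S-P.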
Here we describe PTM analysis using the commonly used PTM database Phospho.ELM [22] and the phosphorylation site predictor NetPhos 2.0 [29].
1. To identify experimentally validated phosphorylation sites of a given protein, browse the Phospho.ELM database (http://phospho.elm.eu.org/index.html). The database can be queried using protein name, UniProt accession, or Ensembl identifier.
2. The result page of the Phospho.ELM database consists of a table detailing the residue, the position of the residue in the protein, the flanking sequence around the PTM site, the kinase, the PubMed reference for each reported site, the conservation score, a cross-reference to the eukaryotic linear motif resource (ELM: http://elm.eu.org/), the phospho-peptide binding domain, SMART domains, and a cross-reference to PDB, along with other information such as the substrate and cross-references to PHOSIDA [27], PhosphoSitePlus [24], MINT [19], and GO terms [1].
3. Computational prediction of phosphorylation sites can be done using the NetPhos 2.0 server (http://www.cbs.dtu.dk/services/NetPhos/). Users can submit protein sequences in FASTA format and select the target residues for phosphorylation (tyrosine, serine, or threonine). By default, all three residues are checked in the analysis. Select the corresponding checkbox to generate graphical output.
4. Click on “Submit” to initiate the analysis. Up to 2000 protein sequences can be analyzed by this Web-based tool in a single query.
5. The result page displays a table detailing the submitted protein ID, residue position, PTM site with flanking sequences, and score. Three separate tables are generated for serine, threonine, and tyrosine.
6. The graphical result depicts the propensity of a residue at a given position to be a PTM site. Peaks of three different colors are used for the three residues (S, T, Y) on an X-Y plane, where the X-axis is the sequence position and the Y-axis is the phosphorylation potential.
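To make the input and output of the prediction steps above more concrete, the following Python sketch parses FASTA text and lists every serine, threonine, and tyrosine together with its position and flanking sequence, which is the kind of table a site predictor annotates with scores. It only enumerates candidate residues and does not compute any phosphorylation potential. The function names, the example sequence, and the flank of four residues are assumptions chosen for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {identifier: sequence}."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else "unnamed"
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts).upper() for name, parts in records.items()}

def candidate_sites(sequence, residues="STY", flank=4):
    """Yield (1-based position, residue, flanking context) per candidate.

    Positions near the termini are padded with '-' so that every context
    has the same length (2 * flank + 1).
    """
    padded = "-" * flank + sequence + "-" * flank
    for index, residue in enumerate(sequence):
        if residue in residues:
            yield index + 1, residue, padded[index : index + 2 * flank + 1]

# Hypothetical sequence, for illustration only.
fasta_text = ">demo_protein hypothetical example\nMKSPTAY\nGGR\n"
for name, sequence in parse_fasta(fasta_text).items():
    for position, residue, context in candidate_sites(sequence):
        print(name, position, residue, context)
```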
3.4 Visualization Tools
A multitude of tools are available for data integration and visualization of “omics” datasets (Table 3). Most visualization tools focus on biomolecular interactions and pathways. These tools commonly employ 2D graphs for data representation. The usefulness of these tools depends largely on their compatibility with other tools and databases.
4 Notes
1. It is preferable to use ‘Gene Symbol’ as the unique identifier for genes. DAVID has an ID conversion tool that can be used to prepare lists with uniform identifiers.
2. Enrichment analysis methods often involve statistical tests to determine whether the input data contain an overrepresentation of proteins involved in certain functions, processes, or pathways beyond what is expected by chance. This is calculated with respect to the background database used by the respective tool. Many tools also give users the option of using a custom database as background. Knowledge of the statistical approach employed by such tools allows the user to make relevant selections for different kinds of datasets and identify the most enriched gene/protein clusters.
3. Pathway enrichment analysis relies on the pathway database that the tool uses in the back end, and this database directly influences the outcome of the analysis. Users should take this into consideration and select the pathway annotation resource most suitable for the intended analysis.
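Note 1 and the upload step in Subheading 3.1 call for a single-column list with uniform identifiers. A minimal Python sketch of such a clean-up is shown below: it trims whitespace, drops empty entries, removes duplicates while preserving order, and emits one gene symbol per line. The function names are hypothetical, and splitting entries on a semicolon is an assumption about how some proteomics software reports protein groups; it does not perform identifier conversion.

```python
def clean_gene_list(raw_symbols, separator=";"):
    """Return unique, non-empty gene symbols in their original order.

    Entries containing the separator (assumed here to mark protein groups)
    are split into individual symbols.
    """
    seen, cleaned = set(), []
    for entry in raw_symbols:
        for symbol in entry.split(separator):
            symbol = symbol.strip()
            if symbol and symbol not in seen:
                seen.add(symbol)
                cleaned.append(symbol)
    return cleaned

def to_single_column(symbols):
    """Format symbols as single-column text, one gene per row."""
    return "\n".join(symbols) + "\n"

# Hypothetical input with stray whitespace, a blank, a duplicate, and a group.
raw = ["EGFR", " TP53 ", "", "EGFR", "MAPK1;MAPK3"]
gene_list = clean_gene_list(raw)
print(to_single_column(gene_list), end="")
```

The resulting text can be pasted into an upload box or saved to a file for upload.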
Table 3
List of pathway analysis and visualization tools

| Name | Description | Availability | Link | Reference |
|---|---|---|---|---|
| GenMAPP | GenMAPP is a visualization tool for gene/protein expression profiles. It has the MAPPBuilder tool for creating MAPP files (.mapp), which provide a graphical pathway representation of genes, and the MAPPFinder tool to annotate the pathway. Each gene is identified by a unique gene ID from GenBank. MAPP files can be shared and manipulated by the user | Stand-alone | http://www.genmapp.org | [53] |
| Cytoscape | Cytoscape is a Java-based stand-alone tool that supports large-scale network analysis. Both protein–protein and protein–gene networks can be visualized and edited. The standard file format of Cytoscape is the Cytoscape Session File (.cys). Input can be a delimited text table or an Excel workbook, and all major input formats are supported. Results can be exported in formats such as SIF, GML, XGMML, and PSI-MI | Stand-alone | http://www.cytoscape.org/ | [54] |
| Medusa | Medusa is a Java application for visualization of complex pathways. Results from the STRING database can be analyzed in Medusa. Medusa is less suited for big datasets | Stand-alone | https://sites.google.com/site/medusa3visualization/ | [55] |
| Perseus | Perseus is a statistical analysis and visualization tool for proteomics data. It incorporates multiple methods such as normalization, t-test, clustering, and enrichment analysis. It provides various graphs for visualization of data, such as scatter plots and volcano plots | Stand-alone | http://www.biochem.mpg.de/5111810/perseus | [56] |
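Compatibility between tools often comes down to simple exchange formats. As a hedged example, the Python sketch below writes a list of protein–protein interactions in the Simple Interaction Format (SIF), in which each line holds a source node, an interaction type, and a target node. The function name, the example interactions, and the `pp` interaction label are assumptions made for illustration; duplicate undirected pairs and self-interactions are dropped as a design choice of this sketch.

```python
def to_sif(interactions, interaction_type="pp"):
    """Format (protein_a, protein_b) pairs as tab-delimited SIF text.

    Reversed duplicates (A-B and B-A) are written once, and
    self-interactions are skipped.
    """
    seen, lines = set(), []
    for protein_a, protein_b in interactions:
        key = tuple(sorted((protein_a, protein_b)))
        if protein_a == protein_b or key in seen:
            continue
        seen.add(key)
        lines.append(f"{protein_a}\t{interaction_type}\t{protein_b}")
    return "\n".join(lines) + "\n"

# Hypothetical interactions, for illustration only.
pairs = [("EGFR", "GRB2"), ("GRB2", "EGFR"), ("GRB2", "SOS1"), ("SOS1", "SOS1")]
sif_text = to_sif(pairs)
print(sif_text, end="")
```

Text in this form can be saved with a `.sif` extension and imported as a network into a visualization tool that supports the format.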
References
1. Ashburner M, Ball CA, Blake JA, Botstein D,
Butler H, Cherry JM, Davis AP, Dolinski K,
Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-
Tarver L, Kasarskis A, Lewis S, Matese JC,
Richardson JE, Ringwald M, Rubin GM, Sherlock
G (2000) Gene ontology: tool for the unification
of biology. The gene ontology consortium. Nat
Genet 25(1):25–29. doi:10.1038/75556
2. Pathan M, Keerthikumar S, Ang CS, Gangoda
L, Quek CY, Williamson NA, Mouradov D,
Sieber OM, Simpson RJ, Salim A, Bacic A, Hill
AF, Stroud DA, Ryan MT, Agbinya JI,
Mariadason JM, Burgess AW, Mathivanan S
(2015) FunRich: an open access standalone
functional enrichment and interaction network
analysis tool. Proteomics 15(15):2597–2601.
doi:10.1002/pmic.201400515
3. Dennis G Jr, Sherman BT, Hosack DA, Yang J,
Gao W, Lane HC, Lempicki RA (2003)
DAVID: database for annotation, visualiza-
tion, and integrated discovery. Genome Biol
4(5):P3
4. Kanehisa M, Goto S (2000) KEGG: Kyoto
encyclopedia of genes and genomes. Nucleic
Acids Res 28(1):27–30
5. Nishimura D (2004) BioCarta. Biotech
Software Internet Report 2:117–120. doi:
10.1089/152791601750294344
6. Subramanian A, Tamayo P, Mootha VK,
Mukherjee S, Ebert BL, Gillette MA, Paulovich
A, Pomeroy SL, Golub TR, Lander ES,
Mesirov JP (2005) Gene set enrichment analy-
sis: a knowledge-based approach for
interpreting genome-wide expression profiles.
Proc Natl Acad Sci U S A 102(43):15545–
15550. doi:10.1073/pnas.0506580102
7. Zeeberg BR, Feng W, Wang G, Wang MD,
Fojo AT, Sunshine M, Narasimhan S, Kane
DW, Reinhold WC, Lababidi S, Bussey KJ,
Riss J, Barrett JC, Weinstein JN (2003)
GoMiner: a resource for biological interpreta-
tion of genomic and proteomic data. Genome
Biol 4(4):R28
8. Beissbarth T, Speed TP (2004) GOstat: find
statistically overrepresented gene ontologies
within a group of genes. Bioinformatics
20(9):1464–1465. doi:10.1093/bioinformat-
ics/bth088
9. Martin D, Brun C, Remy E, Mouren P,
Thieffry D, Jacq B (2004) GOToolBox:
functional analysis of gene datasets based on
Gene Ontology. Genome Biol 5(12):R101.
doi:10.1186/gb-2004-5-12-r101
10. Castillo-Davis CI, Hartl DL (2003) Gene
Merge—post-genomic analysis, data mining,
and hypothesis testing. Bioinformatics
19(7):891–892
11. Boyle EI, Weng S, Gollub J, Jin H, Botstein D,
Cherry JM, Sherlock G (2004) GO::Term
Finder—open source software for accessing
gene ontology information and finding signifi-
cantly enriched gene ontology terms associated
with a list of genes. Bioinformatics 20(18):3710–
3715. doi:10.1093/bioinformatics/bth456
12. Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010)
agriGO: a GO analysis toolkit for the agricul-
tural community. Nucleic Acids Res 38(Web
Server Issue):64–70. doi:10.1093/nar/gkq310
13. Al-Shahrour F, Minguez P, Tarraga J, Medina
I, Alloza E, Montaner D, Dopazo J (2007)
FatiGO +: a functional profiling tool for
genomic data. Integration of functional anno-
tation, regulatory motifs and interaction data
with microarray experiments. Nucleic Acids
Res 35(Web Server Issue):91–96. doi:10.1093/
nar/gkm260
14. Joshi-Tope G, Gillespie M, Vastrik I,
D'Eustachio P, Schmidt E, de Bono B, Jassal
B, Gopinath GR, Wu GR, Matthews L, Lewis
S, Birney E, Stein L (2005) Reactome: a
knowledgebase of biological pathways. Nucleic
Acids Res 33(Database issue):D428–D432.
doi:10.1093/nar/gki072
15. Hermjakob H, Montecchi-Palazzi L,
Lewington C, Mudali S, Kerrien S, Orchard S,
Vingron M, Roechert B, Roepstorff P, Valencia
A, Margalit H, Armstrong J, Bairoch A,
Cesareni G, Sherman D, Apweiler R (2004)
IntAct: an open source molecular interaction
database. Nucleic Acids Res 32(Database
issue):D452–D455. doi:10.1093/nar/gkh052
16. Kandasamy K, Mohan SS, Raju R,
Keerthikumar S, Kumar GS, Venugopal AK,
Telikicherla D, Navarro JD, Mathivanan S,
Pecquet C, Gollapudi SK, Tattikota SG,
Mohan S, Padhukasahasram H, Subbannayya
Y, Goel R, Jacob HK, Zhong J, Sekhar R,
Nanjappa V, Balakrishnan L, Subbaiah R,
Ramachandra YL, Rahiman BA, Prasad TS,
Lin JX, Houtman JC, Desiderio S, Renauld
JC, Constantinescu SN, Ohara O, Hirano T,
Kubo M, Singh S, Khatri P, Draghici S, Bader
GD, Sander C, Leonard WJ, Pandey A (2010)
NetPath: a public resource of curated signal
transduction pathways. Genome Biol
11(1):R3. doi:10.1186/gb-2010-11-1-r3
17. Mi H, Poudel S, Muruganujan A, Casagrande
JT, Thomas PD (2016) PANTHER version 10:
expanded protein families and functions, and
analysis tools. Nucleic Acids Res 44(D1):D336–
D342. doi:10.1093/nar/gkv1194
18. von Mering C, Huynen M, Jaeggi D, Schmidt
S, Bork P, Snel B (2003) STRING: a database
of predicted functional associations between
proteins. Nucleic Acids Res 31(1):258–261
19. Zanzoni A, Montecchi-Palazzi L, Quondam
M, Ausiello G, Helmer-Citterich M, Cesareni
G (2002) MINT: a molecular INTeraction
database. FEBS Lett 513(1):135–140
20. Linding R, Jensen LJ, Pasculescu A, Olhovsky
M, Colwill K, Bork P, Yaffe MB, Pawson T
(2008) NetworKIN: a resource for exploring
cellular phosphorylation networks. Nucleic
Acids Res 36(Database issue):D695–D699.
doi:10.1093/nar/gkm902
21. Peri S, Navarro JD, Kristiansen TZ, Amanchy
R, Surendranath V, Muthusamy B, Gandhi
TK, Chandrika KN, Deshpande N, Suresh S,
Rashmi BP, Shanker K, Padma N, Niranjan V,
Harsha HC, Talreja N, Vrushabendra BM,
Ramya MA, Yatish AJ, Joy M, Shivashankar
HN, Kavitha MP, Menezes M, Choudhury
DR, Ghosh N, Saravana R, Chandran S,
Mohan S, Jonnalagadda CK, Prasad CK,
Kumar-Sinha C, Deshpande KS, Pandey A
(2004) Human protein reference database as a
discovery resource for proteomics. Nucleic
Acids Res 32(Database issue):D497–D501.
doi:10.1093/nar/gkh070
22. Diella F, Cameron S, Gemund C, Linding R,
Via A, Kuster B, Sicheritz-Ponten T, Blom N,
Gibson TJ (2004) Phospho.ELM: a database of
experimentally verified phosphorylation sites in
eukaryotic proteins. BMC Bioinformatics 5:79.
doi:10.1186/1471-2105-5-79
23. Garavelli JS (2004) The RESID database of
protein modifications as a resource and anno-
tation tool. Proteomics 4(6):1527–1533.
doi:10.1002/pmic.200300777
24. Hornbeck PV, Zhang B, Murray B, Kornhauser
JM, Latham V, Skrzypek E (2015)
PhosphoSitePlus, 2014: mutations, PTMs and
recalibrations. Nucleic Acids Res 43(Database
issue):D512–D520. doi:10.1093/nar/gku1267
25. Gupta R, Birch H, Rapacki K, Brunak S,
Hansen JE (1999) O-GLYCBASE version 4.0:
a revised database of O-glycosylated proteins.
Nucleic Acids Res 27(1):370–372
26. Creasy DM, Cottrell JS (2004) Unimod: pro-
tein modifications for mass spectrometry.
Proteomics 4(6):1534–1536. doi:10.1002/
pmic.200300744
27. Gnad F, Ren S, Cox J, Olsen JV, Macek B,
Oroshi M, Mann M (2007) PHOSIDA (phos-
phorylation site database): management, struc-
tural and evolutionary investigation, and
prediction of phosphosites. Genome Biol
8(11):R250. doi:10.1186/gb-2007-8-11-r250
28. Huang HD, Lee TY, Tzeng SW, Horng JT
(2005) KinasePhos: a web tool for identifying
protein kinase-specific phosphorylation sites.
Nucleic Acids Res 33(Web Server Issue):W226–
W229. doi:10.1093/nar/gki471
29. Blom N, Gammeltoft S, Brunak S (1999)
Sequence and structure-based prediction of
eukaryotic protein phosphorylation sites.
J Mol Biol 294(5):1351–1362. doi:10.1006/
jmbi.1999.3310
30. Iakoucheva LM, Radivojac P, Brown CJ,
O'Connor TR, Sikes JG, Obradovic Z, Dunker
AK (2004) The importance of intrinsic disorder
for protein phosphorylation. Nucleic Acids Res
32(3):1037–1049. doi:10.1093/nar/gkh253
31. Kiemer L, Bendtsen JD, Blom N (2005)
NetAcet: prediction of N-terminal acetylation
sites. Bioinformatics 21(7):1269–1270.
doi:10.1093/bioinformatics/bti130
32. Lee TY, Hsu JB, Lin FM, Chang WC, Hsu PC,
Huang HD (2010) N-Ace: using solvent acces-
sibility and physicochemical properties to identify
protein N-acetylation sites. J Comput Chem
31(15):2759–2771. doi:10.1002/jcc.21569
33. Suo SB, Qiu JD, Shi SP, Sun XY, Huang SY,
Chen X, Liang RP (2012) Position-specific anal-
ysis and prediction for protein lysine acetylation
based on multiple features. PLoS One 7(11):
e49108. doi:10.1371/journal.pone.0049108
34. Li Y, Wang M, Wang H, Tan H, Zhang Z,
Webb GI, Song J (2014) Accurate in silico
identification of species-specific acetylation
sites by integrating protein sequence-derived
and functional features. Sci Rep 4:5765.
doi:10.1038/srep05765
35. Radivojac P, Vacic V, Haynes C, Cocklin RR,
Mohan A, Heyen JW, Goebl MG, Iakoucheva
LM (2010) Identification, analysis, and predic-
tion of protein ubiquitination sites. Proteins
78(2):365–380. doi:10.1002/prot.22555
36. Tung CW, Ho SY (2008) Computational
identification of ubiquitylation sites from pro-
tein sequences. BMC Bioinformatics 9:310.
doi:10.1186/1471-2105-9-310
37. Lee H, Yi GS, Park JC (2008) E3Miner: a text
mining tool for ubiquitin-protein ligases.
Nucleic Acids Res 36(Web Server Issue):W416–
W422. doi:10.1093/nar/gkn286
38. Chen Z, Zhou Y, Song J, Zhang Z (2013)
hCKSAAP_UbSite: improved prediction of
human ubiquitination sites by exploiting amino
acid pattern and properties. Biochim Biophys
Acta 1834(8):1461–1467. doi:10.1016/j.
bbapap.2013.04.006
39. Qiu WR, Xiao X, Lin WZ, Chou KC (2015)
iUbiq-Lys: prediction of lysine ubiquitination
sites in proteins by extracting sequence evolu-
tion information via a gray system model.
J Biomol Struct Dyn 33(8):1731–1742.
doi:10.1080/07391102.2014.968875
40. Du Y, Xu N, Lu M, Li T (2011) hUbiquitome:
a database of experimentally verified ubiquiti-
nation cascades in humans. Database (Oxford)
2011:bar055. doi:10.1093/database/bar055
41. Eifler K, Vertegaal AC (2015) Mapping the
SUMOylated landscape. FEBS J 282(19):3669–
3680. doi:10.1111/febs.13378
42. Xue Y, Zhou F, Fu C, Xu Y, Yao X (2006)
SUMOsp: a web server for sumoylation site
prediction. Nucleic Acids Res 34(Web Server
Issue):W254–W257. doi:10.1093/nar/gkl207
43. Zhao Q, Xie Y, Zheng Y, Jiang S, Liu W, Mu
W, Liu Z, Zhao Y, Xue Y, Ren J (2014) GPS-
SUMO: a tool for the prediction of sumoylation
sites and SUMO-interaction motifs. Nucleic
Acids Res 42(Web Server Issue):W325–W330.
doi:10.1093/nar/gku383
44. Caragea C, Sinapov J, Silvescu A, Dobbs D,
Honavar V (2007) Glycosylation site predic-
tion using ensembles of Support Vector
Machine classifiers. BMC Bioinformatics
8:438. doi:10.1186/1471-2105-8-438
45. Julenius K (2007) NetCGlyc 1.0: prediction of
mammalian C-mannosylation sites. Glycobiology
17(8):868–876. doi:10.1093/glycob/cwm050
46. Hansen JE, Lund O, Tolstrup N, Gooley AA,
Williams KL, Brunak S (1998) NetOglyc: pre-
diction of mucin type O-glycosylation sites
based on sequence context and surface acces-
sibility. Glycoconj J 15(2):115–130
47. Gupta R, Jung E, Brunak S (2004)
NetNGlyc 1.0 Server. Center for Biological
Sequence Analysis, Technical University of
Denmark (http://www.cbs.dtu.dk/services/NetNGlyc)
48. Pierleoni A, Martelli PL, Casadio R (2008)
PredGPI: a GPI-anchor predictor. BMC
Bioinformatics 9:392. doi:10.1186/
1471-2105-9-392
49. Fankhauser N, Maser P (2005) Identification
of GPI anchor attachment signals by a
Kohonen self-organizing map. Bioinformatics
21(9):1846–1852.
doi:10.1093/bioinformatics/bti299
50. Obenauer JC, Cantley LC, Yaffe MB
(2003) Scansite 2.0: proteome-wide predic-
tion of cell signaling interactions using short
sequence motifs. Nucleic Acids Res 31(13):
3635–3641
51. Chou MF, Schwartz D (2011) Biological
sequence motif discovery using motif-x. Curr
Protoc Bioinformatics 13:15–24. doi:10.1002/
0471250953.bi1315s35
52. Hummel J, Niemann M, Wienkoop S, Schulze
W, Steinhauser D, Selbig J, Walther D,
Weckwerth W (2007) ProMEX: a mass spec-
tral reference database for proteins and protein
phosphorylation sites. BMC Bioinformatics
8:216. doi:10.1186/1471-2105-8-216
53. Dahlquist KD, Salomonis N, Vranizan K,
Lawlor SC, Conklin BR (2002) GenMAPP, a
new tool for viewing and analyzing microarray
data on biological pathways. Nat Genet
31(1):19–20. doi:10.1038/ng0502-19
54. Shannon P, Markiel A, Ozier O, Baliga
NS, Wang JT, Ramage D, Amin N, Schwikowski
B, Ideker T (2003) Cytoscape: a software envi-
ronment for integrated models of biomolecular
interaction networks. Genome Res
13(11):2498–2504. doi:10.1101/gr.1239303
55. Hooper SD, Bork P (2005) Medusa: a simple
tool for interaction graph analysis. Bioinformatics
21(24):4432–4433.
doi:10.1093/bioinformatics/bti696
56. Tyanova S, Temu T, Sinitcyn P, Carlson A,
Hein M, Geiger T, Mann M, Cox J
(2016) The Perseus computational platform
for comprehensive analysis of (prote)omics
data. Nat Methods 13(9):731–740.
doi:10.1038/nmeth.3901