
Bioinformatics Methods to Deduce Biological Interpretation from Proteomics Data

December 2016
Methods in molecular biology (Clifton, N.J.) 1549:147-161

DOI:10.1007/978-1-4939-6740-7_12

In book: Proteome Bioinformatics (pp.147-161)
Chapter: Bioinformatics Methods to Deduce Biological Interpretation from Proteomics Data
Publisher: Springer New York
Editors: Shivakumar Keerthikumar, Suresh Mathivanan

Authors:

Krishna Patel

Nference

Manika Singh

Harsha Gowda

Queensland Institute of Medical Research

High-throughput proteomics studies generate large amounts of data. Biological interpretation of these large scale datasets is often challenging. Over the years, several computational tools have been developed to facilitate meaningful interpretation of large-scale proteomics data. In this chapter, we describe various analyses that can be performed and bioinformatics tools and resources that enable users to do the analyses. Many Web-based and stand-alone tools are relatively user-friendly and can be used by most biologists without significant assistance.


Shivakumar Keerthikumar and Suresh Mathivanan (eds.), Proteome Bioinformatics, Methods in Molecular Biology, vol. 1549, DOI 10.1007/978-1-4939-6740-7_12, © Springer Science+Business Media LLC 2017

Chapter 12



Key words: Gene ontology, FunRich, Reactome, NetPath, Phosphoproteome, Pathways, Enrichment, Post-translational modifications

1 Introduction

High-throughput proteomics studies result in the identification and quantitation of thousands of proteins in a biological specimen. These studies are often carried out to determine dynamic changes in proteins, including differential expression patterns between biological conditions, activation of specific signaling pathways, and changes in protein complexes. To achieve this, mass spectrometry-based methods are often employed to measure the relative abundance of proteins or of post-translational modifications including phosphorylation, acetylation, glycosylation, and ubiquitination. Although such large-scale studies generate enormous amounts of data, biological interpretation of these data poses a significant challenge for biologists.

Several commercial and open source tools have been developed over the years to facilitate biological interpretation of proteomics data. These tools allow biologists to disentangle complexity in large datasets and identify meaningful patterns. Most biological processes are not driven by a single protein but by many proteins acting in concert. If any two biological conditions or cell phenotypes are compared using quantitative proteomics, one can expect the set of proteins that regulate these two distinct cell phenotypes or biological conditions to be differentially expressed. Tools developed to carry out gene set enrichment or overrepresentation analysis enable identification of such patterns in large-scale datasets. Such enrichment analysis can also facilitate functional annotation of orphan molecules based on their association with other well-characterized molecules. Here, we describe several tools that can be used for such analysis in mammalian systems, particularly those with well-annotated data, including human.
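The idea behind overrepresentation analysis can be illustrated with a small calculation. The sketch below is not taken from any of the tools described in this chapter; it assumes a plain upper-tail hypergeometric test, and the function name and all counts are hypothetical.

```python
from math import comb


def overrepresentation_p(list_hits, list_total, pop_hits, pop_total):
    """Upper-tail hypergeometric P-value.

    Probability of seeing at least `list_hits` annotated proteins in a
    list of `list_total` proteins drawn from a background of `pop_total`
    proteins, of which `pop_hits` carry the annotation.
    """
    max_hits = min(list_total, pop_hits)
    favourable = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, max_hits + 1)
    )
    return favourable / comb(pop_total, list_total)


# Hypothetical counts, for illustration only: 12 of 200 regulated proteins
# carry an annotation that 150 of 20,000 background proteins carry.
p = overrepresentation_p(list_hits=12, list_total=200, pop_hits=150, pop_total=20000)
print(f"P = {p:.3g}")
```

With these numbers only about 1.5 annotated proteins would be expected by chance, so 12 hits give a very small P-value. Real tools differ in the exact test and in how they correct for multiple testing.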

2 Materials

Several commercial as well as open source tools are available for carrying out bioinformatics analysis of high-throughput datasets. For each type of analysis, we provide a list of tools that can be used in the relevant section of the chapter. Step-by-step instructions are also provided for one tool in each section. A general outline of the workflow and the different kinds of analysis that can be carried out is provided in Fig. 1.

3 Methods

3.1 Gene Ontology Enrichment Analysis

The Gene Ontology (GO) Consortium has developed a controlled vocabulary to represent biological functions, processes, and cellular localization information [1]. The terms are linked to corresponding genes based on our understanding of gene function and localization. These data are extensively used to carry out GO enrichment analysis, which provides insights into the biological functions and processes enriched in a large-scale proteomics dataset. Several tools have been developed to carry out enrichment analysis with a gene/protein list as input. FunRich [2] is a user-friendly stand-alone tool for GO enrichment analysis. The tool allows users to upload or paste gene symbols, gene IDs, UniProt IDs, and RefSeq protein IDs as input for the analysis. Results of the enrichment analysis are produced in various graphical formats such as bar graph, pie chart, Venn diagram, heat map, and doughnut chart. Multiple gene sets can be uploaded for comparative GO enrichment and pathway enrichment analysis. The tool provides various graphical representation options for visualizing comparative results.

One of the widely used Web-based tools is DAVID (Database for Annotation, Visualization, and Integrated Discovery; https://david.ncifcrf.gov/) [3]. It provides a comprehensive set of functional annotation tools that can not only identify enriched biological themes, particularly GO terms, but also discover functionally related enriched gene groups based on popular pathway databases including KEGG [4] and BioCarta [5]. Here we describe a step-by-step guide for GO enrichment using DAVID.

Fig. 1 A general framework and outline of various bioinformatics analysis approaches that can be used for high-throughput proteomic data

There are two major DAVID tools that can be used for functional annotation/classification of gene lists: Functional Annotation and Gene Functional Classification. The tools can be accessed by clicking the links in the top left corner of the home page.

1. To begin the analysis, click on “Functional Annotation”.

2. The resulting Web page shows three tabs: Upload, List, and Background.

3. In the “Upload” tab, either paste the gene list into the box or browse and upload a file containing the list as a single column, with each row representing a single gene (see Note 1).

4. The “List” tab in DAVID allows users to limit gene annotations to one or more species. The default parameter is Homo sapiens.

5. For enrichment analysis, the user has to choose a background using the “Background” tab. The default background in DAVID is the Homo sapiens whole genome. The user can instead choose a custom background.

6. DAVID recognizes gene lists with various identifiers, including official gene symbols and accession numbers. For proteomics datasets, it is best to use official gene symbols in gene lists and to choose that identifier type in Step 2 of the “Upload” tab.

7. In Step 3 of the “Upload” tab, choose whether the uploaded list should be used as “Gene List” or “Background”. For data from human samples, choose “Gene List”, as the Homo sapiens whole genome background is used by default.

8. Click the “Submit List” button. The results provided by DAVID include “Functional Annotation Clustering”, “Functional Annotation Chart”, and “Functional Annotation Table”. These results provide a quick overview of the major biological functions enriched in the gene list.

9. For GO enrichment analysis, click on Gene_Ontology and select GOTERM_BP_ALL for biological process, GOTERM_CC_ALL for subcellular localization, and GOTERM_MF_ALL for molecular function as the annotation categories for the GO enrichment analysis. Click on “Functional Annotation Clustering” and DAVID will generate clusters of terms with similar biological meaning based on shared or similar gene members. The significance of this enrichment is calculated as a modified Fisher Exact P-value.

10. The top panel of the result window is a parameter panel, which the user can modify as needed to rerun the analysis without submitting the input again. Higher stringency is recommended to obtain small, concise, and meaningful clusters rather than broad and vague clusters of proteins. The default setting is medium stringency; however, the user can modify this option based on the analysis. A higher enrichment score indicates that members of the annotation terms are overrepresented in the uploaded input.

11. The result table displays annotation categories, enriched functional annotations, the enrichment score of each cluster, the number of genes contributing to the clustering of similar GO terms, and the modified Fisher Exact P-value.

12. To analyze the most enriched clusters, the user can select the clusters with the highest enrichment scores and lowest P-values for biological process, molecular function, and subcellular localization (see Note 2).

13. The “G” link at the top of each cluster can be used to extract the defined set of proteins contributing to the enrichment of that cluster. The matrix icon draws a heat map for the cluster and provides the GO term count matrix for each protein, which can be used for plotting graphs.

14. Users can also carry out pathway and functional domain enrichment analysis in DAVID by selecting “Pathway”, “Functional categories”, and “Protein domains” as the back-end reference databases for functional annotation. However, the Web resource Reactome provides a user-friendly graphical interface for pathway analysis, which is explained in detail below.
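The two statistics mentioned in steps 9 to 12 can be sketched in a few lines. This is an illustration, not DAVID's own code: it assumes the modified Fisher Exact P-value (EASE score) is the one-sided test recomputed after removing one gene from the list hits, and that a cluster's enrichment score is the mean of -log10(P) over its member terms, as we understand the DAVID documentation to describe them. Function names and counts are hypothetical.

```python
from math import comb, log10


def upper_tail(hits, list_total, pop_hits, pop_total):
    """Hypergeometric probability of at least `hits` annotated genes."""
    top = min(list_total, pop_hits)
    favourable = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(max(hits, 0), top + 1)
    )
    return favourable / comb(pop_total, list_total)


def ease_score(hits, list_total, pop_hits, pop_total):
    """Conservative variant (assumed definition): drop one hit, then test."""
    return upper_tail(hits - 1, list_total, pop_hits, pop_total)


def cluster_enrichment_score(p_values):
    """Mean of -log10(P) over the terms grouped in one cluster."""
    return sum(-log10(p) for p in p_values) / len(p_values)


# Illustrative counts: 3 of 300 list genes versus 40 of 30,000 background genes.
fisher_p = upper_tail(3, 300, 40, 30000)
ease_p = ease_score(3, 300, 40, 30000)
print(f"one-sided Fisher P = {fisher_p:.3g}, EASE-style P = {ease_p:.3g}")
print(cluster_enrichment_score([0.01, 0.001]))
```

Removing one hit makes the test more conservative for terms supported by only a handful of genes, which is why the modified P-value is never smaller than the unmodified one.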

Table 1 lists other widely used open source gene set enrichment analysis tools.

3.2 Pathway Analysis

Proteins regulate most cellular processes. Several proteins work in concert to regulate these processes and are often grouped into the specific pathways in which they carry out their functions. Over the years, pathways and processes that are regulated by specific proteins have been systematically annotated. Based on protein expression data, it is possible to infer which pathways and processes are active in a biological sample. In addition to expression, some of the most widely studied signaling mechanisms involve the dynamic interplay of kinases and phosphatases, which results in the addition or removal of phosphate groups on proteins. Differential protein expression data or phosphoproteomics data can be used to carry out pathway enrichment analysis. If the expression or phosphorylation levels of certain proteins change in a biological sample compared with an appropriate control, it is possible to predict pathways that are potentially differentially regulated.

Reactome [14] is a manually curated, open access, Web-based resource of biological pathways that allows users to browse, search, and map proteins onto pathways. It also provides lists of interactors for pathway nodes, acquired from the IntAct [15] molecular interaction database.

Here we describe Reactome, a Web-based tool that can be used for pathway analysis.
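Before pathway enrichment, the proteins that change between a sample and its control have to be selected. The sketch below shows one simple way to do this with a log2 fold-change cutoff. The twofold threshold, the function name, and the abundance values are assumptions for illustration; in practice, replicate measurements and a statistical test would normally accompany the cutoff.

```python
from math import log2


def differential_proteins(abundance, min_abs_log2_fc=1.0):
    """Return {gene: log2 fold change} for proteins passing the cutoff.

    `abundance` maps a gene symbol to a (control, treated) pair of
    positive abundance values.
    """
    changed = {}
    for gene, (control, treated) in abundance.items():
        log2_fc = log2(treated / control)
        if abs(log2_fc) >= min_abs_log2_fc:
            changed[gene] = log2_fc
    return changed


# Hypothetical abundances (arbitrary units), for illustration only.
abundance = {
    "EGFR": (100.0, 420.0),
    "TP53": (80.0, 85.0),
    "MAPK1": (200.0, 90.0),
    "GAPDH": (1000.0, 1010.0),
}
changed = differential_proteins(abundance)
up = sorted(gene for gene, fc in changed.items() if fc > 0)
down = sorted(gene for gene, fc in changed.items() if fc < 0)
print("up:", up, "down:", down)
```

The resulting lists of up- and down-regulated proteins are the kind of input that a pathway enrichment tool expects.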

Table 1
List of tools that can be used for gene ontology and gene set enrichment analysis

| Name | Description | Availability and link | Reference |
|---|---|---|---|
| GSEA | Gene set enrichment analysis (GSEA) is an expression analytics tool. It compares gene set enrichment between conditions and provides enriched sets of genes with their statistical significance scores to interpret biological data | Stand-alone; http://www.broadinstitute.org/gsea/ | [6] |
| FunRich | FunRich is a downloadable tool for pathway and GO enrichment analysis of genes and proteins. It can process genes/proteins irrespective of the source of the sample, as the user can load a customized database along with the default background database | Stand-alone; http://funrich.org/ | [2] |
| GoMiner | GoMiner leverages Gene Ontology by providing a framework to visualize and integrate “omics” data. It clusters genes and their expression profiles, which can be analyzed for their biological significance. Each gene is linked to BioCarta, Entrez Genome, NCBI structures, PubMed, and MedMiner for greater clarity | Stand-alone, Web; http://discover.nci.nih.gov/gominer | [7] |
| GOstat | GOstat uses the GO terms database to find statistically overrepresented genes in the dataset. The results list significant sets of genes for biological interpretation | Web; http://gostat.wehi.edu.au | [8] |
| GOToolBox | GOToolBox is used for functional annotation of genes. GOToolBox is a Perl-based program that can be automated in any gene expression analysis pipeline. GOToolBox also has the GO-Diet and PRODISTIN frameworks, which can be used to study protein–protein interactions | Web; http://genome.crg.es/GOToolBox/ | [9] |

1. Reactome (http://www.reactome.org/) allows users to map a list of proteins onto pathways and carry out enrichment analysis to determine whether the input data contain an overrepresentation of proteins involved in certain pathways (see Note 3).

2. Click on “Analyze Data”. The analysis is a three-step process that begins with pasting the protein list, with an appropriate header, on the Web page. The tool also accepts accession numbers and other identifiers as input. In the next step, it offers the option to project the data onto human annotation if they come from a different species, and to include interactors from the IntAct Molecular Interaction database. After making the appropriate selections, click on “Analyze”.

3. The resulting page is divided into four panels. The “Hierarchy” panel on the left of the Web page lists enriched pathways with their corresponding FDR, the “Viewport” panel shows a graphical overview of these pathways with various options to navigate, the top panel provides configuration options, and the bottom panel provides details of the objects selected in the pathway diagram. A detailed manual for understanding and navigating this pathway analysis tool can be found at http://wiki.reactome.org/index.php/Usersguide.
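The FDR shown next to each pathway corrects the enrichment P-values for the many pathways tested at once. The sketch below assumes the Benjamini-Hochberg procedure, a commonly used FDR correction; it is an illustration rather than Reactome's own implementation, and the pathway names and P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values
    # never increase as the raw P-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment P-values for four pathways.
pathways = ["Pathway A", "Pathway B", "Pathway C", "Pathway D"]
raw_p = [0.001, 0.02, 0.03, 0.5]
fdr = benjamini_hochberg(raw_p)
for name, p, q in zip(pathways, raw_p, fdr):
    print(f"{name}\tP={p:.3g}\tFDR={q:.3g}")
```

A pathway with a small raw P-value can still have a modest FDR when many pathways are tested, which is why the FDR column is the safer one to filter on.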

Table 1
(continued)

| Name | Description | Availability and link | Reference |
|---|---|---|---|
| GeneMerge | GeneMerge enables overrepresentation analysis of gene attributes in a given set of genes compared with the genome background | Stand-alone, Web; http://www.genemerge.net/ | [10] |
| GO::TermFinder | GO::TermFinder is a tool that helps to find significant GO terms shared among a list of genes. Its libraries also enable visualization of results | Stand-alone; http://search.cpan.org/dist/GO-TermFinder/ | [11] |
| agriGO | agriGO is a specialized data analytics tool for the agricultural community. The database has 38 agricultural species comprising 274 data types | Web; http://bioinfo.cau.edu.cn/agriGO/ | [12] |
| FatiGO | FatiGO helps to find significant overrepresentation of functional annotations in one gene set compared with another | Web; http://babelomics.bioinfo.cipf.es | [13] |

There are various commercial tools, such as QIAGEN Ingenuity Pathway Analysis (IPA) and Agilent Genomics GeneSpring, for functional and pathway enrichment analysis. Table 2 lists some of the widely used pathway resources and network analysis tools.

Table 2
List of pathway resources and network analysis tools

| Name | Description | Availability and link | Reference |
|---|---|---|---|
| NetPath | NetPath is a manually curated resource of signal transduction pathways. Pathway data can be browsed, visualized, or downloaded in PSI-MI, BioPAX, and SBML formats. These standard formats enable visualization using external tools like Cytoscape | Web; www.netpath.org | [16] |
| PANTHER | Protein ANalysis THrough Evolutionary Relationships (PANTHER) is an analysis framework with multiple tools for evolutionary and functional classification of proteins. The PANTHER pathway resource allows visualization of protein expression data in the context of pathway diagrams | Web; http://www.pantherdb.org/pathway | [17] |
| KEGG | Kyoto Encyclopedia of Genes and Genomes (KEGG) is an integrated database resource. Pathway maps and annotations in KEGG are widely used for pathway enrichment analysis | Web; http://www.genome.jp/kegg/ | [4] |
| STRING | Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) is a database of protein–protein interactions | Web; http://string-db.org/ | [18] |
| FunRich | FunRich is a downloadable tool for pathway and GO enrichment analysis of genes and proteins. It can process genes/proteins irrespective of the source of the sample, as the user can load a customized database along with the default background database | Stand-alone; http://funrich.org/ | [2] |
| MINT | MINT (Molecular INTeraction) is a curated molecular interaction database | Web, stand-alone; http://mint.bio.uniroma2.it/mint/Welcome.do | [19] |
| NetworKIN | The NetworKIN database provides an interface to analyze cellular phosphorylation networks. It allows users to query precomputed kinase–substrate relations or obtain predictions on novel phosphoproteins | Web, stand-alone; http://networkin.info | [20] |

3.3 Post-translational Modification Analysis

Post-translational modifications (PTMs) play an important role in regulating various cellular processes. One of the most widely studied PTMs is phosphorylation. It acts as a switch for the activation and deactivation of specific proteins and associated signaling pathways. This modification serves as a rapid and reversible means to modulate protein activity and transduce signals. The advent of mass spectrometry has revolutionized our ability to map PTMs. Mass spectrometry studies have provided a comprehensive view of the proteins that undergo modifications, along with the specific sites. Based on our understanding of enzyme–substrate relationships and the specific motifs that are targeted for post-translational modification, a number of computational tools have been developed to predict PTMs. These tools can be used to evaluate the validity of sites identified in large-scale studies (based on known sites in the database) or to predict potential modifications.

The Human Protein Reference Database (HPRD) [21] is a repository of manually curated PTM sites. Phospho.ELM [22] is a resource of experimentally validated phosphorylation sites that are manually curated from the literature. The RESID [23] database provides PTM information with literature citations, protein feature tables, molecular models, structure diagrams, and Gene Ontology cross-references. PhosphoSitePlus [24] is a comprehensive repository of curated phosphosites containing reference and orthologous residues in other species. O-GLYCBASE [25] is a resource containing experimentally verified O-linked glycosylation sites. Unimod [26] is a comprehensive public domain database of protein modifications for mass spectrometry applications.

Phosphorylation is the most extensively studied PTM. Protein kinases add phosphate moieties to Tyr, Ser, or Thr residues. Mass spectrometry is extensively used to investigate protein phosphorylation in a high-throughput manner. Phosphorylation either increases or decreases the activity of the target protein. Overlaying phosphoproteomic data on curated pathways can provide insights into the activation or deactivation of a particular signaling pathway. PhosphoSitePlus [24] and PHOSIDA [27] are comprehensive repositories of curated phosphosites containing reference and orthologous residues in other species. Protein sequences can be analyzed for phosphosites using various prediction tools such as KinasePhos 2.0 [28], NetPhos 2.0 [29], and DISPHOS 1.3 [30].

Several computational approaches have been developed to predict acetylation sites. NetAcet [31] is a neural network-based N-terminal acetylation site prediction tool. N-Ace [32] predicts acetylation sites based on the physicochemical properties of proteins together with accessible surface area. PSKAcePred [33] uses evolutionary similarity along with physicochemical properties to predict lysine acetylation sites. Species Specific Prediction of Lysine Acetylation (SSPKA) [34] is a computational framework that incorporates predicted secondary structure information and combines functional and sequence features to predict species-specific acetylation sites across six different species: H. sapiens, R. norvegicus, M. musculus, E. coli, S. typhimurium, and S. cerevisiae.

Ubiquitination is one of the most difficult PTMs to identify due to its low abundance, size, and dynamic regulation. Because ubiquitin is much larger than other PTMs, it is difficult to capture by mass spectrometry. However, several ubiquitination sites have been mapped in the last few years based on the diglycine-modified lysine tag, which can be identified by mass spectrometry. Several tools, including UbPred [35], UbiPred [36], E3Miner [37], hCKSAAP_UbSite [38], and iUbiq-Lys [39], have been developed over the years for the prediction of ubiquitination sites. hUbiquitome [40] is a comprehensive repository of experimentally verified human ubiquitination enzymes and substrates.
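The diglycine tag is detectable because it adds a fixed mass to the modified lysine. As a small worked example, the sketch below computes that mass shift from the elemental composition of two glycine residues (C4H6N2O2). The monoisotopic element masses are standard values; the variable and function names are hypothetical.

```python
# Monoisotopic masses of the elements involved (in Da).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass(composition):
    """Sum element masses for a composition such as {"C": 4, "H": 6}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


# Two glycine residues (each C2H3NO) left on the modified lysine.
DIGLY = {"C": 4, "H": 6, "N": 2, "O": 2}
digly_shift = monoisotopic_mass(DIGLY)


def mz_shift(mass_shift, charge):
    """Shift in m/z observed for a peptide ion of the given charge."""
    return mass_shift / charge


print(f"diglycine mass shift: {digly_shift:.5f} Da")
print(f"m/z shift at 2+: {mz_shift(digly_shift, 2):.5f}")
```

Search engines apply this shift as a variable modification on lysine when looking for ubiquitination sites; the exact setting names depend on the software used.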

Small ubiquitin-like modifier (SUMO) attaches to various target proteins and modulates cellular processes such as DNA replication, transcription, cell division, nuclear trafficking, and the DNA damage response. SUMOylation affects the half-life and localization of targets or their binding partners and is a crucial mechanism that allows cells to adapt to stress stimuli. Identification of SUMO sites has revealed a strong dependency of SUMOylation events on other PTMs [41]. SUMOsp [42] and GPS-SUMO [43] predict SUMO sites on proteins.

Glycosylation is a common PTM that plays a crucial role in protein folding, cell–cell interaction, antigenicity, transport, and half-life. There are four types of glycosylation: N-linked, O-linked, C-mannosylation, and GPI anchor attachment. EnsembleGly [44] predicts both O- and N-linked glycosylation sites, NetCGlyc [45] predicts C-mannosylation sites, NetOGlyc [46] predicts O-glycosylation sites, and NetNGlyc [47] predicts N-glycosylation sites; PredGPI [48] and GPI-SOM [49] predict GPI anchor sites in a protein.

Scansite [50] is a tool for analyzing protein sequences for phosphorylation motifs recognized by many kinases, and Motif-X [51] allows prediction of various PTM site motifs by identifying overrepresented residues in the flanking regions. ProMEX [52] is a database of mass spectra of tryptic peptides from plant proteins and phosphoproteins.
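Motif-based tools work on the sequence window surrounding each candidate site. The sketch below extracts such flanking windows around Ser, Thr, and Tyr residues. It is a simplified illustration and not the algorithm of Scansite or Motif-X; the window size, padding character, function name, and example sequence are assumptions.

```python
def flanking_windows(sequence, residues="STY", flank=6, pad="-"):
    """Yield (position, residue, window) for each candidate residue.

    Positions are 1-based. Windows have length 2 * flank + 1 and are
    padded with `pad` where they run past either terminus.
    """
    padded = pad * flank + sequence + pad * flank
    for i, amino_acid in enumerate(sequence):
        if amino_acid in residues:
            yield i + 1, amino_acid, padded[i : i + 2 * flank + 1]


# Hypothetical short sequence, for illustration only.
windows = list(flanking_windows("MSTAY", flank=2))
for position, residue, window in windows:
    print(position, residue, window)
```

Aligning the windows from many modified sites and counting residues at each flanking position is the starting point for finding overrepresented motifs.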

Here we describe PTM analysis using the commonly used PTM database Phospho.ELM [22] and the phosphorylation site predictor NetPhos 2.0 [29].

1. To identify experimentally validated phosphorylation sites of a given protein, browse the Phospho.ELM database (http://phospho.elm.eu.org/index.html). The database can be queried using protein name, UniProt accession, or Ensembl identifier.

2. The result page of the Phospho.ELM database consists of a table detailing the residue, the position of the residue in the protein, the flanking sequence with the PTM site, the kinase, the PubMed reference for each site reported, the conservation score, a cross-reference to the eukaryotic linear motif resource (ELM: http://elm.eu.org/), the phosphopeptide binding domain, SMART domains, and a cross-reference to PDB, along with other information such as the substrate and cross-references to PHOSIDA [27], PhosphoSitePlus [24], MINT [19], and GO terms [1].

3. Computational prediction of phosphorylation can be done using the NetPhos 2.0 server (http://www.cbs.dtu.dk/services/NetPhos/). Users can submit protein sequences in FASTA format and select the target residues for phosphorylation (tyrosine, serine, or threonine). By default, all three residues are checked in the analysis. Select the checkbox for graphical output if a graph is required.

4. Click on “Submit” to initiate the analysis. Up to 2000 protein sequences can be analyzed by this Web-based tool in a single query.

5. The result page displays a table detailing the submitted protein ID, residue position, PTM site with flanking sequences, and score. Separate tables are generated for serine, threonine, and tyrosine.

6. The graphical result depicts the propensity of a residue at a given position to be a PTM site. Peaks of three different colors are used for the three residues (S, T, Y) on an X-Y plane, where the X-axis is the sequence position and the Y-axis is the phosphorylation potential.
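Preparing the input for step 3 can be scripted. The sketch below formats sequences as FASTA and splits them into batches that respect the limit of 2000 sequences per query mentioned in step 4. The function names, the line width of 60, and the example records are assumptions for illustration; submission to the server itself is not shown.

```python
def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        # Wrap the sequence into lines of at most `width` residues.
        lines.extend(sequence[i : i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def batches(records, max_per_query=2000):
    """Split records into consecutive chunks no larger than the limit."""
    return [records[i : i + max_per_query] for i in range(0, len(records), max_per_query)]


# Hypothetical records, for illustration only.
records = [("P1", "MSTAY" * 30), ("P2", "ACD")]
fasta_text = to_fasta(records)
print(fasta_text)

many = [(f"protein_{n}", "MSTAY") for n in range(4500)]
print([len(chunk) for chunk in batches(many)])
```

Each batch can then be pasted or uploaded as a separate query.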

3.4 Visualization Tools

A multitude of tools are available for data integration and visualization of “omics” datasets (Table 3). Most visualization tools focus on biomolecular interactions and pathways. These tools commonly employ 2D graphs for data representation. Their usefulness lies largely in their compatibility with other tools and databases.

4 Notes

1. It is preferable to use “Gene Symbol” as the unique identifier for genes. DAVID has an ID conversion tool that can be used to prepare lists with uniform identifiers.

2. Enrichment analysis methods often involve statistical tests to determine whether the input data contain more proteins involved in certain functions, processes, or pathways than would be expected by chance. This is calculated with respect to the background database used by the respective tool. Many tools also give users the option of using a custom database as background. Knowledge of the statistical approach employed by such tools allows the user to make relevant selections for different kinds of datasets in order to identify the most enriched gene/protein clusters.

3. Pathway enrichment analysis is carried out against the pathway database used in the background. The back-end pathway database used for the analysis directly influences the outcome of the pathway analysis. This aspect should be taken into consideration, and users should select the pathway annotation resource most suitable for the intended pathway analysis.
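As a practical aid to Note 1, the sketch below turns a pasted list of gene symbols into the single-column, duplicate-free form that upload tools expect. It is a minimal sketch with a hypothetical function name: it only tidies the text, assumes the entries are already gene symbols, and does not convert between identifier types (DAVID's ID conversion tool would be used for that).

```python
def clean_gene_list(raw_text):
    """Return unique gene symbols in first-seen order.

    Accepts symbols separated by whitespace, commas, or semicolons.
    Duplicates are detected case-insensitively and the first spelling
    encountered is kept.
    """
    seen = set()
    cleaned = []
    for token in raw_text.replace(",", " ").replace(";", " ").split():
        key = token.upper()
        if key not in seen:
            seen.add(key)
            cleaned.append(token)
    return cleaned


# Hypothetical pasted input, for illustration only.
raw = "TP53, EGFR\nTP53\n egfr ;MAPK1"
symbols = clean_gene_list(raw)
print("\n".join(symbols))  # one symbol per row, ready to paste or save
```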

Table 3
List of pathway analysis and visualization tools

| Name | Description | Availability and link | Reference |
|---|---|---|---|
| GenMAPP | GenMAPP is a stand-alone visualization tool for gene/protein expression profiles. It has the MAPPBuilder tool for creating MAPP files (.mapp), which hold graphical pathway representations of genes, and the MAPPFinder tool to annotate the pathway. Each gene is identified by a unique gene ID from GenBank. MAPP files can be shared and manipulated by the user | Stand-alone; http://www.genmapp.org | [53] |
| Cytoscape | Cytoscape is a Java-based stand-alone tool that supports large-scale network analysis. Both protein–protein and protein–gene networks can be visualized and edited. The standard file format of Cytoscape is the Cytoscape Session File (.cys). Input to Cytoscape can be a delimited text table or an Excel workbook, and all major input formats are supported. Results can be exported in formats such as SIF, GML, XGMML, and PSI-MI | Stand-alone; http://www.cytoscape.org/ | [54] |
| Medusa | Medusa is a Java application for visualization of complex pathways. Results from the STRING database can be analyzed in Medusa. Medusa is less suited to big datasets | Stand-alone; https://sites.google.com/site/medusa3visualization/ | [55] |
| Perseus | Perseus is a statistical analysis and visualization tool for proteomics data. It incorporates multiple statistical methods such as t-tests, clustering, and enrichment analysis, as well as data normalization. It provides various graphs for data visualization, such as scatter plots and volcano plots | Stand-alone; http://www.biochem.mpg.de/5111810/perseus | [56] |
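Interaction data often need to be converted into a network file before they can be loaded into a visualization tool. The sketch below writes protein pairs in the simple interaction format (SIF), in which each line holds a source node, an interaction type, and a target node. The function name and example pairs are hypothetical, and "pp" is used here as the conventional label for a protein–protein interaction.

```python
def to_sif(interactions, interaction_type="pp"):
    """Render (protein_a, protein_b) pairs as tab-delimited SIF text.

    Each undirected pair is written once, in first-seen orientation.
    """
    seen = set()
    lines = []
    for protein_a, protein_b in interactions:
        key = tuple(sorted((protein_a, protein_b)))
        if key in seen:
            continue
        seen.add(key)
        lines.append(f"{protein_a}\t{interaction_type}\t{protein_b}")
    return "\n".join(lines) + "\n"


# Hypothetical interaction pairs, for illustration only.
pairs = [("EGFR", "GRB2"), ("GRB2", "EGFR"), ("GRB2", "SOS1")]
sif_text = to_sif(pairs)
print(sif_text)
```

Saving this text with a .sif extension gives a file that network tools such as Cytoscape can import.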


References

1. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet 25(1):25–29. doi:10.1038/75556

2. Pathan M, Keerthikumar S, Ang CS, Gangoda L, Quek CY, Williamson NA, Mouradov D, Sieber OM, Simpson RJ, Salim A, Bacic A, Hill AF, Stroud DA, Ryan MT, Agbinya JI, Mariadason JM, Burgess AW, Mathivanan S (2015) FunRich: an open access standalone functional enrichment and interaction network analysis tool. Proteomics 15(15):2597–2601. doi:10.1002/pmic.201400515

3. Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA (2003) DAVID: database for annotation, visualization, and integrated discovery. Genome Biol 4(5):P3

4. Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28(1):27–30

5. Nishimura D (2004) BioCarta. Biotech Software Internet Report 2:117–120. doi:10.1089/152791601750294344

6. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102(43):15545–15550. doi:10.1073/pnas.0506580102

7. Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, Narasimhan S, Kane DW, Reinhold WC, Lababidi S, Bussey KJ, Riss J, Barrett JC, Weinstein JN (2003) GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 4(4):R28

8. Beissbarth T, Speed TP (2004) GOstat: find statistically overrepresented gene ontologies within a group of genes. Bioinformatics 20(9):1464–1465. doi:10.1093/bioinformatics/bth088

9. Martin D, Brun C, Remy E, Mouren P, Thieffry D, Jacq B (2004) GOToolBox: functional analysis of gene datasets based on Gene Ontology. Genome Biol 5(12):R101. doi:10.1186/gb-2004-5-12-r101

10. Castillo-Davis CI, Hartl DL (2003) Gene

Merge—post-genomic analysis, data mining,

and hypothesis testing. Bioinformatics

19(7):891–892

11. Boyle EI, Weng S, Gollub J, Jin H, Botstein D,

Cherry JM, Sherlock G (2004) GO::Term

Finder—open source software for accessing

gene ontology information and ﬁnding signiﬁ-

cantly enriched gene ontology terms associated

with a list of genes. Bioinformatics 20(18):3710–

3715. doi:10.1093/bioinformatics/bth456

12. Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010)

agriGO: a GO analysis toolkit for the agricul-

tural community. Nucleic Acids Res 38(Web

Server Issue):64–70. doi:10.1093/nar/gkq310

13. Al-Shahrour F, Minguez P, Tarraga J, Medina

I, Alloza E, Montaner D, Dopazo J (2007)

FatiGO +: a functional proﬁling tool for

genomic data. Integration of functional anno-

tation, regulatory motifs and interaction data

with microarray experiments. Nucleic Acids

Res 35(Web Server Issue):91–96. doi:10.1093/

nar/gkm260

14. Joshi-Tope G, Gillespie M, Vastrik I,

D'Eustachio P, Schmidt E, de Bono B, Jassal

B, Gopinath GR, Wu GR, Matthews L, Lewis

S, Birney E, Stein L (2005) Reactome: a

knowledgebase of biological pathways. Nucleic

Acids Res 33(Database issue):D428–D432.

doi:10.1093/nar/gki072

15. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R (2004) IntAct: an open source molecular interaction database. Nucleic Acids Res 32(Database issue):D452–D455. doi:10.1093/nar/gkh052

16. Kandasamy K, Mohan SS, Raju R, Keerthikumar S, Kumar GS, Venugopal AK, Telikicherla D, Navarro JD, Mathivanan S, Pecquet C, Gollapudi SK, Tattikota SG, Mohan S, Padhukasahasram H, Subbannayya Y, Goel R, Jacob HK, Zhong J, Sekhar R, Nanjappa V, Balakrishnan L, Subbaiah R, Ramachandra YL, Rahiman BA, Prasad TS, Lin JX, Houtman JC, Desiderio S, Renauld JC, Constantinescu SN, Ohara O, Hirano T, Kubo M, Singh S, Khatri P, Draghici S, Bader GD, Sander C, Leonard WJ, Pandey A (2010) NetPath: a public resource of curated signal transduction pathways. Genome Biol 11(1):R3. doi:10.1186/gb-2010-11-1-r3


17. Mi H, Poudel S, Muruganujan A, Casagrande JT, Thomas PD (2016) PANTHER version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Res 44(D1):D336–D342. doi:10.1093/nar/gkv1194

18. von Mering C, Huynen M, Jaeggi D, Schmidt S, Bork P, Snel B (2003) STRING: a database of predicted functional associations between proteins. Nucleic Acids Res 31(1):258–261

19. Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G (2002) MINT: a molecular INTeraction database. FEBS Lett 513(1):135–140

20. Linding R, Jensen LJ, Pasculescu A, Olhovsky M, Colwill K, Bork P, Yaffe MB, Pawson T (2008) NetworKIN: a resource for exploring cellular phosphorylation networks. Nucleic Acids Res 36(Database issue):D695–D699. doi:10.1093/nar/gkm902

21. Peri S, Navarro JD, Kristiansen TZ, Amanchy R, Surendranath V, Muthusamy B, Gandhi TK, Chandrika KN, Deshpande N, Suresh S, Rashmi BP, Shanker K, Padma N, Niranjan V, Harsha HC, Talreja N, Vrushabendra BM, Ramya MA, Yatish AJ, Joy M, Shivashankar HN, Kavitha MP, Menezes M, Choudhury DR, Ghosh N, Saravana R, Chandran S, Mohan S, Jonnalagadda CK, Prasad CK, Kumar-Sinha C, Deshpande KS, Pandey A (2004) Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res 32(Database issue):D497–D501. doi:10.1093/nar/gkh070

22. Diella F, Cameron S, Gemund C, Linding R, Via A, Kuster B, Sicheritz-Ponten T, Blom N, Gibson TJ (2004) Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics 5:79. doi:10.1186/1471-2105-5-79

23. Garavelli JS (2004) The RESID database of protein modifications as a resource and annotation tool. Proteomics 4(6):1527–1533. doi:10.1002/pmic.200300777

24. Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, Skrzypek E (2015) PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res 43(Database issue):D512–D520. doi:10.1093/nar/gku1267

25. Gupta R, Birch H, Rapacki K, Brunak S, Hansen JE (1999) O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins. Nucleic Acids Res 27(1):370–372

26. Creasy DM, Cottrell JS (2004) Unimod: protein modifications for mass spectrometry. Proteomics 4(6):1534–1536. doi:10.1002/pmic.200300744

27. Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M (2007) PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol 8(11):R250. doi:10.1186/gb-2007-8-11-r250

28. Huang HD, Lee TY, Tzeng SW, Horng JT (2005) KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res 33(Web Server Issue):226–229. doi:10.1093/nar/gki471

29. Blom N, Gammeltoft S, Brunak S (1999) Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 294(5):1351–1362. doi:10.1006/jmbi.1999.3310

30. Iakoucheva LM, Radivojac P, Brown CJ, O'Connor TR, Sikes JG, Obradovic Z, Dunker AK (2004) The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 32(3):1037–1049. doi:10.1093/nar/gkh253

31. Kiemer L, Bendtsen JD, Blom N (2005) NetAcet: prediction of N-terminal acetylation sites. Bioinformatics 21(7):1269–1270. doi:10.1093/bioinformatics/bti130

32. Lee TY, Hsu JB, Lin FM, Chang WC, Hsu PC, Huang HD (2010) N-Ace: using solvent accessibility and physicochemical properties to identify protein N-acetylation sites. J Comput Chem 31(15):2759–2771. doi:10.1002/jcc.21569

33. Suo SB, Qiu JD, Shi SP, Sun XY, Huang SY, Chen X, Liang RP (2012) Position-specific analysis and prediction for protein lysine acetylation based on multiple features. PLoS One 7(11):e49108. doi:10.1371/journal.pone.0049108

34. Li Y, Wang M, Wang H, Tan H, Zhang Z, Webb GI, Song J (2014) Accurate in silico identification of species-specific acetylation sites by integrating protein sequence-derived and functional features. Sci Rep 4:5765. doi:10.1038/srep05765

35. Radivojac P, Vacic V, Haynes C, Cocklin RR, Mohan A, Heyen JW, Goebl MG, Iakoucheva LM (2010) Identification, analysis, and prediction of protein ubiquitination sites. Proteins 78(2):365–380. doi:10.1002/prot.22555

36. Tung CW, Ho SY (2008) Computational identification of ubiquitylation sites from protein sequences. BMC Bioinformatics 9:310. doi:10.1186/1471-2105-9-310

37. Lee H, Yi GS, Park JC (2008) E3Miner: a text mining tool for ubiquitin-protein ligases. Nucleic Acids Res 36(Web Server Issue):416–422. doi:10.1093/nar/gkn286

38. Chen Z, Zhou Y, Song J, Zhang Z (2013) hCKSAAP_UbSite: improved prediction of human ubiquitination sites by exploiting amino acid pattern and properties. Biochim Biophys Acta 1834(8):1461–1467. doi:10.1016/j.bbapap.2013.04.006


39. Qiu WR, Xiao X, Lin WZ, Chou KC (2015) iUbiq-Lys: prediction of lysine ubiquitination sites in proteins by extracting sequence evolution information via a gray system model. J Biomol Struct Dyn 33(8):1731–1742. doi:10.1080/07391102.2014.968875

40. Du Y, Xu N, Lu M, Li T (2011) hUbiquitome: a database of experimentally verified ubiquitination cascades in humans. Database (Oxford) 2011:bar055. doi:10.1093/database/bar055

41. Eifler K, Vertegaal AC (2015) Mapping the SUMOylated landscape. FEBS J 282(19):3669–3680. doi:10.1111/febs.13378

42. Xue Y, Zhou F, Fu C, Xu Y, Yao X (2006) SUMOsp: a web server for sumoylation site prediction. Nucleic Acids Res 34(Web Server Issue):254–257. doi:10.1093/nar/gkl207

43. Zhao Q, Xie Y, Zheng Y, Jiang S, Liu W, Mu W, Liu Z, Zhao Y, Xue Y, Ren J (2014) GPS-SUMO: a tool for the prediction of sumoylation sites and SUMO-interaction motifs. Nucleic Acids Res 42(Web Server Issue):325–330. doi:10.1093/nar/gku383

44. Caragea C, Sinapov J, Silvescu A, Dobbs D, Honavar V (2007) Glycosylation site prediction using ensembles of Support Vector Machine classifiers. BMC Bioinformatics 8:438. doi:10.1186/1471-2105-8-438

45. Julenius K (2007) NetCGlyc 1.0: prediction of mammalian C-mannosylation sites. Glycobiology 17(8):868–876. doi:10.1093/glycob/cwm050

46. Hansen JE, Lund O, Tolstrup N, Gooley AA, Williams KL, Brunak S (1998) NetOglyc: prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconj J 15(2):115–130

47. Gupta R, Jung E, Brunak S (2004) NetNGlyc 1.0 Server. Center for Biological Sequence Analysis, Technical University of Denmark (http://www.cbs.dtu.dk/services/NetNGlyc)

48. Pierleoni A, Martelli PL, Casadio R (2008) PredGPI: a GPI-anchor predictor. BMC Bioinformatics 9:392. doi:10.1186/1471-2105-9-392

49. Fankhauser N, Maser P (2005) Identification of GPI anchor attachment signals by a Kohonen self-organizing map. Bioinformatics 21(9):1846–1852. doi:10.1093/bioinformatics/bti299

50. Obenauer JC, Cantley LC, Yaffe MB (2003) Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 31(13):3635–3641

51. Chou MF, Schwartz D (2011) Biological sequence motif discovery using motif-x. Curr Protoc Bioinformatics 13:15–24. doi:10.1002/0471250953.bi1315s35

52. Hummel J, Niemann M, Wienkoop S, Schulze W, Steinhauser D, Selbig J, Walther D, Weckwerth W (2007) ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites. BMC Bioinformatics 8:216. doi:10.1186/1471-2105-8-216

53. Dahlquist KD, Salomonis N, Vranizan K, Lawlor SC, Conklin BR (2002) GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat Genet 31(1):19–20. doi:10.1038/ng0502-19

54. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498–2504. doi:10.1101/gr.1239303

55. Hooper SD, Bork P (2005) Medusa: a simple tool for interaction graph analysis. Bioinformatics 21(24):4432–4433. doi:10.1093/bioinformatics/bti696

56. Tyanova S, Temu T, Sinitcyn P, Carlson A, Hein M, Geiger T, Mann M, Cox J (2016) The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods 13(9):731–740. doi:10.1038/nmeth.3901

