
Brief Overview of Bioinformatics Activities in Singapore

PLOS
PLOS Computational Biology

Perspective
Frank Eisenhaber (1), Chee-Keong Kwoh (2), See-Kiong Ng (3), Wing-Kin Sung (4,5), Limsoon Wong (5)*
(1) BioInformatics Institute, Singapore; (2) Nanyang Technological University, Singapore; (3) Institute for Infocomm Research, Singapore; (4) Genome Institute of Singapore, Singapore; (5) National University of Singapore, Singapore
Introduction
The frontier of biological and medical sciences is full of opportunity today. It is widely appreciated that present-day biomedical researchers are confronted by vast amounts of data from genome sequencing; microscopy; high-throughput analytical techniques for DNA, RNA, and proteins; and a host of other new experimental technologies. Coupled with advances in computing power, this flow of information enables scientists to computationally model and analyze biological systems in novel ways. Therefore, bioinformatics is seen as an important ingredient in Singapore’s ambition to be an international center for the biomedical sciences and their related industries.
Five organizations are involved in bioinformatics in Singapore in a major way. Two of these are universities in Singapore, namely the National University of Singapore (NUS) and the Nanyang Technological University (NTU). NUS has a longer history in bioinformatics and life science training and research, while NTU did not have a life science school until the early 2000s. The other three are institutes under the Agency for Science, Technology & Research (A*STAR), namely the BioInformatics Institute (BII), the Genome Institute of Singapore (GIS), and the Institute for Infocomm Research (I2R). I2R has the longest history in this field in Singapore, and it accounted for a lion’s share of Singapore’s output in bioinformatics research from 1994 to 2005. BII and GIS are entities set up in the early 2000s; they have now matured into major forces in bioinformatics research in Singapore.
An earlier report describes the development and personalities of Singapore bioinformatics from 1992 to 2002 [1]. The bioinformatics scene in Singapore has undergone some important changes since 2005, with new leadership in three of the five major centers of activities in Singapore: BII, I2R, and NUS. Here, we provide an updated overview of bioinformatics research and training activities at these organizations, as well as at GIS and NTU.
Research at BII
BII (http://www.bii.a-star.edu.sg) of A*STAR was originally founded in 2001. After a tumultuous history with changing missions and directors, BII was essentially relaunched in the autumn of 2007. Its mission is now defined primarily as that of a computational biology research institute. Its new director, Frank Eisenhaber (previously at the Research Institute of Molecular Pathology in Vienna, Austria), guides the transition.
BII sees its future as a center for research in the field of biomolecular mechanism exploration driven by computational biology. Thus, BII is meant to remain primarily a theoretical institute. But in contrast to the previous concept, experimental work has a place at the Institute both for the follow-up of theoretically derived hypotheses and for the generation of datasets that are important for the development of theoretical approaches to biological problems.
The emphasis on biomolecular mechanisms is guided both by fundamental and by pragmatic considerations. Computational biology will have a great impact in this area since the ever-increasing body of sequence data, together with other large-scale datasets on expression, structure, interaction, and subcellular localization of biomolecules, provides great opportunities for achieving new biological insight using theoretical arguments. BII is located in the Biopolis in the Buona Vista area of Singapore and wishes to find synergies by interacting with the community, especially with other A*STAR biomedical research institutes that concentrate on genomics (GIS), molecular and cellular biology (IMCB), as well as their context with human disease (IMB, SiCS, SiGN), and with biotechnology applications (BTI, ETC, IBN).
At present, BII hosts 11 independent research teams organized into four research divisions. The “Imaging Informatics” section develops automated tools for the quantification of the distribution of labeled molecules with regard to subcellular structures in images of cells. BII’s own microscopy lab is coming into operation in summer 2009. In the “Genome Sequence and Gene Expression Data Analysis” division organized by Vladimir Kuznetsov, the research focus is on understanding transcriptional regulation and the biological role of non-coding RNA. Chandra Verma guides the “Biomolecular Structure and Design” division, the teams of which analyze and simulate 3D structural assemblies of biomolecules and try to connect structural features with biological function. Finally, the “Biomolecular Function Discovery” unit is quite a unique setup since it combines a protein sequence analysis group with a biochemical laboratory for the verification of predicted gene functions and a software team working on the ANNOTATOR environment, a system of workflows for annotating uncharacterized protein sequences.
Given that any really serious scientific project takes a few years, time will tell whether the promise of BII will be realized. Nevertheless, several recent publications show a glimpse of BII’s opportunities. For example, the mutations of the neuraminidase from the 2009 H1N1 (swine flu) virus strain have been shown not to affect the binding pocket of the antiviral drugs oseltamivir (Tamiflu), zanamivir (Relenza), and peramivir [2]. As another example, the ANNIE software sets a new standard in protein sequence annotation and function prediction [3].
BII offers the opportunity for Ph.D. students who are affiliated with any university in the world (for their examinations and their degree) to carry out research on one of the teams and to receive local monetary support over a period of three years.
Citation: Eisenhaber F, Kwoh C-K, Ng S-K, Sung W-K, Wong L (2009) Brief Overview of Bioinformatics Activities in Singapore. PLoS Comput Biol 5(9): e1000508. doi:10.1371/journal.pcbi.1000508
Editor: Philip E. Bourne, University of California San Diego, United States of America
Published September 25, 2009
Copyright: © 2009 Eisenhaber et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: wongls@comp.nus.edu.sg
PLoS Computational Biology | www.ploscompbiol.org 1 September 2009 | Volume 5 | Issue 9 | e1000508
Research at I2R
I2R (http://www.i2r.a-star.edu.sg) of A*STAR is a research institute for information technologies. As such, it devotes a small amount of its resources to bioinformatics, namely part of its data mining department. The primary objective of the bioinformatics research program at I2R is to inspire new research in data mining through computational analysis of biomedical data. Since See-Kiong Ng took over as manager of the data mining department in 2006, the group has focused on two areas, namely text mining and graph mining, to address the computational challenges brought about by the abundance of unstructured text and interaction networks in biology.
As one of the early pioneers in biomedical text mining [4], I2R has been developing effective text-mining approaches for extracting useful information from the vast biomedical literature. The group actively participates in international efforts in this domain. For example, they are part of the EU’s BOOTStrep (Bootstrapping Of Ontologies and Terminologies STrategic REsearch Project) program to develop an integrated text analysis system for biological documents, and they also collaborate with Tokyo University in developing a large-scale co-reference corpus on Medline abstracts. The group’s text-mining methods have been shown to be among the best in international benchmark competitions such as BioCreAtIvE [5].
For graph mining, the group has been focusing on the analysis of whole-genome protein–protein interaction networks, addressing such practical issues as handling the high abundance of experimental errors in the data and the effective integration of domain knowledge in the analysis. Given that biological systems are largely made up of networks of molecular interactions, developing data-mining methods to discover useful patterns from large networks is essential for understanding how cellular biology works, even though graph mining is intrinsically challenging computationally, with many problems proven to be NP-hard. The group collaborates extensively with local universities to develop algorithms that can be applied to experimentally determined protein–protein interaction networks to discover new biological knowledge such as domain interactions [6] and protein complexes [7].
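As a rough illustration of why dense-subgraph mining in interaction networks is computationally demanding, the following Python sketch enumerates maximal cliques of a small, made-up interaction graph and keeps those with at least three proteins as candidate complexes. It is not the method of [7]; it uses the standard Bron–Kerbosch algorithm with pivoting, and the function names, the `min_size` threshold, and the protein labels are illustrative assumptions. Maximal clique enumeration can take exponential time in the worst case.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map {protein: set(partners)}; self-loops are ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def maximal_cliques(adj):
    """Yield every maximal clique as a frozenset (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def candidate_complexes(edges, min_size=3):
    """Return maximal cliques with at least `min_size` proteins (hypothetical threshold)."""
    adj = build_graph(edges)
    return {c for c in maximal_cliques(adj) if len(c) >= min_size}


# Hypothetical toy network: two triangles joined by a single interaction.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"),
             ("C", "D"),
             ("D", "E"), ("D", "F"), ("E", "F")]
complexes = candidate_complexes(toy_edges)
```

Real interaction data contain many false positives and false negatives, so exact cliques are usually too strict; approaches in the literature typically relax the density requirement or weight interactions by reliability before searching.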
I2R also has an emphasis on applied research. As such, the group is driven by the need to apply the computational methods developed to help biologists deepen their understanding of molecular biology, and to harvest the knowledge gained to combat the many health threats that Singapore faces today. One unique biological application domain that the bioinformatics group at I2R has been focusing on is computational immunology. This is particularly relevant to Singapore given its recent close shaves with the SARS and avian flu viruses, as well as the emergence of tropical infectious diseases such as Dengue and Chikungunya fevers. With the alarming increase in worldwide outbreaks in the last few years, it is clearly also of great global concern. As vaccination has been one of the most successful public health intervention measures against infectious diseases, to gain a fighting chance against these new health threats it is crucial to significantly accelerate the development of vaccines. The group at I2R was one of the first to realize that the recent advances in genomic, proteomic, and bioinformatics technologies have offered new opportunities to do so. They have been developing and applying computational methods to screen large sets of protein antigens, such as those encoded by complete viral genomes [8], and validating their computational results by working closely with bench biologists both locally and internationally. Thus far, the group has worked on various viruses such as Dengue, West Nile virus, Yellow Fever virus, Human Influenza A, and Chikungunya. In 2008, the current principal investigator of the project, Joo Chuan Tong, was selected as one of the 35 top innovators in science and technology under the age of 35 by MIT’s Technology Review magazine for his research in “personalized vaccine design”.
Research at GIS
GIS (http://www.gis.a-star.edu.sg) is an A*STAR institute focused on genomic research. GIS aims to have a deeper understanding of cancer biology, stem cell biology, molecular pharmacology, and infectious disease through genomic study. Bioinformatics is used as a tool to support the associated high-throughput genomic analyses. Roughly speaking, the bioinformatics work at GIS can be divided into three domains: sequence analysis, comparative genomics, and microarray study.
GIS has developed a series of high-throughput DNA sequencing technologies based on paired-end ditags (PET). These technologies accelerate the understanding of the dynamics and the structure of DNA elements in our complex genome. A computational sequence analysis pipeline is a main vehicle for transforming raw sequencing data into meaningful information. Combined with upstream bioinformatics analysis, it leads to biological discovery. One example is genome-wide fusion gene identification using GIS-PET [9]. In GIS-PET analysis, PETs from the two ends of each expressed transcript (18 bp from the 5′ end and 18 bp from the 3′ end) are extracted. Mapping the PETs onto the reference genome gives the precise transcript boundaries. However, 4%–5% of PETs still cannot be mapped. These PETs may represent unconventional transcripts such as fusion genes whose 5′ and 3′ ends may map on different chromosomes. Through a novel clustering algorithm, 170 fusion gene candidates are identified.
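The idea of flagging tag pairs whose two ends land on different chromosomes can be sketched in a few lines of Python. This is a simplified illustration, not the clustering algorithm of [9]: it assumes each tag of a pair has already been mapped individually, groups discordant pairs by coarse genomic bins, and keeps bins supported by several pairs. The tuple layout, the `window` bin size, and the `min_support` threshold are hypothetical choices made only for this example.

```python
from collections import defaultdict


def fusion_candidates(pets, window=1000, min_support=2):
    """Group inter-chromosomal tag pairs into coarse bins and keep recurrent ones.

    pets: iterable of (chrom_5p, pos_5p, chrom_3p, pos_3p) tuples, one per
          paired-end ditag whose two tags were mapped separately (assumed input format).
    """
    support = defaultdict(int)
    for chrom5, pos5, chrom3, pos3 in pets:
        if chrom5 == chrom3:
            continue  # both ends on one chromosome: treated as a conventional transcript here
        key = (chrom5, pos5 // window, chrom3, pos3 // window)
        support[key] += 1
    # A single discordant pair is more likely noise than a real fusion transcript.
    return {key: n for key, n in support.items() if n >= min_support}


# Made-up coordinates for illustration only.
example_pets = [
    ("chr9", 133_600_100, "chr22", 23_290_500),
    ("chr9", 133_600_400, "chr22", 23_290_900),
    ("chr1", 1_000, "chr1", 5_000),   # same chromosome, ignored
    ("chr3", 10, "chr7", 20),         # discordant but seen only once
]
candidates = fusion_candidates(example_pets)
```

Fixed-width bins can split a genuine cluster that straddles a bin boundary; a production pipeline would more plausibly cluster by distance between tag positions and also consider pairs on the same chromosome that are implausibly far apart.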
Comparative genomic analysis is applied at GIS to understand the genome rearrangement in cancer and the evolution of regulatory sequences in our genome. For example, analyzing several transcription factors using ChIP-Seq technology [10] showed that a large portion of binding sites are embedded in repeats. More precisely, those binding sites are located in distinctive families of transposable elements. This study indicates that transposable elements play an important role in expanding the repertoire of binding sites.
Microarray analysis is performed daily at GIS for studying gene expression and for diagnosis. In addition to routine bioinformatics analysis, the groups at GIS also develop new technology using the microarray platform. For instance, they developed the pathogen chip [11], which detects the presence of viruses from patient samples in an unbiased manner. The major difficulty for virus detection is how to amplify the complete genomes of the viruses. Researchers at GIS proposed a computational method that designs a random primer that can amplify a selected set of viruses efficiently.
In the future, bioinformatics will remain a main weapon at GIS to understand the mechanisms in our genome. Current work includes understanding the chromatin structure, deciphering the histone code and the transcriptome map, and studying genome rearrangement in cancer genomes. All of this work relies heavily on bioinformatics.
Training Program at NTU
There are two main formal bioinformatics training programs in Singapore. The first is a master’s program at NTU. The second is a bachelor’s program at NUS (see next section). In this section we describe the former, which is modeled after the approach proposed in [12]. The curriculum comprises a set of core bioinformatics courses that build upon the contributing disciplines to present the basic intellectual structure of the field.
The NTU bioinformatics program offers two-year part-time or one-year full-time training leading to an M.Sc. degree. It is designed for students who have relevant scientific and technical backgrounds (engineering or science degrees). The curriculum provides them with the skills to create excellent, well-validated methods for solving problems in bioinformatics and related fields.
The program gives students enough time to learn about tool use and later on tool development. Full-time students must complete six core modules, two elective modules, and a project to graduate, while part-time students may complete additional elective modules instead of the project. The six core modules are: two biology modules; an introductory bioinformatics module, which trains students to be proficient tool users; a statistics module; and two modules on algorithms for bioinformatics, which train students to put together new efficient tools in addition to being able to apply existing tools. After taking all six core modules, the students are expected to be proficient in implementing, improving, and creating new software tools and methods for analyzing and organizing data. Once this core foundation is laid, the students can move on to select more current and diverse topics in bioinformatics such as high-performance computing for bioinformatics and methods and tools for proteomics.
Due to the multidisciplinary nature of the program, the teaching faculty is drawn from the whole range of engineering and science schools at NTU, such as the School of Computer Engineering, the School of Mechanical and Aerospace Engineering, the School of Electrical and Electronic Engineering, the School of Chemical and Biomedical Engineering, the National Institute of Education, and the School of Biological Sciences. Furthermore, there are several adjunct faculty members from GIS, I2R, BII, and the National Cancer Centre who contribute significantly in teaching and supervision.
Research and Training Program at NUS
There are about twenty faculty members at NUS who are involved in research relating to bioinformatics to some extent. Half of them are in the Computational Biology Lab in the Department of Computer Science (CBL, http://www.comp.nus.edu.sg/~cbl), which has been coordinated by Limsoon Wong since 2005. The BioInformatics and Drug Design Group in the Department of Pharmacy (BIDD, http://bidd.nus.edu.sg/group/research.htm), which has been led by Yuzong Chen since 1997, is the second major center of bioinformatics activities at NUS.
Research at CBL leads to fundamental advances in knowledge discovery technologies, database technologies, combinatorial algorithms, and modeling and simulation technologies, as well as in the applications of these technologies to problems in biology and medicine. Research at BIDD has as its main goals development of computer-aided drug design methods and software, development of bioinformatics databases and software, and tool development for and mechanistic study of traditional Chinese medicine.
Some ongoing projects at NUS include the following.
Gene Expression Analysis
Existing work on gene expression analysis provides insufficient information on the interplay between selected genes. Also, the collection of pathways that can be used, evaluated, and ranked against the observed expression data is limited. Furthermore, a comprehensive set of rules for reasoning about relevant molecular events has not been compiled and formalized. This project envisions a more advanced integrated framework that provides biologically inspired solutions to these challenges [13].
Protein Complex Prediction
Protein–protein interaction (PPI) data obtained by high-throughput assays contain a high rate of errors. Thus it is desirable to prioritize PPIs detected by such high-throughput assays. Furthermore, PPI networks resulting from these assays are essentially an in vitro scaffold. Further progress in computational analysis techniques and experimental methods is needed to reliably deduce in vivo protein interactions [14], to distinguish between permanent and transient interactions, to distinguish direct protein binding from membership in the same protein complex [15], and to distinguish protein complexes from functional modules. This project aims to develop a system that processes the results of high-throughput PPI assays and integrates extensive annotation information to yield a more informative protein interactome.
Protein 3D Structure Analysis
The study of proteins from a structural perspective gives more valuable information about their functions. The two main objectives in this project are to develop efficient and effective methods to compare a pair of 3D protein structures [16] and to develop efficient and effective methods to search a database of 3D protein structures [17].
Functional Element Identification
Protein interactions with DNA and RNA are the primary mechanisms for controlling gene expression. What is needed is a recognition code that maps from the protein sequence to a pattern that describes the family of DNA binding sites, that is, the functional elements. This project develops methods for accurate identification of transcription factor binding sites and also methods for inferring the interactions of transcription factors and other functional elements [18].
Protein Motion Simulation and Analysis
Many interesting properties of molecular motion are best characterized statistically by considering an ensemble of motion pathways rather than an individual one. Classic simulation techniques, such as the Monte Carlo method and molecular dynamics, generate individual pathways one at a time and are easily trapped in the local minima of the energy landscape. They are computationally inefficient if applied in a brute-force fashion to deal with many pathways. The project introduces Stochastic Roadmap Simulation, a randomized technique for sampling molecular motion and exploring the kinetics of such motion by examining multiple pathways simultaneously [19].
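The contrast between simulating one pathway at a time and analyzing all pathways at once can be illustrated with a toy Python sketch. It is not the implementation of [19]: it assumes a one-dimensional chain of sampled states with made-up energies, a Metropolis-style transition rule, and hypothetical function names. The roadmap is treated as a Markov chain, and first-step analysis gives, for every state simultaneously, the probability of reaching the "folded" set before the "unfolded" set, without generating any individual trajectory.

```python
import math


def build_roadmap(energies, temperature=1.0):
    """Return per-state transition probabilities {neighbour: probability} on a chain.

    State i is linked to i-1 and i+1; moves are accepted Metropolis-style and the
    leftover probability becomes a self-loop so that each row sums to 1.
    """
    n = len(energies)
    roadmap = []
    for i in range(n):
        row = {}
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                accept = min(1.0, math.exp(-(energies[j] - energies[i]) / temperature))
                row[j] = 0.5 * accept
        row[i] = 1.0 - sum(row.values())
        roadmap.append(row)
    return roadmap


def committor(roadmap, unfolded, folded, sweeps=10_000, tol=1e-12):
    """Probability, for every state, of hitting `folded` before `unfolded`.

    Solves q_i = sum_j P_ij * q_j for all non-boundary states by Gauss-Seidel sweeps,
    with q = 1 on folded states and q = 0 on unfolded states.
    """
    q = [0.0] * len(roadmap)
    for node in folded:
        q[node] = 1.0
    boundary = set(unfolded) | set(folded)
    for _ in range(sweeps):
        change = 0.0
        for i, row in enumerate(roadmap):
            if i in boundary:
                continue
            leave = sum(p for j, p in row.items() if j != i)
            new = sum(p * q[j] for j, p in row.items() if j != i) / leave
            change = max(change, abs(new - q[i]))
            q[i] = new
        if change < tol:
            break
    return q


# Made-up landscape with an energy barrier in the middle state.
barrier = build_roadmap([0.0, 1.0, 3.0, 1.0, 0.0])
q_barrier = committor(barrier, unfolded={0}, folded={4})
```

In a realistic setting the states would be randomly sampled conformations in a high-dimensional space connected to their nearest neighbours, and the linear system would be solved with a sparse solver rather than plain sweeps.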
Computational Systems Biology
Computational systems biology involves studying cellular functions and their components at varying degrees of granularity. These levels range from the nano-scale molecular structures (atomic level) to entire organs such as heart and lungs (phenotype level). The project focuses mainly on the functional aspects of cellular components, in the form of biopathways. The team hopes to develop a set of tools and a modeling methodology to produce accurate models that can be validated and that can be used to predict new phenomena [20].
In terms of training activities, NUS has a bachelor’s program in bioinformatics, where science-based students receive a B.Sc. (Bioinformatics) degree and computing-based students receive a B.Comp. (Bioinformatics) degree. Both sets of students share a core set of bioinformatics courses and basic biology and computing courses. The core bioinformatics courses comprise the following chain of three modules: 1) an introductory computational biology module, which focuses on developing the understanding of bioinformatics problems, the key principles for solving a wide range of bioinformatics problems, and the ability to interpret and analyze the output of various tools and algorithms; 2) a module on combinatorial methods in bioinformatics, which introduces students to combinatorial methods used frequently in a range of bioinformatics problems such as motif finding, population genetics, genome annotation, and RNA structure; and 3) a module on knowledge discovery methods in bioinformatics, which introduces students to data-mining algorithms often used in a range of bioinformatics problems such as gene expression profile analysis and gene feature recognition. After completing the basic courses and the three core modules described above, students can choose from a number of advanced computational biology courses offered as electives.
Concluding Remarks
As early as 1992, there were already bioinformatics activities in Singapore championed by Tin-Wee Tan at NUS. These activities included mirroring of data collections and development of sequence analysis applications. Bioinformatics activities in Singapore took on a deeper research character when Limsoon Wong started work on the Kleisli query system in 1994 [21]. This work generated significant interest from several large international pharmaceutical companies, which helped convince the Singapore Economic Development Board to fund, in 1996, a Bioinformatics Center at NUS as a joint collaboration between the activities of Tan and Wong. By 2000, the potential of bioinformatics in modern biomedical research was fully recognized. Therefore, A*STAR initiated significant new funding to encourage and to support research and development in this area. GIS was established as the flagship organization for high-throughput biological research in Singapore. A year later, BII was established to drive both bioinformatics training and research. However, BII drifted in its twin missions. NUS and NTU responded by establishing proper degree programs in bioinformatics in 2003 and 2002, respectively, as well as by establishing more coordinated bioinformatics research programs in the mid-2000s. In 2007, BII was relaunched with research as its primary mission.
Today, the work of bioinformaticists from Singapore is found in journals and at conferences that are purely computer science, purely biology, purely medicine, as well as in the mainstream bioinformatics journals. In fact, despite the small size of its bioinformatics community (~100), Singapore contributed 1.73% of papers published in the journal Bioinformatics since 2000. Furthermore, according to SCOPUS, these papers also account for 1.05% of citations to Bioinformatics since 2000. These data and the descriptions in the preceding sections show that bioinformatics activities in Singapore have grown in diversity, intensity, and quality.
This healthy growth in research capability and government funding has helped to attract international drug and life sciences companies to Singapore. For example, a significant portion of Eli Lilly’s bioinformatics activities is now based at the Lilly Singapore Centre for Drug Discovery. The ease of recruiting well-trained manpower is crucial to attracting and maintaining such industry R&D centers in Singapore. To groom truly world-class Singaporean researchers, it is important that they gain adequate overseas exposure as part of their training. Because of the focus on research and education in Singapore, many of our local graduates are able to find offers for doctoral and post-doctoral positions in top universities and research centers overseas. There are also ample government sponsorships (e.g., A*STAR scholarships) that provide financial support for the local trainees to go overseas for their doctoral and post-doctoral training. Those who take up such sponsorships are required to return to Singapore after their overseas stints, thereby providing a guaranteed pool of research talent in Singapore to bolster local bioinformatics R&D. In addition, we warmly welcome bioinformaticists and computational biologists to Singapore; http://www.comp.nus.edu.sg/~wongls/openings.html lists some of the opportunities in Singapore.
References
1. Wong L (2003) Bioinformatics in Singapore. Asia Pacific Biotech News 7: 88–92.
2. Maurer-Stroh S, Ma J, Lee RTC, Sirota FL, Eisenhaber F (2009) Mapping the sequence mutations of the 2009 H1N1 influenza A virus neuraminidase relative to drug and antibody binding sites. Biol Direct 4: 18.
3. Ooi HS, Kwo CY, Wildpaner M, Sirota FL, Eisenhaber B, et al. (2009) ANNIE: Integrated de novo protein sequence annotation. Nucleic Acids Res 37: W435–W440.
4. Ng SK, Wong M (1999) Toward routine automatic pathway discovery from on-line scientific text abstracts. Genome Inform 10: 104–112.
5. Zhou GD, Shen D, Zhang J, Su J, Tan SH (2005) Recognition of protein and gene names from text using an ensemble of classifiers and effective abbreviation resolution. BMC Bioinformatics 6: S7.
6. Ng SK, Zhang Z, Tan SH (2003) Integrative approach for computationally inferring protein domain interactions. Bioinformatics 19: 923–929.
7. Li XL, Tan SH, Foo CS, Ng SK (2005) Interaction graph mining for protein complexes using local clique merging. Genome Inform 16: 260–269.
8. Tong JC, Zhang GL, Tan TW, August JT, Brusic V, et al. (2006) Prediction of HLA-DQ3.2 ligands: Evidence of multiple registers in class II binding peptides. Bioinformatics 22: 1232–1238.
9. Ruan Y, Ooi HS, Choo SW, Chiu KP, Zhao XD, et al. (2007) Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res 17: 828–838.
10. Bourque G, Leong B, Vega VB, Chen X, Lee YL, et al. (2008) Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res 18: 1752–1762.
11. Wong CW, Lee CWH, Leong WY, Soh SWL, Kartasasmita CB, et al. (2007) Optimization and clinical validation of a pathogen detection microarray. Genome Biol 8: R93.
12. Altman RB (1998) A curriculum for bioinformatics: The time is ripe. Bioinformatics 14: 549–550.
13. Soh D, Dong D, Guo Y, Wong L (2007) Enabling more sophisticated gene expression analysis for understanding diseases and optimizing treatments. ACM SIGKDD Explorations 9: 3–14.
14. Chua HN, Hugo W, Liu G, Li XL, Wong L, et al. (2009) A probabilistic graph-theoretic approach to integrate multiple predictions for the protein-protein subnetwork prediction challenge. Ann N Y Acad Sci 1158: 224–233.
15. Chua HN, Ning K, Sung WK, Leong HW, Wong L (2008) Using indirect protein-protein interactions for protein complex prediction. J Bioinform Comput Biol 6: 435–466.
16. Aung Z, Tan KL (2006) MatAlign: Precise protein structure comparison by matrix alignment. J Bioinform Comput Biol 4: 1197–1216.
17. Aung Z, Tan SH, Ng SK, Tan KL (2008) PPiClust: Efficient clustering of 3D protein-protein interaction interfaces. J Bioinform Comput Biol 6: 415–433.
18. Wijaya E, Yiu SM, Son NT, Kanagasabai R, Sung WK (2008) MotifVoter: A novel ensemble method for fine-grained integration of generic motif finders. Bioinformatics 24: 2288–2295.
19. Chiang TH, Apaydin MS, Brutlag DL, Hsu D, Latombe JC (2007) Using stochastic roadmap simulation to predict experimental quantities in protein folding kinetics: Folding rates and phi-value. J Comput Biol 14: 578–593.
20. Koh G, Teong HFC, Clement MV, Hsu D, Thiagarajan PS (2006) A decompositional approach to parameter estimation in pathway modeling: A case study of the Akt and MAPK pathways and their crosstalk. Bioinformatics 22: e271–e280.
21. Chung SY, Wong L (1999) Kleisli, a new tool for data integration in biology. Trends Biotechnol 17: 351–355.
PLoS Computational Biology | www.ploscompbiol.org 4 September 2009 | Volume 5 | Issue 9 | e1000508
... Whereas in developed countries education and training in Bioinformatics and Genome Analyses are now fully integrated in academic curricula, the situation is different in developing countries. Some countries embarked on these domains with the first sequencing projects [1][2][3][4][5], whereas in African countries, apart from some leading Institutions [6][7][8][9], the importance of these domains was only later realized [10][11]. A recent program [12] allowed developing many educational opportunities to learn Bioinformatics and Genome Analyses but mainly at the introductory levels [10,13]. ...
... It is their responsibility to make computational biology and genomics known as fields of research on their own. They are warmly invited to consider with attention the Singapore activities in these domains [4] as an example of what a small country can achieve. ...
Genome data, with underlying new knowledge, are accumulating at exponential rate thanks to ever-improving sequencing technologies and the parallel development of dedicated efficient Bioinformatics methods and tools. Advanced Education in Bioinformatics and Genome Analyses is to a large extent not accessible to students in developing countries where endeavors to set up Bioinformatics courses concern most often only basic levels. Here, we report a pioneering pilot experience concerning the design and implementation, from scratch, of a three-months advanced and extensive course in Bioinformatics and Genome Analyses in the Institut Pasteur de Tunis. Most significantly the outcome of the course was upgrading the participants’ skills in Bioinformatics and Genome Analyses to recognized international standards. Here we detail the different steps involved in the implementation of this course as well as the topics covered in the program. The description of this pilot experience might be helpful for the implementation of other similar educational projects, notably in developing countries, aiming to go beyond basics and providing young researchers with high-level skills.
... As reported in other cases-and in the context of this collection in PLOS Computational Biology [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19], Brazil [20], India [21], and elsewhere [22][23][24], the role of smaller countries in the global scene takes a special dimension, creating both specific responsibilities and significant opportunities. A general conclusion from those reports is that small(er) countries need to support global initiatives, by contributing to materials and methods within international projects and, in return, obtain access to data and training resources as well as build their own national agendas for the use of new technologies in certain fields of regional priority. ...
We review the establishment of computational biology in Greece and Cyprus from its inception to date and issue recommendations for future development. We compare output to other countries of similar geography, economy, and size—based on publication counts recorded in the literature—and predict future growth based on those counts as well as national priority areas. Our analysis may be pertinent to wider national or regional communities with challenges and opportunities emerging from the rapid expansion of the field and related industries. Our recommendations suggest a 2-fold growth margin for the 2 countries, as a realistic expectation for further expansion of the field and the development of a credible roadmap of national priorities, both in terms of research and infrastructure funding. This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
... The founding of JBCB coincided with the dramatic strengthening of serious life science research in Singapore and in other countries in the Asia-Pacific region as well as with efforts of enhanced international research integration of the Japanese community. [14][15][16] One might believe that JBCB has served mainly the local, Asia-Pacific bioinformatics research community since it might have needed some protected environment at the beginning. The data about the country of origin of the corresponding author (Table 3) shows that, on the one hand, JBCB has indeed helped the local research community to mature. ...
The Journal of Bioinformatics and Computational Biology (JBCB) started publishing scientific articles in 2003. It has established itself as home for solid research articles in the field (~ 60 per year) that are surprisingly well cited. JBCB has an important function as alternative publishing channel in addition to other, bigger journals.
... Beyond these, APBioNet has also provided assistance to institutions in Pakistan and Saudi Arabia. As a result of these extensive, collective, and cooperative efforts to influence policy and raise awareness among policy makers and scientific leaders, Asian countries such as Singapore [12], Malaysia [13], Thailand [14], India (http://dbtindia.nic.in/annual05-06/Ch-8-eng.pdf), the Philippines, Korea, Pakistan, Indonesia, Brunei, and many others [15] have had strong growth of bioinformatics and its allied disciplines over the last decade [16,17]. ...
The Asia-Pacific Bioinformatics Network (APBioNet; www.apbionet.org) is a nonprofit, nongovernmental, international organization founded in 1998 that focuses on the promotion of bioinformatics in the Asia-Pacific region. APBioNet's mission, since its inception, has been to pioneer the growth and development of bioinformatics awareness, training, education, infrastructure, resources, and research among member countries and economies. Its work includes technical coordination, liaison, and/or affiliation with other international scientific bodies, such as the European Molecular Biology network (EMBnet) and the International Society for Computational Biology (ISCB). APBioNet has more than 20 organizational and 2,000 individual members from over 12 countries in the region, from industry, academia, research, government, investors, and international organizations. APBioNet is spearheading a number of key bioinformatics initiatives in collaboration with international organizations, such as the Asia-Pacific Advanced Network (APAN), the Association of South-East Asian Nations (ASEAN), the Asia-Pacific Economic Cooperation (APEC), and the Asia-Pacific International Molecular Biology Network (A-IMBN), and industry partners. Many of the initiatives and activities have been initiated through its flagship conference, the International Conference on Bioinformatics (InCoB). In 2012, APBioNet was incorporated in Singapore as a public limited liability company to ensure quality, sustainability, and continuity of its mission to advance bioinformatics across the region and beyond. We describe below the key thrust areas of APBioNet.
The series of articles in PLOS Computational Biology on the development of bioinformatics activities in various countries, e.g., China [1], Australia [2], and Singapore [3], and the formation and successful development of the Polish Bioinformatics Society over the last five years, have inspired us to present a personal perspective on the advances of bioinformatics in Poland.
... Furthermore, computational biology is now considered an essential area of research for supporting to "-omics" and health research, as reported by scientists from Malaysia [43] and Singapore [44] and elsewhere. However, the situation is plagued by author ambiguity especially for Asian names, broken links for web tools, disappearing databases and inadequate disclosure, not enough for reproducibility. ...
The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation dating back to 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 7-11, 2009 at Biopolis, Singapore. Besides bringing together scientists from the field of bioinformatics in this region, InCoB has actively engaged clinicians and researchers from the area of systems biology, to facilitate greater synergy between these two groups. InCoB2009 followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India), Hong Kong and Taipei (Taiwan), with InCoB2010 scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. The Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and symposia on Clinical Bioinformatics (CBAS), the Singapore Symposium on Computational Biology (SYMBIO) and training tutorials were scheduled prior to the scientific meeting, and provided ample opportunity for in-depth learning and special interest meetings for educators, clinicians and students. We provide a brief overview of the peer-reviewed bioinformatics manuscripts accepted for publication in this supplement, grouped into thematic areas. In order to facilitate scientific reproducibility and accountability, we have, for the first time, introduced minimum information criteria for our publications, including compliance to a Minimum Information about a Bioinformatics Investigation (MIABi). As the regional research expertise in bioinformatics matures, we have delineated a minimum set of bioinformatics skills required for addressing the computational challenges of the "-omics" era.
The amount of information being churned out by the field of biology has jumped manifold and now requires the extensive use of computers for the management of this information. The field of bioinformatics that addresses this need of biology has become an industry in its own right with the pharmaceutical and biotechnology industries being dependent on it for their growth. The bursting of the dotcom bubble in 2000 saw investors and venture capitalists flocking to the biotechnology industry in general and bioinformatics in particular. This work gives an analytical comparison of the bioinformatics industry in Malaysia and India. We examined government policy, education and economic aspects that are faced by the industry of each country. We also examined the difference in the development for the Bioinformatics industry between each country.
Remarkably, Singapore as one of today's hotspots for bioinformatics and computational biology research appeared de novo out of pioneering efforts of engaged local individuals in the early 90-s that, supported with increasing public funds from 1996 on, morphed into the present vibrant research community. This article brings to mind the pioneers, their first successes and early institutional developments.
In this paper, we propose an ensemble of classifiers for biomedical named entity recognition in which three classifiers (one SVM and two HMMs) are combined effectively using a simple majority voting strategy. In addition, we incorporate an abbreviation resolution module, a protein/gene name refinement module and a simple dictionary matching module into the system to further improve the performance. Evaluation shows that our system achieves best performance (F-measure 82.58) on the closed test of the BioCreative protein/gene name recognition task (Task 1A).
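The majority voting step described in this abstract can be sketched as follows. This is only a minimal illustration, not the authors' system: the function name `majority_vote`, the `fallback` parameter, the B/I/O labels, and the tie-breaking rule (defer to one designated classifier when no label wins a strict majority) are all assumptions introduced here.

```python
from collections import Counter


def majority_vote(predictions, fallback=0):
    """Combine per-token labels from several classifiers by majority vote.

    predictions: list of label sequences, one per classifier, all equal length.
    fallback: index of the classifier to trust when no label has a strict
              majority (an assumed tie-breaking rule, not from the paper).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all classifiers must label the same tokens")

    combined = []
    for token_labels in zip(*predictions):
        label, count = Counter(token_labels).most_common(1)[0]
        if count * 2 > len(predictions):
            combined.append(label)  # strict majority agrees
        else:
            combined.append(token_labels[fallback])  # no majority
    return combined


# Hypothetical outputs of one SVM and two HMM taggers over four tokens.
svm = ["B", "I", "O", "O"]
hmm_a = ["B", "O", "O", "B"]
hmm_b = ["O", "I", "O", "I"]
print(majority_vote([svm, hmm_a, hmm_b]))  # → ['B', 'I', 'O', 'O']
```

The last token shows the tie case: the three taggers disagree completely, so the sketch falls back to the first classifier's label.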
We survey the progress in the analysis of gene expression data for the purposes of disease subtype diagnosis, new subtype discovery, and understanding of diseases and treatment responses. We find existing works fall short on several issues: these works provide little information on the interplay between selected genes; the collection of pathways that can be used, evaluated, and ranked against the observed expression data is limited; and a comprehensive set of rules for reasoning about relevant molecular events has not been compiled and formalized. We thus envision an advanced integrated framework, and are developing a system based on it, to provide biologically inspired solutions. It comprises: (i) automated analysis and extraction of information from biomedical texts; (ii) targeted construction of known pathways; and (iii) direct hypothesis generation based on logical reasoning on, and tests for, consistencies and inconsistencies of observed data against known pathways.
In this work, we study the consequences of sequence variations of the "2009 H1N1" (swine or Mexican flu) influenza A virus strain neuraminidase for drug treatment and vaccination. We find that it is phylogenetically more closely related to European H1N1 swine flu and H5N1 avian flu rather than to the H1N1 counterparts in the Americas. Homology-based 3D structure modeling reveals that the novel mutations are preferentially located at the protein surface and do not interfere with the active site. The latter is the binding cavity for 3 currently used neuraminidase inhibitors: oseltamivir (Tamiflu®), zanamivir (Relenza®) and peramivir; thus, the drugs should remain effective for treatment. However, the antigenic regions of the neuraminidase relevant for vaccine development, serological typing and passive antibody treatment can differ from those of previous strains and already vary among patients. This article was reviewed by Sandor Pongor and L. Aravind.
Function prediction of proteins with computational sequence analysis requires the use of dozens of prediction tools with a bewildering range of input and output formats. Each of these tools focuses on a narrow aspect and researchers are having difficulty obtaining an integrated picture. ANNIE is the result of years of close interaction between computational biologists and computer scientists and automates an essential part of this sequence analytic process. It brings together over 20 function prediction algorithms that have proven sufficiently reliable and indispensable in daily sequence analytic work and are meant to give scientists a quick overview of possible functional assignments of sequence segments in the query proteins. The results are displayed in an integrated manner using an innovative AJAX-based sequence viewer. ANNIE is available online at: http://annie.bii.a-star.edu.sg. This website is free and open to all users and there is no login requirement.
Motivation: The current need for high-throughput protein interaction detection has resulted in interaction data being generated en masse through such experimental methods as yeast-two-hybrids and protein chips. Such data can be erroneous and they often do not provide adequate functional information for the detected interactions. Therefore, it is useful to develop an in silico approach to further validate and annotate the detected protein interactions. Results: Given that protein-protein interactions involve physical interactions between protein domains, domain-domain interaction information can be useful for validating, annotating, and even predicting protein interactions. However, large-scale, experimentally determined domain-domain interaction data do not exist. Here, we describe an integrative approach to computationally derive putative domain interactions from multiple data sources, including protein interactions, protein complexes, and Rosetta Stone sequences. We further prove the usefulness of such an integrative approach by applying the derived domain interactions to predict and validate protein-protein interactions.
Singapore seeks to be an international center for the biomedical sciences and its related industries. Bioinformatics is seen as an important ingredient in this ambition. We provide in this short report a brief overview of bioinformatics in Singapore. We cover aspects such as training (Section 3), research (Section 4), and commercialization (Section 5). We also introduce some of the main centers of activities, as well as some of the bioinformaticists in these centers.
Protein complexes are fundamental for understanding principles of cellular organizations. As the sizes of protein-protein interaction (PPI) networks are increasing, accurate and fast protein complex prediction from these PPI networks can serve as a guide for biological experiments to discover novel protein complexes. However, it is not easy to predict protein complexes from PPI networks, especially in situations where the PPI network is noisy and still incomplete. Here, we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. We know from previous work that proteins which do not interact but share interaction partners (level-2 neighbors) often share biological functions. We have proposed a method in which all direct and indirect interactions are first weighted using topological weight (FS-Weight), which estimates the strength of functional association. Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into the interaction network. Existing clustering algorithms can then be applied to this modified network. We have also proposed a novel algorithm that searches for cliques in the modified network, and merge cliques to form clusters using a "partial clique merging" method. Experiments show that (1) the use of indirect interactions and topological weight to augment protein-protein interactions can be used to improve the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no other information except the original PPI network is used, our approach would be very useful for protein complex prediction, especially for prediction of novel protein complexes.
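The network modification step in this abstract (weight direct and level-2 interactions, drop low-weight direct edges, add high-weight level-2 edges) might be sketched as below. This is a rough illustration under stated assumptions: the weight used here is a plain Jaccard overlap of neighbor sets, swapped in as a stand-in because the abstract does not give the FS-Weight formula, and the names `neighbor_overlap`, `augment_network`, and `threshold` are hypothetical.

```python
from itertools import combinations


def neighbor_overlap(adj, u, v):
    """Jaccard overlap of the neighborhoods of u and v (each including itself).

    A simple stand-in for a topological weight; this is NOT FS-Weight.
    """
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def augment_network(edges, threshold):
    """Return the modified edge set as a set of frozenset({u, v}) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    kept = set()
    # Keep only direct interactions whose weight reaches the threshold.
    for u, v in edges:
        if neighbor_overlap(adj, u, v) >= threshold:
            kept.add(frozenset((u, v)))
    # Add level-2 interactions: pairs that share a partner but are not
    # directly linked, provided their weight reaches the threshold.
    for partners in adj.values():
        for u, v in combinations(sorted(partners), 2):
            if v not in adj[u] and neighbor_overlap(adj, u, v) >= threshold:
                kept.add(frozenset((u, v)))
    return kept


# Toy network with hypothetical protein names.
toy_edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("A", "E")]
modified = augment_network(toy_edges, threshold=0.5)
```

An existing clustering algorithm would then be run on `modified` rather than on the original edge list; that stage, and the clique-merging algorithm mentioned in the abstract, are outside this sketch.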
One of the central problems in bioinformatics is data retrieval and integration. The existing biological databases are geographically distributed across the Internet, complex and heterogeneous in data types and data structures, and constantly changing. With the current rapid growth of biomedical data, the challenge is how large volumes of data retrieved from multiple databases can be transformed and integrated automatically and flexibly. This article describes a powerful new tool, the Kleisli system, for complex queries across multiple databases and data integration.
Motivation: While processing of MHC class II antigens for presentation to helper T-cells is essential for normal immune response, it is also implicated in the pathogenesis of autoimmune disorders and hypersensitivity reactions. Sequence-based computational techniques for predicting HLA-DQ binding peptides have encountered limited success, with few prediction techniques developed using three-dimensional models. Methods: We describe a structure-based prediction model for modeling peptide-DQ3.2 beta complexes. We have developed a rapid and accurate protocol for docking candidate peptides into the DQ3.2 beta receptor and a scoring function to discriminate binders from the background. The scoring function was rigorously trained, tested and validated using experimentally verified DQ3.2 beta binding and non-binding peptides obtained from biochemical and functional studies. Results: Our model predicts DQ3.2 beta binding peptides with high accuracy [area under the receiver operating characteristic (ROC) curve A(ROC) > 0.90], compared with experimental data. We investigated the binding patterns of DQ3.2 beta peptides and illustrate that several registers exist within a candidate binding peptide. Further analysis reveals that peptides with multiple registers occur predominantly for high-affinity binders.
The protein-protein subnetwork prediction challenge presented at the 2nd Dialogue for Reverse Engineering Assessments and Methods (DREAM2) conference is an important computational problem essential to proteomic research. Given a set of proteins from the Saccharomyces cerevisiae (baker's yeast) genome, the task is to rank all possible interactions between the proteins from the most likely to the least likely. To tackle this task, we adopt a graph-based strategy to combine multiple sources of biological data and computational predictions. Using training and testing sets extracted from existing yeast protein-protein interactions, we evaluate our method and show that it can produce better predictions than any of the individual data sources. This technique is then used to produce our entry for the protein-protein subnetwork prediction challenge.