The frontier of biological and medical sciences is full of opportunity today. It is widely appreciated that present-day biomedical researchers are confronted by vast amounts of data from genome sequencing; microscopy; high-throughput analytical techniques for DNA, RNA, and proteins; and a host of other new experimental technologies. Coupled with advances in computing power, this flow of information enables scientists to computationally model and analyze biological systems in novel ways. Therefore, bioinformatics is seen as an important ingredient in Singapore's ambition to be an international center for the biomedical sciences and their related industries. Five organizations are involved in bioinformatics in Singapore in a major way. Two of these are universities in Singapore, namely the National University of Singapore (NUS) and the Nanyang Technological University (NTU). NUS has a longer history in bioinformatics and life science training and research, while NTU did not have a life science school until the early 2000s. The other three are institutes under the Agency for Science Technology & Research (A*STAR), namely the BioInformatics Institute (BII), the Genome Institute of Singapore (GIS), and the Institute for Infocomm Research (I2R). I2R has the longest history in this field in Singapore, and it accounted for a lion's share of Singapore's output in bioinformatics research from 1994 to 2005. BII and GIS are entities set up in the early 2000s; they have now matured into major forces in bioinformatics research in Singapore. An earlier report describes the development and personalities of Singapore bioinformatics from 1992 to 2002 [1]. The bioinformatics scene in Singapore has undergone some important changes since 2005, with new leadership in three of the five major centers of activities in Singapore—BII, I2R, and NUS. Here, we provide an updated overview of bioinformatics research and training activities at these organizations, as well as at GIS and NTU.
Research at BII
BII of A*STAR was originally founded in 2001. After a tumultuous history with changing missions and directors, BII was essentially relaunched in the autumn of 2007. Its mission is now defined primarily as that of a computational biology research institute. Its new director is Frank Eisenhaber (previously at the Research Institute of Molecular Pathology in Vienna, Austria).
BII sees its future as a center for research in the field of biomolecular mechanism exploration driven by computational biology. Thus, BII is meant to remain primarily a theoretical institute. But in contrast to the previous concept, experimental work has a place at the Institute both for the follow-up of theoretically derived hypotheses and for the generation of datasets that are important for the development of theoretical approaches to biological problems.
The emphasis on biomolecular mechanisms is guided by both fundamental and pragmatic considerations. Computational biology will have a great impact in this area since the ever-increasing body of sequence data, together with other large-scale datasets on expression, structure, interaction, and subcellular localization of biomolecules, provides great opportunities for achieving new biological insight using theoretical arguments. BII is located in the Biopolis in the Buona Vista area of Singapore and wishes to find synergies by interacting with the community, especially with other A*STAR biomedical research institutes that concentrate on genomics (GIS), molecular and cellular biology (IMCB), as well as their context with human disease (IMB, SiCS, SiGN), and with biotechnology applications (BTI, among others).
At present, BII hosts 11 independent research teams organized into four research divisions. The “Imaging Informatics” section develops automated tools for the quantification of the distribution of labeled molecules with regard to subcellular structures in images of cells. BII’s own microscopy lab is coming into operation in summer 2009. In the “Genome Sequence and Gene Expression Data Analysis” division organized by Vladimir Kuznetsov, the research focus is on understanding transcriptional regulation and the biological role of non-coding RNA. Chandra Verma guides the “Biomolecular Structure and Design” division, the teams of which analyze and simulate 3D structural assemblies of biomolecules and try to connect structural features with biological function. Finally, the “Biomolecular Function Discovery” unit is quite a unique setup since it combines a protein sequence analysis group with a biochemical laboratory for the verification of predicted gene functions and a software team working on the ANNOTATOR environment, a system of workflows for annotating uncharacterized protein sequences.
Given that any really serious scientific project takes a few years, time will tell whether the promise of BII will be realized. Nevertheless, several recent publications show a glimpse of BII’s opportunities. For example, the mutations of the neuraminidase from the 2009 H1N1 (swine flu) virus strain have been shown not to affect the binding pocket of the antiviral drugs oseltamivir (Tamiflu), zanamivir (Relenza), and peramivir [2]. As another example, the ANNIE software sets a new standard in protein sequence annotation and function prediction [3].
BII offers the opportunity for Ph.D. students who are affiliated with any university in the world (for their examinations and their degree) to carry out research on one of the teams and to receive local monetary support over a period of three years.
Research at I2R
I2R of A*STAR is a research institute for information technologies. As such, it devotes a small amount of its resources to bioinformatics, namely part of its data mining department. The primary objective of the bioinformatics research program at I2R is to inspire new research in data mining through computational analysis of biomedical data. Since See-Kiong Ng took over as manager of the data mining department in 2006, the group has focused on two areas, namely text mining and graph mining, to address the computational challenges brought about by the abundance of unstructured text and interaction networks in biology.
As one of the early pioneers in biomedical text mining [4], I2R has been developing effective text-mining approaches for extracting useful information from the vast biomedical literature. The group actively participates in international efforts in this domain. For example, they are part of the EU’s BOOTStrep (Bootstrapping Of Ontologies and Terminologies STrategic REsearch Project) program to develop an integrated text analysis system for biological documents, and they also collaborate with Tokyo University in developing a large-scale co-reference corpus on Medline abstracts. The group’s text-mining methods have been shown to be among the best in international benchmark competitions such as BioCreAtIvE.
For graph mining, the group has been focusing on the analysis of whole-genome protein–protein interaction networks, addressing such practical issues as handling the high abundance of experimental errors in the data and the effective integration of domain knowledge in the analysis. Given that biological systems are largely made up of networks of molecular interactions, developing data-mining methods to discover useful patterns from large networks is essential for understanding how cellular biology works, even though graph mining is intrinsically challenging computationally, with many problems proven to be NP-hard. The group collaborates extensively with local universities to develop algorithms that can be applied to experimentally determined protein–protein interaction networks to discover new biological knowledge such as domain interactions [6] and protein complexes [7].
I2R also has an emphasis on applied research. As such, the group is driven by the need to apply the computational methods developed to help biologists deepen their understanding of molecular biology, and to harvest the knowledge gained to combat the many health threats that Singapore faces today. One unique biological application domain that the bioinformatics group at I2R has been focusing on is computational immunology. This is particularly relevant to Singapore given its recent close shaves with the SARS and avian flu viruses, as well as the emergence of tropical infectious diseases such as Dengue and Chikungunya fevers. With the alarming increase in worldwide outbreaks in the last few years, it is clearly also of great global concern. As vaccination has been one of the most successful public health intervention measures against infectious diseases, to gain a fighting chance against these new health threats, it is crucial to significantly accelerate the development of vaccines. The group at I2R was one of the first to realize that the recent advances in genomic, proteomic, and bioinformatics technologies have offered new opportunities to do so. They have been developing and applying computational methods to screen large sets of protein antigens, such as those encoded by complete viral genomes [8], and validating their computational results by working closely with bench biologists both locally and internationally. Thus far, the group has worked on various viruses such as Dengue, West Nile virus, Yellow Fever virus, Human Influenza A, and Chikungunya. In 2008, the current principal investigator of the project, Joo Chuan Tong, was selected as one of the 35 top innovators in science and technology under the age of 35 by MIT’s Technology Review magazine for his research in “personalized vaccine design”.
Research at GIS
GIS is an A*STAR institute focused on genomic research. GIS aims to have a deeper understanding of cancer biology, stem cell biology, molecular pharmacology, and infectious disease through genomic study. Bioinformatics is used as a tool to support the associated high-throughput genomic analyses. Roughly speaking, the bioinformatics work at GIS can be divided into three domains: sequence analysis, comparative genomics, and microarray study.
GIS has developed a series of high-throughput DNA sequencing technologies based on paired-end ditags (PET). These technologies accelerate the understanding of the dynamics and the structure of DNA elements in our complex genome. A computational sequence analysis pipeline is a main vehicle for transforming raw sequencing data into meaningful information. Combined with upstream bioinformatics analysis, it leads to biological discovery. One example is genome-wide fusion gene identification using GIS-PET [9]. In GIS-PET analysis, PETs from the two ends of each expressed transcript (18 bp from the 5′ end and 18 bp from the 3′ end) are extracted. Mapping the PETs onto the reference genome gives the precise transcript boundaries. However, 4%–5% of PETs still cannot be mapped. These PETs may represent unconventional transcripts such as fusion genes whose 5′ and 3′ ends may map on different chromosomes. Through a novel clustering algorithm, 170 fusion gene candidates are identified.
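The idea of flagging PETs whose two tags land on different chromosomes, and then clustering them, can be illustrated with a minimal Python sketch. It is not the clustering algorithm of [9]: the `MappedPET` record, the `fusion_candidates` function, the fixed-size position window, and the `min_support` threshold are all hypothetical choices made only for illustration, and the sketch assumes each 18 bp tag has already been mapped on its own.

```python
from collections import defaultdict
from typing import NamedTuple


class MappedPET(NamedTuple):
    """Hypothetical record: a PET whose 5' and 3' tags were mapped separately."""
    pet_id: str
    chrom5: str  # chromosome hit by the 5' tag
    pos5: int    # position of the 5' tag
    chrom3: str  # chromosome hit by the 3' tag
    pos3: int    # position of the 3' tag


def fusion_candidates(pets, window=1000, min_support=2):
    """Group inter-chromosomal PETs by coarse position bins (an assumed heuristic).

    PETs whose tags map to the same chromosome are skipped. The rest are
    binned by (chromosome, position // window) at both ends, and only bins
    supported by at least `min_support` PETs are returned.
    """
    clusters = defaultdict(list)
    for pet in pets:
        if pet.chrom5 == pet.chrom3:
            continue  # conventional transcript, not a fusion candidate here
        key = (pet.chrom5, pet.pos5 // window, pet.chrom3, pet.pos3 // window)
        clusters[key].append(pet.pet_id)
    return {key: ids for key, ids in clusters.items() if len(ids) >= min_support}
```

Requiring several independent PETs per cluster is one simple way to separate recurring inter-chromosomal signals from isolated mapping errors; a real pipeline would need a more careful treatment of cluster boundaries and mapping quality.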
Comparative genomic analysis is applied at GIS to understand the genome rearrangement in cancer and the evolution of regulatory sequences in our genome. For example, analyzing several transcription factors using ChIP-Seq technology [10] showed that a large portion of binding sites are embedded in repeats. More precisely, those binding sites are located in distinctive families of transposable elements. This study indicates that transposable elements play an important role in expanding the repertoire of binding sites.
Microarray analysis is performed daily at GIS for studying gene expression and for diagnosis. In addition to routine bioinformatics analysis, the groups at GIS also develop new technology using the microarray platform. For instance, they developed the pathogen chip [11], which detects the presence of viruses from patient samples in an unbiased manner. The major difficulty for virus detection is how to amplify the complete genomes of the viruses. Researchers at GIS proposed a computational method that designs a random primer that can amplify a selected set of viruses efficiently.
In the future, bioinformatics will remain a main weapon at GIS to understand the mechanisms in our genome. Current work includes understanding the chromatin structure, deciphering the histone code and the transcriptome map, and studying genome rearrangement in cancer genomes. All of this work relies heavily on bioinformatics.
Training Program at NTU
There are two main formal bioinformatics training programs in Singapore. The first is a master’s program at NTU. The second is a bachelor’s program at NUS (see next section). In this section we describe the former, which is modeled after the approach proposed in [12]. The curriculum comprises a set of core bioinformatics courses that build upon the contributing disciplines to present the basic intellectual structure of the field.
PLoS Computational Biology | 2 September 2009 | Volume 5 | Issue 9 | e1000508
The NTU bioinformatics program offers a two-year part-time or one-year full-time training leading to an M.Sc. degree. It is designed for students who have relevant scientific and technical backgrounds (engineering or science degrees). The curriculum provides them with skills for the creation of excellent well-validated methods for solving problems in the domain of bioinformatics and related fields.
The program gives students enough time to learn about tool use and later on tool development. Full-time students must complete six core modules, two elective modules, and a project to graduate, while part-time students may complete additional elective modules instead of the project. The six core modules are: two biology modules; an introductory bioinformatics module, which trains students to be proficient tool users; a statistics module; and two modules on algorithms for bioinformatics, which train students to put together new efficient tools in addition to being able to apply existing tools. After taking all six core modules, the students are expected to be proficient in implementing, improving, and creating new software tools and methods for analyzing and organizing data. Once this core foundation is laid, the students can move on to select more current and diverse topics in bioinformatics such as high-performance computing for bioinformatics and methods and tools for proteomics.
Due to the multidisciplinary nature of the program, the teaching faculty is drawn from the whole range of engineering and science schools at NTU, such as the School of Computer Engineering, the School of Mechanical and Aerospace Engineering, the School of Electrical and Electronic Engineering, the School of Chemical and Biomedical Engineering, the National Institute of Education, and the School of Biological Sciences. Furthermore, there are several adjunct faculty members from GIS, I2R, BII, and the National Cancer Centre who contribute significantly in teaching and supervision.
Research and Training Program at NUS
There are about twenty faculty members at NUS who are involved in research relating to bioinformatics to some extent. Half of them are in the Computational Biology Lab in the Department of Computer Science (CBL, http://www.comp.,cbl), which has been coordinated by Limsoon Wong since 2005. The BioInformatics and Drug Design Group in the Department of Pharmacy (BIDD, htm), which has been led by Yuzong Chen since 1997, is the second major center of bioinformatics activities at NUS.
Research at CBL leads to fundamental advances in knowledge discovery technologies, database technologies, combinatorial algorithms, and modeling and simulation technologies, as well as in the applications of these technologies to problems in biology and medicine. Research at BIDD has as its main goals development of computer-aided drug design methods and software, development of bioinformatics databases and software, and tool development for and mechanistic study of traditional Chinese medicine.
Some ongoing projects at NUS include the following.
Gene Expression Analysis
Existing works on gene expression analysis provide insufficient information on the interplay between selected genes. Also, the collection of pathways that can be used, evaluated, and ranked against the observed expression data is limited. Furthermore, a comprehensive set of rules for reasoning about relevant molecular events has not been compiled and formalized. A more advanced integrated framework to provide biologically inspired solutions for these challenges is envisioned in this project [13].
Protein Complex Prediction
Protein–protein interaction (PPI) data obtained by high-throughput assays contain a high rate of errors. Thus it is desirable to prioritize PPIs detected by such high-throughput assays. Furthermore, PPI networks resulting from these assays are essentially an in vitro scaffold. Further progress in computational analysis techniques and experimental methods is needed to reliably deduce in vivo protein interactions [14], to distinguish between permanent and transient interactions, to distinguish direct protein binding from membership in the same protein complex [15], and to distinguish protein complexes from functional modules. This project aims to develop a system that processes the results of high-throughput PPI assays and integrates extensive annotation information to yield a more informative protein interactome.
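One simple way to prioritize noisy PPIs is to score each reported interaction by how many partners the two proteins share, on the assumption that true interactors tend to sit in the same neighborhood of the network. The Python sketch below uses a plain Jaccard index over neighbor sets. This scoring choice and the function names `build_adjacency` and `rank_interactions` are assumptions made for illustration, not the method developed in the project.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def rank_interactions(edges):
    """Rank interactions by the Jaccard overlap of the two proteins' partners.

    The two endpoints are excluded from each other's neighbor sets so that
    the score reflects only third-party support for the interaction.
    """
    adjacency = build_adjacency(edges)
    scored = []
    for a, b in edges:
        partners_a = adjacency[a] - {b}
        partners_b = adjacency[b] - {a}
        union = partners_a | partners_b
        score = len(partners_a & partners_b) / len(union) if union else 0.0
        scored.append(((a, b), score))
    # Highest-scoring (best supported) interactions first.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

An interaction with no shared partners scores 0 and falls to the bottom of the list. Such a score is only topological evidence, so in practice it would be combined with the annotation information mentioned above.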
Protein 3D Structure Analysis
The study of proteins from a structural perspective gives more valuable information about their functions. The two main objectives in this project are to develop efficient and effective methods to compare a pair of 3D protein structures [16] and to develop efficient and effective methods to search a database of 3D protein structures [17].
Functional Element Identification
Protein interactions with DNA and RNA are the primary mechanisms for controlling gene expression. What is needed is a recognition code that maps from the protein sequence to a pattern that describes the family of DNA binding sites (the functional elements). This project develops methods for accurate identification of transcription factor binding sites and also methods for inferring the interactions of transcription factors and other functional elements [18].
Protein Motion Simulation and
Many interesting properties of molecular motion are best characterized statistically by considering an ensemble of motion pathways rather than an individual one. Classic simulation techniques, such as the Monte Carlo method and molecular dynamics, generate individual pathways one at a time and are easily trapped in the local minima of the energy landscape. They are computationally inefficient if applied in a brute-force fashion to deal with many pathways. The project introduces Stochastic Roadmap Simulation, a randomized technique for sampling molecular motion and exploring the kinetics of such motion by examining multiple pathways simultaneously [19].
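The contrast with one-pathway-at-a-time simulation can be seen in a toy Python sketch in the spirit of a stochastic roadmap. It is not the implementation of [19]. Everything specific here is an assumption made for illustration: the one-dimensional double-well `energy`, the stratified sampling of nodes, the Metropolis-style transition probabilities, and the thresholds that define the "folded" and "unfolded" states. Once the roadmap is treated as a Markov chain, the probability of reaching the folded state first is obtained for every node at once by first-step analysis, without simulating any individual pathway.

```python
import math
import random


def energy(x):
    """Toy double-well landscape with minima at x = -1 and x = +1 (assumed)."""
    return (x * x - 1.0) ** 2


def build_roadmap(n_nodes=60, lo=-1.5, hi=1.5, temperature=0.3, seed=0):
    """Sample conformations at random and connect nearby ones.

    One point is drawn per equal-width bin (stratified sampling), so that
    consecutive nodes are always within the connection radius.
    """
    rng = random.Random(seed)
    width = (hi - lo) / n_nodes
    nodes = [lo + (i + rng.random()) * width for i in range(n_nodes)]
    radius = 3.0 * width
    transitions = []
    for i, x in enumerate(nodes):
        neighbors = [j for j, y in enumerate(nodes)
                     if j != i and abs(y - x) <= radius]
        row = {}
        for j in neighbors:
            delta = energy(nodes[j]) - energy(x)
            # Metropolis-style acceptance, shared equally among neighbors.
            row[j] = min(1.0, math.exp(-delta / temperature)) / len(neighbors)
        # The leftover probability mass is a self-transition.
        row[i] = max(0.0, 1.0 - sum(row.values()))
        transitions.append(row)
    return nodes, transitions


def folding_probability(nodes, transitions, folded=0.9, unfolded=-0.9,
                        sweeps=2000):
    """Probability of reaching the folded state before the unfolded one.

    Solves p_i = sum_j P_ij * p_j by repeated sweeps, with p fixed to 1 on
    folded nodes and 0 on unfolded nodes (first-step analysis).
    """
    p = [0.0] * len(nodes)
    for _ in range(sweeps):
        for i, x in enumerate(nodes):
            if x >= folded:
                p[i] = 1.0
            elif x <= unfolded:
                p[i] = 0.0
            else:
                p[i] = sum(prob * p[j] for j, prob in transitions[i].items())
    return p
```

Calling `build_roadmap()` and then `folding_probability(nodes, transitions)` yields one probability per sampled conformation. The iterative sweep is used only to keep the sketch dependency-free; a linear solver would serve the same purpose.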
Computational Systems Biology
Computational systems biology involves studying cellular functions and their components at varying degrees of granularity. These levels range from the nano-scale molecular structures (atomic level) to entire organs such as heart and lungs (phenotype level). The project focus is mainly on the functional aspects of cellular components, in the form of biopathways. The team hopes to develop a set of tools and modeling methodology to produce accurate models that can be validated and that can be used to predict new phenomena [20].
In terms of training activities, NUS has a bachelor’s program in bioinformatics, where science-based students receive a B.Sc. (Bioinformatics) degree and computing-based students receive a B.Comp. (Bioinformatics) degree. Both sets of students share a core set of bioinformatics courses and basic biology and computing courses. The core bioinformatics courses comprise the following chain of three modules: 1) an introductory computational biology module, which focuses on developing the understanding of bioinformatics problems, the key principles for solving a wide range of bioinformatics problems, and the ability to interpret and analyze the output of various tools and algorithms; 2) a module on combinatorial methods in bioinformatics, which introduces students to combinatorial methods used frequently in a range of bioinformatics problems such as motif finding, population genetics, genome annotation, and RNA structure; and 3) a module on knowledge discovery methods in bioinformatics, which introduces students to data-mining algorithms often used in a range of bioinformatics problems such as gene expression profile analysis and gene feature recognition. After completing the basic courses and the three core modules described above, students can choose from a number of advanced computational biology courses as electives.
Concluding Remarks
As early as 1992, there were already bioinformatics activities in Singapore championed by Tin-Wee Tan at NUS. These activities included mirroring of data collections and development of sequence analysis applications. Bioinformatics activities in Singapore took on a deeper research character when Limsoon Wong started work on the Kleisli query system in 1994 [21]. This work generated significant interest from several large international pharmaceutical companies, which helped convince the Singapore Economic Development Board to fund, in 1996, a Bioinformatics Center at NUS as a joint collaboration between the activities of Tan and Wong. By 2000, the potential of bioinformatics in modern biomedical research was fully recognized. Therefore, A*STAR initiated significant new funding to encourage and to support research and development in this area. GIS was established as the flagship organization for high-throughput biological research in Singapore. A year later, BII was established to drive both bioinformatics training and research. However, BII drifted in its twin missions. NUS and NTU responded by establishing proper degree programs in bioinformatics in 2003 and 2002, respectively, as well as by establishing more coordinated bioinformatics research programs in the mid-2000s. In 2007, BII was relaunched with research as its primary mission.
Today, the work of bioinformaticists from Singapore is found in journals and at conferences that are purely computer science, purely biology, purely medicine, as well as in the mainstream bioinformatics journals. In fact, despite the small size of its bioinformatics community (~100), Singapore contributed 1.73% of papers published in Bioinformatics since 2000. Furthermore, according to SCOPUS, these papers also account for 1.05% of citations to Bioinformatics since 2000. These data and the descriptions in the preceding sections show that bioinformatics activities in Singapore have grown in diversity, intensity, and quality.
This healthy growth in research capability and government funding has helped to attract international drug and life sciences companies to Singapore. For example, a significant portion of Eli Lilly’s bioinformatics activities is now based at the Lilly Singapore Centre for Drug Discovery. The ease of recruiting well-trained manpower is crucial to attracting and maintaining such industry R&D centers in Singapore. To groom truly world-class Singaporean researchers, it is important that they gain adequate overseas exposure as part of their training. Because of the focus in research and education in Singapore, many of our local graduates are able to find offers for doctorate and post-doctorate positions in top universities and research centers overseas. There are also ample government sponsorships (e.g., A*STAR scholarships) that provide financial support for the local trainees to go overseas for their doctoral and post-doctoral training. Those who take up such sponsorships are required to return to Singapore after their overseas stints, thereby providing a guaranteed pool of research talent in Singapore to bolster local bioinformatics R&D. In addition, we warmly welcome bioinformaticists and computational biologists to Singapore. sg/,wongls/openings.html lists some of the opportunities in Singapore.
