
Brief Overview of Bioinformatics Activities in Singapore

PLOS Computational Biology

September 2009
5(9):e1000508

DOI:10.1371/journal.pcbi.1000508

License
CC BY 4.0

Authors:

Frank Eisenhaber

Agency for Science, Technology and Research (A*STAR)

Chee-Keong Kwoh

Nanyang Technological University




Perspective

Brief Overview of Bioinformatics Activities in Singapore

Frank Eisenhaber1, Chee-Keong Kwoh2, See-Kiong Ng3, Wing-King Sung4,5, Limsoon Wong5*

1 BioInformatics Institute, Singapore, 2 Nanyang Technological University, Singapore, 3 Institute for Infocomm Research, Singapore, 4 Genome Institute of Singapore, Singapore, 5 National University of Singapore, Singapore

Introduction

The frontier of biological and medical sciences is full of opportunity today. It is widely appreciated that present-day biomedical researchers are confronted by vast amounts of data from genome sequencing; microscopy; high-throughput analytical techniques for DNA, RNA, and proteins; and a host of other new experimental technologies. Coupled with advances in computing power, this flow of information enables scientists to computationally model and analyze biological systems in novel ways. Therefore, bioinformatics is seen as an important ingredient in Singapore’s ambition to be an international center for the biomedical sciences and their related industries.

Five organizations are involved in bioinformatics in Singapore in a major way. Two of these are universities in Singapore, namely the National University of Singapore (NUS) and the Nanyang Technological University (NTU). NUS has a longer history in bioinformatics and life science training and research, while NTU did not have a life science school until the early 2000s. The other three are institutes under the Agency for Science, Technology and Research (A*STAR), namely the BioInformatics Institute (BII), the Genome Institute of Singapore (GIS), and the Institute for Infocomm Research (I2R). I2R has the longest history in this field in Singapore, and it accounted for a lion’s share of Singapore’s output in bioinformatics research from 1994 to 2005. BII and GIS are entities set up in the early 2000s; they have now matured into major forces in bioinformatics research in Singapore.

An earlier report describes the development and personalities of Singapore bioinformatics from 1992 to 2002 [1]. The bioinformatics scene in Singapore has undergone some important changes since 2005, with new leadership in three of the five major centers of activity in Singapore: BII, I2R, and NUS. Here, we provide an updated overview of bioinformatics research and training activities at these organizations, as well as at GIS and NTU.

Research at BII

BII (http://www.bii.a-star.edu.sg) of A*STAR was originally founded in 2001. After a tumultuous history with changing missions and directors, BII was essentially relaunched in the autumn of 2007. Its mission is now defined primarily as that of a computational biology research institute. Its new director, Frank Eisenhaber (previously at the Research Institute of Molecular Pathology in Vienna, Austria), guides the transition.

BII sees its future as a center for research in the field of biomolecular mechanism exploration driven by computational biology. Thus, BII is meant to remain primarily a theoretical institute. But in contrast to the previous concept, experimental work has a place at the Institute both for the follow-up of theoretically derived hypotheses and for the generation of datasets that are important for the development of theoretical approaches to biological problems.

The emphasis on biomolecular mechanisms is guided both by fundamental and by pragmatic considerations. Computational biology will have a great impact in this area since the ever-increasing body of sequence data, together with other large-scale datasets on expression, structure, interaction, and subcellular localization of biomolecules, provides great opportunities for achieving new biological insight using theoretical arguments. BII is located in the Biopolis in the Buona Vista area of Singapore and wishes to find synergies by interacting with the community, especially with other A*STAR biomedical research institutes that concentrate on genomics (GIS), on molecular and cellular biology (IMCB), on their context in human disease (IMB, SiCS, SiGN), and on biotechnology applications (BTI, ETC, IBN).

At present, BII hosts 11 independent research teams organized into four research divisions. The “Imaging Informatics” section develops automated tools for the quantification of the distribution of labeled molecules with regard to subcellular structures in images of cells. BII’s own microscopy lab is coming into operation in summer 2009. In the “Genome Sequence and Gene Expression Data Analysis” division organized by Vladimir Kuznetsov, the research focus is on understanding transcriptional regulation and the biological role of non-coding RNA. Chandra Verma guides the “Biomolecular Structure and Design” division, the teams of which analyze and simulate 3D structural assemblies of biomolecules and try to connect structural features with biological function. Finally, the “Biomolecular Function Discovery” unit is quite a unique setup since it combines a protein sequence analysis group with a biochemical laboratory for the verification of predicted gene functions and a software team working on the ANNOTATOR environment, a system of workflows for annotating uncharacterized protein sequences.

Given that any really serious scientific project takes a few years, time will tell whether the promise of BII will be realized. Nevertheless, several recent publications offer a glimpse of BII’s potential. For example, the mutations of the neuraminidase from the 2009 H1N1 (swine flu) virus strain have been shown not to affect the binding pocket of the antiviral drugs oseltamivir (Tamiflu), zanamivir (Relenza), and peramivir [2]. As another example, the ANNIE software sets a new standard in protein sequence annotation and function prediction [3].

BII offers the opportunity for Ph.D. students who are affiliated with any university in the world (for their examinations and their degree) to carry out research in one of the teams and to receive local monetary support over a period of three years.

Citation: Eisenhaber F, Kwoh C-K, Ng S-K, Sung W-K, Wong L (2009) Brief Overview of Bioinformatics Activities in Singapore. PLoS Comput Biol 5(9): e1000508. doi:10.1371/journal.pcbi.1000508

Editor: Philip E. Bourne, University of California San Diego, United States of America

Published September 25, 2009

Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: wongls@comp.nus.edu.sg

PLoS Computational Biology | www.ploscompbiol.org 1 September 2009 | Volume 5 | Issue 9 | e1000508

Research at I2R

I2R (http://www.i2r.a-star.edu.sg) of A*STAR is a research institute for information technologies. As such, it devotes a small amount of its resources to bioinformatics, namely part of its data mining department. The primary objective of the bioinformatics research program at I2R is to inspire new research in data mining through computational analysis of biomedical data. Since See-Kiong Ng took over as manager of the data mining department in 2006, the group has focused on two areas, namely text mining and graph mining, to address the computational challenges brought about by the abundance of unstructured text and interaction networks in biology.

As one of the early pioneers in biomedical text mining [4], I2R has been developing effective text-mining approaches for extracting useful information from the vast biomedical literature. The group actively participates in international efforts in this domain. For example, they are part of the EU’s BOOTStrep (Bootstrapping Of Ontologies and Terminologies STrategic REsearch Project) program to develop an integrated text analysis system for biological documents, and they also collaborate with Tokyo University in developing a large-scale co-reference corpus on Medline abstracts. The group’s text-mining methods have been shown to be among the best in international benchmark competitions such as BioCreAtIvE [5].

For graph mining, the group has been focusing on the analysis of whole-genome protein–protein interaction networks, addressing such practical issues as handling the high abundance of experimental errors in the data and the effective integration of domain knowledge in the analysis. Given that biological systems are largely made up of networks of molecular interactions, developing data-mining methods to discover useful patterns from large networks is essential for understanding how cellular biology works, even though graph mining is intrinsically challenging computationally, with many problems proven to be NP-hard. The group collaborates extensively with local universities to develop algorithms that can be applied to experimentally determined protein–protein interaction networks to discover new biological knowledge such as domain interactions [6] and protein complexes [7].

I2R also has an emphasis on applied research. As such, the group is driven by the need to apply the computational methods developed to help biologists deepen their understanding of molecular biology, and to harvest the knowledge gained to combat the many health threats that Singapore faces today. One unique biological application domain that the bioinformatics group at I2R has been focusing on is computational immunology. This is particularly relevant to Singapore given its recent close shaves with the SARS and avian flu viruses, as well as the emergence of tropical infectious diseases such as Dengue and Chikungunya fevers. With the alarming increase in worldwide outbreaks in the last few years, it is clearly also of great global concern. As vaccination has been one of the most successful public health intervention measures against infectious diseases, to gain a fighting chance against these new health threats it is crucial to significantly accelerate the development of vaccines. The group at I2R was one of the first to realize that the recent advances in genomic, proteomic, and bioinformatics technologies have offered new opportunities to do so. They have been developing and applying computational methods to screen large sets of protein antigens, such as those encoded by complete viral genomes [8], and validating their computational results by working closely with bench biologists both locally and internationally. Thus far, the group has worked on various viruses such as Dengue, West Nile virus, Yellow Fever virus, Human Influenza A, and Chikungunya. In 2008, the current principal investigator of the project, Joo Chuan Tong, was selected as one of the 35 top innovators in science and technology under the age of 35 by MIT’s Technology Review magazine for his research in “personalized vaccine design”.

Research at GIS

GIS (http://www.gis.a-star.edu.sg) is an A*STAR institute focused on genomic research. GIS aims to have a deeper understanding of cancer biology, stem cell biology, molecular pharmacology, and infectious disease through genomic study. Bioinformatics is used as a tool to support the associated high-throughput genomic analyses. Roughly speaking, the bioinformatics work at GIS can be divided into three domains: sequence analysis, comparative genomics, and microarray study.

GIS has developed a series of high-throughput DNA sequencing technologies based on paired-end ditags (PET). These technologies accelerate the understanding of the dynamics and the structure of DNA elements in our complex genome. A computational sequence analysis pipeline is a main vehicle for transforming raw sequencing data into meaningful information. Combined with upstream bioinformatics analysis, it leads to biological discovery. One example is genome-wide fusion gene identification using GIS-PET [9]. In GIS-PET analysis, PETs from the two ends of each expressed transcript (18 bp from the 5′ end and 18 bp from the 3′ end) are extracted. Mapping the PETs onto the reference genome gives the precise transcript boundaries. However, 4%–5% of PETs still cannot be mapped. These PETs may represent unconventional transcripts such as fusion genes whose 5′ and 3′ ends may map on different chromosomes. Through a novel clustering algorithm, 170 fusion gene candidates are identified.
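The PET logic described above can be illustrated with a small Python sketch. It is not the GIS-PET pipeline or the clustering algorithm of [9]: the `TagHit` record, the `MAX_SPAN` threshold, and the simple position-binning used to group candidates are all assumptions made for illustration. The sketch only shows the idea that a PET whose two tags map to different chromosomes is a fusion candidate, and that candidates supported by several PETs are more credible.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TagHit:
    """Genomic placement of one 18 bp tag (hypothetical record type)."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"


# Assumed upper bound on the genomic span of an ordinary transcript.
MAX_SPAN = 1_000_000


def classify_pet(five: Optional[TagHit], three: Optional[TagHit],
                 max_span: int = MAX_SPAN) -> str:
    """Label a PET from the mapping of its 5' and 3' tags."""
    if five is None or three is None:
        return "unmapped"            # at least one tag has no placement
    if five.chrom != three.chrom:
        return "fusion_candidate"    # ends lie on different chromosomes
    if five.strand != three.strand or abs(three.pos - five.pos) > max_span:
        return "unconventional"      # same chromosome, implausible layout
    return "conventional"            # tags delimit an ordinary transcript


def cluster_candidates(pets, bin_size: int = 10_000, min_support: int = 2):
    """Group fusion candidates by coarse position bins (naive stand-in
    for a real clustering step) and keep groups with enough support."""
    counts = Counter()
    for five, three in pets:
        if classify_pet(five, three) == "fusion_candidate":
            key = (five.chrom, five.pos // bin_size,
                   three.chrom, three.pos // bin_size)
            counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_support}
```

Calling `cluster_candidates` on a list of `(five, three)` tag pairs returns a dictionary from bin pairs to the number of supporting PETs; requiring more than one supporting PET is one plausible way to suppress mapping noise.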

Comparative genomic analysis is applied at GIS to understand the genome rearrangement in cancer and the evolution of regulatory sequences in our genome. For example, analyzing several transcription factors using ChIP-Seq technology [10] showed that a large portion of binding sites are embedded in repeats. More precisely, those binding sites are located in distinctive families of transposable elements. This study indicates that transposable elements play an important role in expanding the repertoire of binding sites.
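As a rough illustration of how binding sites can be tallied against repeat annotations, the following Python sketch counts, for one chromosome, how many binding-site positions fall inside annotated repeats of each family. It is not the analysis of [10]; the function name, the half-open interval convention, and the assumption that repeat intervals do not overlap are choices made for this example, and the family names in any input are placeholders.

```python
from bisect import bisect_right
from collections import Counter


def annotate_sites(sites, repeats):
    """Count binding sites per repeat family on a single chromosome.

    sites   -- iterable of integer positions (e.g. ChIP-Seq peak summits)
    repeats -- list of (start, end, family) tuples, half-open [start, end),
               assumed not to overlap one another
    """
    repeats = sorted(repeats)
    starts = [start for start, _, _ in repeats]
    counts = Counter()
    for pos in sites:
        # Index of the last repeat that starts at or before this position.
        k = bisect_right(starts, pos) - 1
        if k >= 0 and pos < repeats[k][1]:
            counts[repeats[k][2]] += 1
        else:
            counts["non_repeat"] += 1
    return counts
```

Dividing each family's count by the total number of sites gives the fraction of binding sites embedded in that family, which can then be compared with the fraction expected from the family's share of the genome.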

Microarray analysis is performed daily at GIS for studying gene expression and for diagnosis. In addition to routine bioinformatics analysis, the groups at GIS also develop new technology using the microarray platform. For instance, they developed the pathogen chip [11], which detects the presence of viruses from patient samples in an unbiased manner. The major difficulty for virus detection is how to amplify the complete genomes of the viruses. Researchers at GIS proposed a computational method that designs a random primer that can amplify a selected set of viruses efficiently.

In the future, bioinformatics will remain a main weapon at GIS to understand the mechanisms in our genome. Current work includes understanding the chromatin structure, deciphering the histone code and the transcriptome map, and studying genome rearrangement in cancer genomes. All of this work relies heavily on bioinformatics.

Training Program at NTU

There are two main formal bioinformatics training programs in Singapore. The first is a master’s program at NTU. The second is a bachelor’s program at NUS (see next section). In this section we describe the former, which is modeled after the approach proposed in [12]. The curriculum comprises a set of core bioinformatics courses that build upon the contributing disciplines to present the basic intellectual structure of the field.

The NTU bioinformatics program offers two-year part-time or one-year full-time training leading to an M.Sc. degree. It is designed for students who have relevant scientific and technical backgrounds (engineering or science degrees). The curriculum provides them with skills for the creation of excellent, well-validated methods for solving problems in the domain of bioinformatics and related fields.

The program gives students enough time to learn about tool use and later on tool development. Full-time students must complete six core modules, two elective modules, and a project to graduate, while part-time students may complete additional elective modules instead of the project. The six core modules are: two biology modules; an introductory bioinformatics module, which trains students to be proficient tool users; a statistics module; and two modules on algorithms for bioinformatics, which train students to put together new efficient tools in addition to being able to apply existing tools. After taking all six core modules, the students are expected to be proficient in implementing, improving, and creating new software tools and methods for analyzing and organizing data. Once this core foundation is laid, the students can move on to select more current and diverse topics in bioinformatics such as high-performance computing for bioinformatics and methods and tools for proteomics.

Due to the multidisciplinary nature of the program, the teaching faculty is drawn from the whole range of engineering and science schools at NTU, such as the School of Computer Engineering, the School of Mechanical and Aerospace Engineering, the School of Electrical and Electronic Engineering, the School of Chemical and Biomedical Engineering, the National Institute of Education, and the School of Biological Sciences. Furthermore, there are several adjunct faculty members from GIS, I2R, BII, and the National Cancer Centre who contribute significantly in teaching and supervision.

Research and Training Program at NUS

There are about twenty faculty members at NUS who are involved in research relating to bioinformatics to some extent. Half of them are in the Computational Biology Lab in the Department of Computer Science (CBL, http://www.comp.nus.edu.sg/~cbl), which has been coordinated by Limsoon Wong since 2005. The BioInformatics and Drug Design Group in the Department of Pharmacy (BIDD, http://bidd.nus.edu.sg/group/research.htm), which has been led by Yuzong Chen since 1997, is the second major center of bioinformatics activities at NUS.

Research at CBL leads to fundamental advances in knowledge discovery technologies, database technologies, combinatorial algorithms, and modeling and simulation technologies, as well as in the applications of these technologies to problems in biology and medicine. The main goals of research at BIDD are the development of computer-aided drug design methods and software, the development of bioinformatics databases and software, and tool development for and mechanistic study of traditional Chinese medicine.

Some ongoing projects at NUS include the following.

Gene Expression Analysis

Existing work on gene expression analysis provides insufficient information on the interplay between selected genes. Also, the collection of pathways that can be used, evaluated, and ranked against the observed expression data is limited. Furthermore, a comprehensive set of rules for reasoning about relevant molecular events has not been compiled and formalized. A more advanced integrated framework to provide biologically inspired solutions for these challenges is envisioned in this project [13].

Protein Complex Prediction

Protein–protein interaction (PPI) data obtained by high-throughput assays contain a high rate of errors. Thus it is desirable to prioritize PPIs detected by such high-throughput assays. Furthermore, PPI networks resulting from these assays are essentially an in vitro scaffold. Further progress in computational analysis techniques and experimental methods is needed to reliably deduce in vivo protein interactions [14], to distinguish between permanent and transient interactions, to distinguish direct protein binding from membership in the same protein complex [15], and to distinguish protein complexes from functional modules. This project aims to develop a system that processes the results of high-throughput PPI assays and integrates extensive annotation information to yield a more informative protein interactome.

Protein 3D Structure Analysis

The study of proteins from a structural perspective gives more valuable information about their functions. The two main objectives in this project are to develop efficient and effective methods to compare a pair of 3D protein structures [16] and to develop efficient and effective methods to search a database of 3D protein structures [17].

Functional Element Identification

Protein interactions with DNA and RNA are the primary mechanisms for controlling gene expression. What is needed is a recognition code that maps from the protein sequence to a pattern that describes the family of DNA binding sites, that is, the functional elements. This project develops methods for accurate identification of transcription factor binding sites and also methods for inferring the interactions of transcription factors and other functional elements [18].

Protein Motion Simulation and Analysis

Many interesting properties of molecular motion are best characterized statistically by considering an ensemble of motion pathways rather than an individual one. Classic simulation techniques, such as the Monte Carlo method and molecular dynamics, generate individual pathways one at a time and are easily trapped in the local minima of the energy landscape. They are computationally inefficient if applied in a brute-force fashion to deal with many pathways. The project introduces Stochastic Roadmap Simulation, a randomized technique for sampling molecular motion and exploring the kinetics of such motion by examining multiple pathways simultaneously [19].
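To make the contrast with one-pathway-at-a-time simulation concrete, here is a minimal Python sketch of the roadmap idea on a toy landscape. It is an illustration under stated assumptions, not the implementation of [19]: conformations are reduced to numbered nodes with given energies, the Metropolis-style transition rule and the function names `transition_matrix` and `p_fold` are choices made for this example. The point it shows is that once sampled conformations are linked into a Markov chain, an ensemble quantity such as the probability of reaching the folded state before the unfolded state can be obtained for every node at once by solving a set of linear equations, instead of averaging over many simulated trajectories.

```python
import math


def transition_matrix(energies, neighbors, kT=1.0):
    """Build a row-stochastic matrix over roadmap nodes.

    energies  -- energy of each sampled conformation (toy values here)
    neighbors -- neighbors[i] lists the nodes adjacent to node i
    Moves use a Metropolis-style acceptance rule (an assumption of this
    sketch); rejected moves stay at the current node.
    """
    n = len(energies)
    max_deg = max(len(nb) for nb in neighbors)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            d_e = energies[j] - energies[i]
            accept = 1.0 if d_e <= 0 else math.exp(-d_e / kT)
            P[i][j] = accept / max_deg
        P[i][i] = 1.0 - sum(P[i])  # probability of staying put
    return P


def p_fold(P, unfolded, folded, tol=1e-12, max_iter=100_000):
    """Probability, per node, of hitting a folded node before an
    unfolded one, solved by Gauss-Seidel iteration on the first-step
    equations q[i] = sum_j P[i][j] * q[j]."""
    n = len(P)
    q = [0.0] * n
    for f in folded:
        q[f] = 1.0
    fixed = set(unfolded) | set(folded)
    for _ in range(max_iter):
        delta = 0.0
        for i in range(n):
            if i in fixed:
                continue
            new = sum(P[i][j] * q[j] for j in range(n) if j != i)
            new /= 1.0 - P[i][i]
            delta = max(delta, abs(new - q[i]))
            q[i] = new
        if delta < tol:
            break
    return q
```

On a flat five-node chain with node 0 unfolded and node 4 folded, the computed probabilities increase linearly from 0 to 1 along the chain, as expected for an unbiased random walk; adding an energy barrier in the middle changes the transition probabilities and hence the values on either side of it.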

Computational Systems Biology

Computational systems biology involves studying cellular functions and their components at varying degrees of granularity. These levels range from the nano-scale molecular structures (atomic level) to entire organs such as heart and lungs (phenotype level). The project focuses mainly on the functional aspects of cellular components, in the form of biopathways. The team hopes to develop a set of tools and a modeling methodology to produce accurate models that can be validated and that can be used to predict new phenomena [20].

In terms of training activities, NUS has a bachelor’s program in bioinformatics, where science-based students receive a B.Sc. (Bioinformatics) degree and computing-based students receive a B.Comp. (Bioinformatics) degree. Both sets of students share a core set of bioinformatics courses and basic biology and computing courses. The core bioinformatics courses comprise the following chain of three modules: 1) an introductory computational biology module, which focuses on developing the understanding of bioinformatics problems, the key principles for solving a wide range of bioinformatics problems, and the ability to interpret and analyze the output of various tools and algorithms; 2) a module on combinatorial methods in bioinformatics, which introduces students to combinatorial methods used frequently in a range of bioinformatics problems such as motif finding, population genetics, genome annotation, and RNA structure; and 3) a module on knowledge discovery methods in bioinformatics, which introduces students to data-mining algorithms often used in a range of bioinformatics problems such as gene expression profile analysis and gene feature recognition. For students who have completed the basic courses and the three core modules described above, the program offers a number of advanced computational biology courses that can be chosen as electives.

Concluding Remarks

As early as 1992, there were already bioinformatics activities in Singapore championed by Tin-Wee Tan at NUS. These activities included mirroring of data collections and development of sequence analysis applications. Bioinformatics activities in Singapore took on a deeper research character when Limsoon Wong started work on the Kleisli query system in 1994 [21]. This work generated significant interest from several large international pharmaceutical companies, which helped convince the Singapore Economic Development Board to fund, in 1996, a Bioinformatics Center at NUS as a collaboration between the activities of Tan and Wong. By 2000, the potential of bioinformatics in modern biomedical research was fully recognized. Therefore, A*STAR initiated significant new funding to encourage and to support research and development in this area. GIS was established as the flagship organization for high-throughput biological research in Singapore. A year later, BII was established to drive both bioinformatics training and research. However, BII drifted in its twin missions. NUS and NTU responded by establishing proper degree programs in bioinformatics in 2003 and 2002, respectively, as well as by establishing more coordinated bioinformatics research programs in the mid-2000s. In 2007, BII was relaunched with research as its primary mission.

Today, the work of bioinformaticists from Singapore is found in journals and at conferences that are purely computer science, purely biology, purely medicine, as well as in the mainstream bioinformatics journals. In fact, despite the small size of her bioinformatics community (~100), Singapore contributed 1.73% of papers published in Bioinformatics since 2000. Furthermore, according to SCOPUS, these papers also account for 1.05% of citations to Bioinformatics since 2000. These data and the descriptions in the preceding sections show that bioinformatics activities in Singapore have grown in diversity, intensity, and quality.

This healthy growth in research capability and government funding has helped to attract international drug and life sciences companies to Singapore. For example, a significant portion of Eli Lilly’s bioinformatics activities is now based at the Lilly Singapore Centre for Drug Discovery. The ease of recruiting well-trained manpower is crucial to attracting and maintaining such industry R&D centers in Singapore. To groom truly world-class Singaporean researchers, it is important that they gain adequate overseas exposure as part of their training. Because of the focus in research and education in Singapore, many of our local graduates are able to find offers for doctorate and post-doctorate positions in top universities and research centers overseas. There are also ample government sponsorships (e.g., A*STAR scholarships) that provide financial support for the local trainees to go overseas for their doctoral and post-doctoral training. Those who take up such sponsorships are required to return to Singapore after their overseas stints, thereby providing a guaranteed pool of research talent in Singapore to bolster local bioinformatics R&D. In addition, we warmly welcome bioinformaticists and computational biologists to Singapore; http://www.comp.nus.edu.sg/~wongls/openings.html lists some of the opportunities in Singapore.

References

1. Wong L (2003) Bioinformatics in Singapore. Asia

Pacific Biotech News 7: 88–92.

2. Maurer-Stroh S, Ma J, Lee RTC, Sirota FL,

Eisenhaber F (2009) Mapping the sequence

mutations of the 2009 H1N1 influenza A virus

neuraminidase relative to drug and antibody

binding sites. Biol Direct 4: 18.

3. Ooi HS, Kwo CY, Wildpaner M, Sirota FL,

Eisenhaber B, et al. (2009) ANNIE: Integrated de

novo protein sequence annotation. Nucleic Acids

Res 37: W435–W440.

4. Ng SK, Wong M (1999) Toward routine

automatic pathway discovery from on-line scien-

tific text abstracts. Genome Inform 10: 104–112.

5. Zhou GD, Shen D, Zhang J, Su J, Tan SH (2005)

Recognition of protein and gene names from text

using an ensemble of classifiers and effective

abbreviationresolution. BMC Bioinformatics 6: S7.

6. Ng SK, Zhang Z, Tan SH (2003) Integrative

approach for computationally inferring protein

domain interactions. Bioinformatics 19: 923–929.

7. Li XL, Tan SH, Foo CS, Ng SK (2005) Interaction

graph mining for protein complexes using local

clique merging. Genome Inform 16: 260–269.

8. Tong JC, Zhang GL, Tan TW, August JT,

Brusic V, et al. (2006) Prediction of HLA-DQ3.2

ligands: Evidence of multiple registers in class II

binding peptides. Bioinformatics 22: 1232–1238.

9. Ruan Y, Ooi HS, Choo SW, Chiu KP, Zhao XD,

et al. (2007) Fusion transcripts and transcribed

retrotransposed loci discovered through compre-

hensive transcriptome analysis using Paired-End

diTags (PETs). Genome Res 17: 828–838.

10. Bourque G, Leong B, Vega VB, Chen X, Lee YL,

et al. (2008) Evolution of the mammalian

transcription factor binding repertoire via trans-

posable elements. Genome Res 18: 1752–1762.

11. Wong CW, Lee CWH, Leong WY, Soh SWL,

Kartasasmita CB, et al. (2007) Optimization and

clinical validation of a pathogen detection micro-

array. Genome Biol 8: R93.

12. Altman RB (1998) A curriculum for bioinfor-

matics: The time is ripe. Bioinformatics 14:

549–550.

13. Soh D, Dong D, Guo Y, Wong L (2007) Enabling

more sophisticated gene expression analysis for

understanding diseases and optimizing treat-

ments. ACM SIGKDD Explorations 9: 3–14.

14. Chua HN, Hugo W, Liu G, Li XL, Wong L, et al.

(2009) A probabilistic graph-theoretic approach

to integrate multiple predictions for the protein-

protein subnetwork prediction challenge.

Ann N Y Acad Sci 1158: 224–233.

15. Chua HN, Ning K, Sung WK, Leong HW,

Wong L (2008) Using indirect protein-protein

interactions for protein complex prediction.

J Bioinform Comput Biol 6: 435–466.

16. Aung Z, Tan KL (2006) MatAlign: Precise

protein structure comparison by matrix align-

ment. J Bioinform Comput Biol 4: 1197–1216.

17. Aung Z, Tan SH, Ng SK, Tan KL (2008)

PPiClust: Efficient clustering of 3D protein-

protein interaction interfaces. J Bioinform Com-

put Biol 6: 415–433.

18. Wijaya E, Yiu SM, Son NT, Kanagasabai R, Sung WK (2008) MotifVoter: A novel ensemble method for fine-grained integration of generic motif finders. Bioinformatics 24: 2288–2295.

19. Chiang TH, Apaydin MS, Brutlag DL, Hsu D, Latombe JC (2007) Using stochastic roadmap simulation to predict experimental quantities in protein folding kinetics: Folding rates and phivalue. J Comput Biol 14: 578–593.

20. Koh G, Teong HFC, Clement MV, Hsu D, Thiagarajan PS (2006) A decompositional approach to parameter estimation in pathway modeling: A case study of the Akt and MAPK pathways and their crosstalk. Bioinformatics 22: e271–e280.

21. Chung SY, Wong L (1999) Kleisli, a new tool for data integration in biology. Trends Biotechnol 17: 351–355.


Designing and running an advanced Bioinformatics and genome analyses course in Tunisia

Jan 2019

Genome data, with underlying new knowledge, are accumulating at an exponential rate thanks to ever-improving sequencing technologies and the parallel development of dedicated efficient Bioinformatics methods and tools. Advanced Education in Bioinformatics and Genome Analyses is to a large extent not accessible to students in developing countries, where endeavors to set up Bioinformatics courses most often concern only basic levels. Here, we report a pioneering pilot experience concerning the design and implementation, from scratch, of a three-month advanced and extensive course in Bioinformatics and Genome Analyses at the Institut Pasteur de Tunis. Most significantly, the outcome of the course was upgrading the participants’ skills in Bioinformatics and Genome Analyses to recognized international standards. Here we detail the different steps involved in the implementation of this course as well as the topics covered in the program. The description of this pilot experience might be helpful for the implementation of other similar educational projects, notably in developing countries, aiming to go beyond basics and providing young researchers with high-level skills.

Establishment of computational biology in Greece and Cyprus: Past, present, and future

Dec 2019
PLOS COMPUT BIOL

We review the establishment of computational biology in Greece and Cyprus from its inception to date and issue recommendations for future development. We compare output to other countries of similar geography, economy, and size—based on publication counts recorded in the literature—and predict future growth based on those counts as well as national priority areas. Our analysis may be pertinent to wider national or regional communities with challenges and opportunities emerging from the rapid expansion of the field and related industries. Our recommendations suggest a 2-fold growth margin for the 2 countries, as a realistic expectation for further expansion of the field and the development of a credible roadmap of national priorities, both in terms of research and infrastructure funding. This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

10 years for the Journal of Bioinformatics and Computational Biology (2003-2013) - A retrospective

Jun 2014

The Journal of Bioinformatics and Computational Biology (JBCB) started publishing scientific articles in 2003. It has established itself as home for solid research articles in the field (~ 60 per year) that are surprisingly well cited. JBCB has an important function as alternative publishing channel in addition to other, bigger journals.

APBioNet-Transforming Bioinformatics in the Asia-Pacific Region

Oct 2013
PLOS COMPUT BIOL

The Asia-Pacific Bioinformatics Network (APBioNet; www.apbionet.org) is a nonprofit, nongovernmental, international organization founded in 1998 that focuses on the promotion of bioinformatics in the Asia-Pacific region. APBioNet's mission, since its inception, has been to pioneer the growth and development of bioinformatics awareness, training, education, infrastructure, resources, and research among member countries and economies. Its work includes technical coordination, liaison, and/or affiliation with other international scientific bodies, such as the European Molecular Biology network (EMBnet) and the International Society for Computational Biology (ISCB). APBioNet has more than 20 organizational and 2,000 individual members from over 12 countries in the region, from industry, academia, research, government, investors, and international organizations. APBioNet is spearheading a number of key bioinformatics initiatives in collaboration with international organizations, such as the Asia-Pacific Advanced Network (APAN), the Association of South-East Asian Nations (ASEAN), the Asia-Pacific Economic Cooperation (APEC), and the Asia-Pacific International Molecular Biology Network (A-IMBN), and industry partners. Many of the initiatives and activities have been initiated through its flagship conference, the International Conference on Bioinformatics (InCoB). In 2012, APBioNet was incorporated in Singapore as a public limited liability company to ensure quality, sustainability, and continuity of its mission to advance bioinformatics across the region and beyond. We describe below the key thrust areas of APBioNet.

Bioinformatics and Computational Biology in Poland

May 2013
PLOS COMPUT BIOL

The series of articles in PLOS Computational Biology on the development of bioinformatics activities in various countries, e.g., China [1], Australia [2], and Singapore [3], and the formation and successful development of the Polish Bioinformatics Society over the last five years, have inspired us to present a personal perspective on the advances of bioinformatics in Poland.

Extending Asia Pacific bioinformatics into new realms in the

Dec 2009
BMC GENOMICS

The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation dating back to 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 7-11, 2009 at Biopolis, Singapore. Besides bringing together scientists from the field of bioinformatics in this region, InCoB has actively engaged clinicians and researchers from the area of systems biology, to facilitate greater synergy between these two groups. InCoB2009 followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India), Hong Kong and Taipei (Taiwan), with InCoB2010 scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. The Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and symposia on Clinical Bioinformatics (CBAS), the Singapore Symposium on Computational Biology (SYMBIO) and training tutorials were scheduled prior to the scientific meeting, and provided ample opportunity for in-depth learning and special interest meetings for educators, clinicians and students. We provide a brief overview of the peer-reviewed bioinformatics manuscripts accepted for publication in this supplement, grouped into thematic areas. In order to facilitate scientific reproducibility and accountability, we have, for the first time, introduced minimum information criteria for our publications, including compliance with a Minimum Information about a Bioinformatics Investigation (MIABi). As the regional research expertise in bioinformatics matures, we have delineated a minimum set of bioinformatics skills required for addressing the computational challenges of the "-omics" era.

Comparison of biometrics industry between Malaysia and India: an overview

Jun 2011

The amount of information being churned out by the field of biology has jumped manifold and now requires the extensive use of computers for the management of this information. The field of bioinformatics that addresses this need of biology has become an industry in its own right, with the pharmaceutical and biotechnology industries being dependent on it for their growth. The bursting of the dotcom bubble in 2000 saw investors and venture capitalists flocking to the biotechnology industry in general and bioinformatics in particular. This work gives an analytical comparison of the bioinformatics industry in Malaysia and India. We examined the government policy, education, and economic aspects faced by the industry in each country. We also examined the differences in the development of the bioinformatics industry between the two countries.

Unix interfaces, Kleisli, bucandin structure, etc. - The heroic beginning of bioinformatics in Singapore

Jun 2014

Frank Eisenhaber

Remarkably, Singapore, as one of today's hotspots for bioinformatics and computational biology research, appeared de novo out of the pioneering efforts of engaged local individuals in the early 1990s that, supported with increasing public funds from 1996 on, morphed into the present vibrant research community. This article brings to mind the pioneers, their first successes and early institutional developments.

Recognition of Protein/Gene Names from Text using an Ensemble of Classifiers and Effective Abbreviation Detection

Jan 2004
BMC BIOINFORMATICS

In this paper, we propose an ensemble of classifiers for biomedical named entity recognition in which three classifiers (one SVM and two HMMs) are combined effectively using a simple majority voting strategy. In addition, we incorporate an abbreviation resolution module, a protein/gene name refinement module and a simple dictionary matching module into the system to further improve the performance. Evaluation shows that our system achieves best performance (F-measure 82.58) on the closed test of the BioCreative protein/gene name recognition task (Task 1A).

Enabling more sophisticated gene expression analysis for understanding diseases and optimizing treatments

Jun 2007

We survey the progress in the analysis of gene expression data for the purposes of disease subtype diagnosis, new subtype discovery, and understanding of diseases and treatment responses. We find existing works fall short on several issues: these works provide little information on the interplay between selected genes; the collection of pathways that can be used, evaluated, and ranked against the observed expression data is limited; and a comprehensive set of rules for reasoning about relevant molecular events has not been compiled and formalized. We thus envision an advanced integrated framework, and are developing a system based on it, to provide biologically inspired solutions. It comprises: (i) automated analysis and extraction of information from biomedical texts; (ii) targeted construction of known pathways; and (iii) direct hypothesis generation based on logical reasoning on, and tests for, consistencies and inconsistencies of observed data against known pathways.

Mapping the sequence mutations of the 2009 H1N1 influenza A virus neuraminidase relative to drug and antibody binding sites

Jun 2009
BIOL DIRECT

In this work, we study the consequences of sequence variations of the "2009 H1N1" (swine or Mexican flu) influenza A virus strain neuraminidase for drug treatment and vaccination. We find that it is phylogenetically more closely related to European H1N1 swine flu and H5N1 avian flu rather than to the H1N1 counterparts in the Americas. Homology-based 3D structure modeling reveals that the novel mutations are preferentially located at the protein surface and do not interfere with the active site. The latter is the binding cavity for 3 currently used neuraminidase inhibitors: oseltamivir (Tamiflu®), zanamivir (Relenza®) and peramivir; thus, the drugs should remain effective for treatment. However, the antigenic regions of the neuraminidase relevant for vaccine development, serological typing and passive antibody treatment can differ from those of previous strains and already vary among patients. This article was reviewed by Sandor Pongor and L. Aravind.

ANNIE: Integrated de novo protein sequence annotation

May 2009
NUCLEIC ACIDS RES

Function prediction of proteins with computational sequence analysis requires the use of dozens of prediction tools with a bewildering range of input and output formats. Each of these tools focuses on a narrow aspect and researchers are having difficulty obtaining an integrated picture. ANNIE is the result of years of close interaction between computational biologists and computer scientists and automates an essential part of this sequence analytic process. It brings together over 20 function prediction algorithms that have proven sufficiently reliable and indispensable in daily sequence analytic work and are meant to give scientists a quick overview of possible functional assignments of sequence segments in the query proteins. The results are displayed in an integrated manner using an innovative AJAX-based sequence viewer. ANNIE is available online at: http://annie.bii.a-star.edu.sg. This website is free and open to all users and there is no login requirement.

Integrative approach for computationally inferring protein domain interactions

May 2003

Motivation: The current need for high-throughput protein interaction detection has resulted in interaction data being generated en masse through such experimental methods as yeast-two-hybrids and protein chips. Such data can be erroneous and they often do not provide adequate functional information for the detected interactions. Therefore, it is useful to develop an in silico approach to further validate and annotate the detected protein interactions. Results: Given that protein-protein interactions involve physical interactions between protein domains, domain-domain interaction information can be useful for validating, annotating, and even predicting protein interactions. However, large-scale, experimentally determined domain-domain interaction data do not exist. Here, we describe an integrative approach to computationally derive putative domain interactions from multiple data sources, including protein interactions, protein complexes, and Rosetta Stone sequences. We further prove the usefulness of such an integrative approach by applying the derived domain interactions to predict and validate protein-protein interactions.

Bioinformatics in Singapore

Feb 2003

Limsoon Wong

Singapore seeks to be an international center for the biomedical sciences and their related industries. Bioinformatics is seen as an important ingredient in this ambition. We provide in this short report a brief overview of bioinformatics in Singapore. We cover aspects such as training (Section 3), research (Section 4), and commercialization (Section 5). We also introduce some of the main centers of activities, as well as some of the bioinformaticists in these centers.

Using Indirect Protein-Protein Interaction for Protein Complex Prediction

Jun 2008

Protein complexes are fundamental for understanding principles of cellular organization. As the sizes of protein-protein interaction (PPI) networks are increasing, accurate and fast protein complex prediction from these PPI networks can serve as a guide for biological experiments to discover novel protein complexes. However, it is not easy to predict protein complexes from PPI networks, especially in situations where the PPI network is noisy and still incomplete. Here, we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. We know from previous work that proteins which do not interact but share interaction partners (level-2 neighbors) often share biological functions. We have proposed a method in which all direct and indirect interactions are first weighted using a topological weight (FS-Weight), which estimates the strength of functional association. Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into the interaction network. Existing clustering algorithms can then be applied to this modified network. We have also proposed a novel algorithm that searches for cliques in the modified network and merges cliques to form clusters using a "partial clique merging" method. Experiments show that (1) the use of indirect interactions and topological weight to augment protein-protein interactions can be used to improve the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no other information except the original PPI network is used, our approach would be very useful for protein complex prediction, especially for prediction of novel protein complexes.

Kleisli: A New Tool for Data Integration in Biology

Oct 1999
TRENDS BIOTECHNOL

One of the central problems in bioinformatics is data retrieval and integration. The existing biological databases are geographically distributed across the Internet, complex and heterogeneous in data types and data structures, and constantly changing. With the current rapid growth of biomedical data, the challenge is how large volumes of data retrieved from multiple databases can be transformed and integrated automatically and flexibly. This article describes a powerful new tool, the Kleisli system, for complex queries across multiple databases and data integration.

Prediction of HLA-DQ3.2 beta ligands: evidence of multiple registers in class II binding peptides

J. C. Tong

Motivation: While processing of MHC class II antigens for presentation to helper T-cells is essential for normal immune response, it is also implicated in the pathogenesis of autoimmune disorders and hypersensitivity reactions. Sequence-based computational techniques for predicting HLA-DQ binding peptides have encountered limited success, with few prediction techniques developed using three-dimensional models. Methods: We describe a structure-based prediction model for modeling peptide-DQ3.2 beta complexes. We have developed a rapid and accurate protocol for docking candidate peptides into the DQ3.2 beta receptor and a scoring function to discriminate binders from the background. The scoring function was rigorously trained, tested and validated using experimentally verified DQ3.2 beta binding and non-binding peptides obtained from biochemical and functional studies. Results: Our model predicts DQ3.2 beta binding peptides with high accuracy [area under the receiver operating characteristic (ROC) curve A(ROC) > 0.90], compared with experimental data. We investigated the binding patterns of DQ3.2 beta peptides and illustrate that several registers exist within a candidate binding peptide. Further analysis reveals that peptides with multiple registers occur predominantly for high-affinity binders.

A Probabilistic Graph-Theoretic Approach to Integrate Multiple Predictions for the Protein-Protein Subnetwork Prediction Challenge

Apr 2009
ANN NY ACAD SCI

The protein-protein subnetwork prediction challenge presented at the 2nd Dialogue for Reverse Engineering Assessments and Methods (DREAM2) conference is an important computational problem essential to proteomic research. Given a set of proteins from the Saccharomyces cerevisiae (baker's yeast) genome, the task is to rank all possible interactions between the proteins from the most likely to the least likely. To tackle this task, we adopt a graph-based strategy to combine multiple sources of biological data and computational predictions. Using training and testing sets extracted from existing yeast protein-protein interactions, we evaluate our method and show that it can produce better predictions than any of the individual data sources. This technique is then used to produce our entry for the protein-protein subnetwork prediction challenge.
