
A tool for structure alignment of molecules

January 2005


DOI:10.1109/MMSE.2004.19

Source
IEEE Xplore

Conference: Multimedia Software Engineering, 2004. Proceedings. IEEE Sixth International Symposium on

Authors:

Ming Ouhyoung

National Taiwan University



A Tool for Structure Alignment of Molecules

Pei-Ken Chang, Chien-Cheng Chen and Ming Ouhyoung

Department of Computer Science and Information Engineering, National Taiwan University

{zick, ccchen}@cmlab.csie.ntu.edu.tw, ming@csie.ntu.edu.tw

Abstract

In this paper, we propose a novel tool that aligns two molecules (not just proteins) based on their 3D structural data and lets the user inspect the alignment visually. Most existing tools are designed only for aligning proteins. The new tool addresses structural features shared between protein structures and tRNA structures, that is, molecular mimicry, even though proteins and tRNA are two very different types of molecules.

To align two molecules A and B, geometric hashing is first applied to find, globally, an initial matching of approximately overlapped atoms, so that parts of molecule A are matched to parts of molecule B. Next, a fine tuning process based on local optimization of the overlapped parts is applied: the Iterative Closest Point (ICP) algorithm is repeated until the number of overlapped atoms within a given distance threshold cannot be increased any further. The results show that our method can structurally align two molecules and is not restricted to aligning two proteins. In addition, our tool outperforms other tools in terms of RMSD and the number of matched atom pairs.

1. Introduction

Search engines for 3D models have been developed in recent years [1] [2]. Can similar techniques be applied to molecules? If so, the benefit could be great: because large numbers of protein structures can now be determined by high-throughput machines, classifying proteins into families and assigning functions to novel proteins have become major tasks in recent years. The Protein Data Bank (PDB) currently contains more than 25,000 structures, and it is estimated that the number of structures in the PDB may exceed 35,000 by 2005. Although proteins have been grouped on the basis of structural similarities in the FSSP [3], CATH [4], and SCOP [5] databases, much effort is still being put into finding similarities among proteins. Moreover, the rapid growth in the amount of protein structural data far exceeds the ability of experimental techniques to identify the locations and key amino acids of active sites. Although the structural genomics initiative (SGI) proposes to solve 10,000 protein 3D structures in this decade, many biological functions still remain unknown.

With the help of alignment tools, the structural similarity between proteins is revealed, as well as their functional and evolutionary relationships. Holm and Sander [6] noted that structural similarities among distantly related proteins are often preserved in the course of evolution, even when very little similarity remains at the sequence level.

One interesting problem is molecular mimicry [7], in which a protein and a nucleic acid share a similar substructure; sometimes the similarity even extends to their interactions. Nissen et al. [8] indicated that the structure of Elongation Factor-G is similar to that of the complex of Elongation Factor-Tu and tRNA. Selmer et al. [9] mentioned that Ribosomal Recycling Factor looks like tRNA. In addition, exploitation of 3D structural data is a key factor in enhancing structure-based drug design (SBDD), and the prediction of protein functions and possible active sites in proteins has become quite popular in SBDD, especially as a front end to molecular docking [10] [11] or when alternative active sites are sought.

This paper is organized as follows. Related work is discussed in Section 2. The geometric hashing and ICP algorithms we use are detailed in Section 3. Experimental results are provided in Section 4, and the conclusion is given in Section 5.

2. Previous Work

In general, structure alignment based on 3D structure has been shown to be NP-complete by Lathrop [12], so heuristics are used to simplify the problem, and better methods for structure alignment are still needed. Fischer et al. [13] used geometric hashing for a Cα-only representation of protein structure, and a follow-up is described in Tsai et al. [14]. Their method is based on preprocessing and

recognition algorithms of complexity O(n

), where n is

the number of residues of interest. Later, Pennec and Ayache [15] [16] introduced a 3D reference frame attached to each residue, which reduces the complexity of recognition to O(n^2). Shindyalov and Bourne [17] proposed a method that involves a combinatorial extension (CE) of an alignment path defined by aligned fragment pairs (AFPs), rather than the more conventional techniques that use dynamic programming and Monte Carlo optimization. Combinations of AFPs that represent possible continuous alignment paths are selectively extended or discarded, thereby leading to a single optimal alignment. Zemla [18] proposed the LGA (local-global alignment) algorithm, in which the longest continuous sequence is found first and a second step called GDT (global distance test) is then applied. Both the longest segment of residues under a selected RMSD (root mean square distance) and the largest set of equivalent residues that deviate by less than a given distance threshold are obtained. Blankenbecler et al. [19] proposed using fuzzy alignment variables and iterative minimization of a cost function. Milik et al. [20] used graph matching, representing atoms as nodes and bond distances as edge labels. Their search method is based on comparison of local structural features of proteins that share a common biochemical function, and so does not depend on the overall similarity of the structures and sequences of the compared proteins.

From the above survey, it is clear that all of these papers are concerned with proteins, and that they reduce the complexity of alignment by exploiting features of proteins or segments of an aligned one-dimensional sequence. Therefore, they cannot solve the general molecule alignment problem unless the tools are modified.

3. Algorithms

In this paper, we propose a tool to align two molecules based on their 3D structural data. The alignment problem between two molecules A and B is solved in two steps: geometric hashing and a fine tuning process. First, geometric hashing globally finds an initial matching of approximately overlapped atoms, so that parts of molecule A are matched to parts of molecule B. Second, the fine tuning process performs local optimization of the overlapped parts: the Iterative Closest Point (ICP) algorithm is applied until the number of overlapped atoms within a given distance threshold cannot be increased any further.

3.1. Geometric Hashing: Step One

The geometric hashing algorithm is used to structurally align two molecules. It is a technique originally developed in computer vision for object recognition, and it can easily be parallelized [21] [22]. In short, the geometric hashing algorithm is composed of two stages: preprocessing and recognition. The basic idea is to store in a database, at preprocessing time, a redundant representation of the models under rigid transformations. By doing so, the representation of the query object processed at recognition time will present some similarities with that of some database models. Matching is possible even when the recognizable database objects have undergone transformations or when only partial information is present.

Often the two molecules of interest are both proteins, so we illustrate the solution for that situation first. In some cases, e.g. molecular mimicry, the two molecules belong to different types; the calculation then differs somewhat, as described later.

The three atoms N, Cα and C in each amino acid form a triangle that uniquely defines the position and orientation of the amino acid in the three-dimensional structure of a protein, since the lengths of the N−Cα and Cα−C bonds are fixed and the N−Cα−C bond angle is also constant. For alignment, the correspondence between two triplets of points in three-dimensional space is sufficient to uniquely determine a rigid transformation. With this mechanism, we can choose a single residue as a basis. A basis is calculated by the following steps and is illustrated in Figure 1(a).

1. Normalize the vector CαC to obtain a unit vector e1.

The basis is formed by the three vectors e1, e2 and e3.
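As an illustration of how a residue basis can be computed, the following Python sketch builds an orthonormal frame from the N, Cα and C coordinates of one residue. Only the normalization of the CαC vector is taken from the text. Placing the origin at Cα, completing the second axis by Gram-Schmidt on the CαN vector, and taking the third axis as a cross product are assumptions made for this sketch, and the names residue_frame and to_local are hypothetical.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return (v[0] / n, v[1] / n, v[2] / n)

def residue_frame(n_atom, ca_atom, c_atom):
    """Return (origin, e1, e2, e3) for one residue.

    e1 is the unit vector from CA to C (the step stated in the text).
    e2 and e3 are completed by Gram-Schmidt and a cross product, which
    is an assumption of this sketch, as is the origin at CA.
    """
    e1 = normalize(sub(c_atom, ca_atom))
    v = sub(n_atom, ca_atom)
    proj = dot(v, e1)
    # Remove the component of CA->N along e1, then normalize.
    e2 = normalize((v[0] - proj * e1[0],
                    v[1] - proj * e1[1],
                    v[2] - proj * e1[2]))
    e3 = cross(e1, e2)
    return ca_atom, e1, e2, e3

def to_local(point, frame):
    """Coordinates of a point expressed in the given frame."""
    origin, e1, e2, e3 = frame
    d = sub(point, origin)
    return (dot(d, e1), dot(d, e2), dot(d, e3))
```

Expressing every atom with to_local in the frame of a chosen residue gives coordinates that do not change when the whole molecule is rotated or translated, which is the property geometric hashing relies on.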

The geometric hashing algorithm has two phases, preprocessing and recognition. To solve the problem of representation in different reference coordinates, coordinate information based on the different reference frames of a model is encoded in the preprocessing phase and stored in a large memory, in this case a hash table. The contents of the hash table are independent of the scene and can therefore be computed offline to reduce the time needed for recognition. Access to the memory is based on geometric information that is invariant to the object’s pose and is computed directly from the scene. During the recognition phase, the method accesses the previously constructed hash table using the indices of the encoded coordinate information of the input object and finds their common spatial features.

In the preprocessing phase, we calculate one basis for each residue and use it to generate coordinates for each atom in the protein. In the recognition phase, we choose a reference frame of protein B. For each reference frame of protein A in the hash table, we accumulate the number of matched atoms by checking whether two atoms are close enough. We set a threshold distance MatchThres (a value of 1 to 2 Å is appropriate), beyond which atoms are not considered a match. If no atom can be matched within MatchThres, the score is 0; if there is an atom within MatchThres, the score is 1. The process is repeated with each reference frame of protein B until all the reference frames of the two proteins have been tested.
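The preprocessing and recognition phases can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes the atoms have already been expressed in each reference frame, that the hash table is a uniform grid with cell size equal to MatchThres, and that the 27 neighbouring cells are scanned at lookup time. The names build_table, vote, a_views and b_view are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def cell_of(p, size):
    """Grid cell (hash key) of a 3D point."""
    return tuple(math.floor(c / size) for c in p)

def build_table(a_views, match_thres):
    """Preprocessing: hash every atom of every reference frame of A.

    a_views maps a frame id to the atom coordinates of molecule A
    expressed in that frame.
    """
    table = defaultdict(list)
    for frame_id, coords in a_views.items():
        for p in coords:
            table[cell_of(p, match_thres)].append((frame_id, p))
    return table

def vote(table, b_view, match_thres):
    """Recognition for one reference frame of B.

    For each frame of A, count the atoms of B that have at least one
    atom of A within match_thres (score 1 per such atom, else 0).
    """
    votes = defaultdict(int)
    for q in b_view:
        cx, cy, cz = cell_of(q, match_thres)
        hit_frames = set()
        # Any atom within match_thres lies in the same or an adjacent cell.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for frame_id, p in table.get((cx + dx, cy + dy, cz + dz), ()):
                if math.dist(p, q) <= match_thres:
                    hit_frames.add(frame_id)
        for frame_id in hit_frames:
            votes[frame_id] += 1
    return dict(votes)
```

Calling vote once per reference frame of B and keeping the pair of frames with the highest count gives a candidate rigid transformation for the fine tuning step.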

When aligning two different kinds of molecules, the algorithm is slightly modified in how the bases are created. For each atom with coordinate P, select two atoms connected to that atom, with coordinates Q1 and Q2, respectively. The rule for constructing a basis is

1. Normalize the vector PQ1 to obtain a unit vector e1.

The basis is formed by the three vectors e1, e2 and e3.

This construction is illustrated in Figure 1(b). The origin of the new coordinate frame is P. If an atom is connected to n atoms, n × (n − 1) coordinate frames would be made for this atom. The number of coordinate frames constructed in this way is too large for efficient execution. To decrease the execution time, the criteria for selecting atoms to create bases are listed in Table 1. With these criteria, we calculate two bases for each residue and four bases for each nucleotide. In proteins, the “CA” atom is on the backbone and has a side chain attached, and the “CB” atom is the attached atom. In nucleic acids, the “C4*” atom and the “C3*” atom are both in a position similar to that of the “CA” atom in proteins, and the “O4*” and “C2*” atoms are in a position similar to that of the “CB” atom in proteins. This is illustrated in Figure 2.

Table 1. The rule for selecting atoms to construct coordinate frames.

Type of the molecule    Name of the atom at P    Name of the atom at Q
Proteins                “CA”                     “CB”
Nucleic Acids           “C4*”                    “O4*”
Nucleic Acids           “C3*”                    “C2*”
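The effect of the selection rule on the number of frames can be sketched in a few lines of Python. The dictionary below transcribes Table 1; the function names and the molecule-type keys are hypothetical, and how the second connected atom is chosen once P and Q are fixed is not covered by this sketch.

```python
from itertools import permutations

# Table 1: origin atom P -> partner atom Q, per molecule type.
FRAME_RULES = {
    "protein": {"CA": "CB"},
    "nucleic_acid": {"C4*": "O4*", "C3*": "C2*"},
}

def all_ordered_pairs(neighbours):
    """Every ordered pair of atoms connected to P: n * (n - 1) frames."""
    return list(permutations(neighbours, 2))

def is_selected(kind, p_name, q_name):
    """True if the (P, Q) atom names satisfy the rule of Table 1."""
    return FRAME_RULES[kind].get(p_name) == q_name
```

For an atom with four bonded neighbours, all_ordered_pairs returns 12 candidate frames, whereas the rule keeps only the frames whose origin and partner atoms appear in Table 1.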

Figure 1. Calculation of a basis. (a) The protein structure. (b) The general molecule structure.

Figure 2. A sketch of molecules to explain the rule for coordinate frame construction. (a) Amino acid. (b) Nucleotide.

3.2. Fine Tuning Process: Step Two

Once geometric hashing has performed the global optimization and output an approximate alignment, a fine tuning process follows, based on local optimization of the overlapped parts. This step is necessary because the 3D structural data in the PDB always involve sampling error from the determination of atom positions by X-ray crystallography, and because geometric hashing provides only an initial alignment. The alignment therefore needs fine tuning, for which the Iterative Closest Point (ICP) algorithm [23] [24] is chosen. As illustrated in Figure 3, the ICP algorithm is applied repeatedly until the number of overlapped atoms within a given distance threshold can be increased no more.

The ICP algorithm addresses a key registration problem: given two three-dimensional shapes, estimate the optimal translation and rotation that register the two shapes by minimizing the mean square distance between them. The algorithm is guaranteed to find a local minimum of a mean square objective function [23]. In our implementation, we select the 100 rigid transformations that lead to the largest numbers of overlapped pairs. The results show that ICP indeed increases the number of matched atoms.
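The stopping rule of the fine tuning loop can be illustrated with a deliberately simplified closest-point refinement. The sketch below estimates a translation only, whereas ICP as described in [23] also estimates a rotation; that simplification, the brute-force nearest-neighbour search, and the names count_matches and refine_translation are assumptions made to keep the example short.

```python
import math

def count_matches(a, b, thres):
    """Number of atoms of A whose nearest atom of B lies within thres."""
    return sum(1 for p in a if min(math.dist(p, q) for q in b) <= thres)

def refine_translation(a, b, thres, max_iter=50):
    """Translation-only closest-point refinement (rotation omitted).

    Iterates until the number of matched atoms no longer increases.
    Returns the moved copy of A and its matched-atom count.
    """
    best = list(a)
    best_count = count_matches(best, b, thres)
    for _ in range(max_iter):
        # Pair each atom of A with its closest atom of B; keep close pairs.
        pairs = []
        for p in best:
            q = min(b, key=lambda x: math.dist(p, x))
            if math.dist(p, q) <= thres:
                pairs.append((p, q))
        if not pairs:
            break
        # The least-squares translation is the mean offset of the pairs.
        shift = [sum(q[i] - p[i] for p, q in pairs) / len(pairs)
                 for i in range(3)]
        moved = [tuple(p[i] + shift[i] for i in range(3)) for p in best]
        moved_count = count_matches(moved, b, thres)
        if moved_count <= best_count:
            break  # the count cannot be increased any further
        best, best_count = moved, moved_count
    return best, best_count
```

In this sketch an update is accepted only when it raises the matched-atom count, so the loop always terminates with the best alignment seen so far.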

Figure 3. The flow chart for the fine tuning process.

4. Experimental Results

4.1. The Molecular Alignment Problem

Our tool can be used to compare two molecules that belong to different types. The data and the problem of molecular mimicry (Figure 4, Figure 5) were provided by a graduate student, Mr. Han Liang, from Professor Laura Landweber’s group in the Department of Ecology and Evolutionary Biology, Princeton University [25].

One data set [8] consists of EFG (Elongation Factor-G) and EF-tu/tRNA (the complex of Elongation Factor-Tu and tRNA); the orientations of the original data are almost the same. The other data set [9] consists of RRF (Ribosomal Recycling Factor) and tRNA, which are not in the same orientation originally. The alignment results for these two data sets are shown in Figure 6 and Figure 7.

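After calculation by our tool, the rotation matrix between EFG and EF-tu/tRNA has entries with the following absolute values

\[
\begin{pmatrix}
0.993473 & 0.0945041 & 0.0638711 \\
0.0832201 & 0.983483 & 0.160734 \\
0.0780061 & 0.15437 & 0.984929
\end{pmatrix}
\]

and the translation vector has components with absolute values

\[
\begin{pmatrix} 2.09762 & 0.935684 & 6.83966 \end{pmatrix}
\]

Three entries of this rotation matrix and one component of this translation vector are negative; their positions are not reproduced here.

For the case of RRF and tRNA, the rotation matrix has entries with the following absolute values, three of which are negative,

\[
\begin{pmatrix}
0.42475 & 0.902835 & 0.0669089 \\
0.545962 & 0.314407 & 0.776578 \\
0.722159 & 0.293322 & 0.626458
\end{pmatrix}
\]

and the translation vector is

\[
\begin{pmatrix} -36.6722 & 56.1501 & 33.2718 \end{pmatrix}
\]

Figure 4. EFG vs. EF-tu/tRNA complex (Nissen et al. 1995 shows that the binding to the ribosome is at the same place and orientation). This picture is from Professor Laura Landweber’s group in the Department of Ecology and Evolutionary Biology, Princeton University, and the orientation is manually selected.

Figure 5. RRF vs. tRNA (Selmer et al. 1999 shows that the binding to the ribosome is at a different place and orientation); again, the orientation is manually selected.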
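A rotation matrix and translation vector of this kind can be checked and applied with a short routine such as the one below. It is a sketch under stated assumptions: the column-vector convention x' = R x + t is assumed (the paper does not state which convention the tool uses), and the function names apply_rigid and is_rotation are hypothetical.

```python
def apply_rigid(points, rot, trans):
    """Map each point x to R x + t (column-vector convention, assumed)."""
    return [tuple(sum(rot[i][j] * p[j] for j in range(3)) + trans[i]
                  for i in range(3))
            for p in points]

def is_rotation(rot, tol=1e-4):
    """Check that R R^T = I and det(R) = +1 within a tolerance."""
    for i in range(3):
        for j in range(3):
            d = sum(rot[i][k] * rot[j][k] for k in range(3))
            if abs(d - (1.0 if i == j else 0.0)) > tol:
                return False
    det = (rot[0][0] * (rot[1][1] * rot[2][2] - rot[1][2] * rot[2][1])
           - rot[0][1] * (rot[1][0] * rot[2][2] - rot[1][2] * rot[2][0])
           + rot[0][2] * (rot[1][0] * rot[2][1] - rot[1][1] * rot[2][0]))
    return abs(det - 1.0) <= tol
```

The orthonormality check is a useful guard when a matrix is copied from a printed source, since a misplaced sign or digit makes the rows no longer mutually orthogonal.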

4.2. Comparison with Other Alignment Tools

To compare with other tools, we use the same set of proteins as in the paper of Blankenbecler et al. [19]. Note that other protein alignment methods usually use knowledge of the matched 1D sequence alignment of the proteins, and they are optimized for proteins, focusing only on matching backbone Cα atoms. Our tool does not make this assumption and works for arbitrary molecules, including tRNA. Still, for comparison purposes, we use the same set of six protein pairs. Figure 8 shows that our tool compares favorably with the other methods: Figure 8(a) is reproduced from Blankenbecler et al. [19], in which the Yale [26], Dali [27] [28], CE [17] and Lund [19] methods are compared, while Figure 8(b) shows the results of our tool for comparison with the data in Figure 8(a).

Our method is better for two reasons:

1. Given a fixed RMSD for pairs of matched atoms, our method matches the largest number of backbone Cα atoms;

2. Given a fixed number of matched Cα atoms, our method has the lowest RMSD.
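The two quantities used in this comparison, the number of matched atom pairs N and their RMSD, can be computed as in the following sketch. The greedy one-to-one pairing (closest pairs first) is an assumption made for illustration, since the text does not specify how matched pairs are formed once the molecules are superimposed; the function names are hypothetical.

```python
import math

def matched_pairs(a, b, thres):
    """Greedy one-to-one pairing: closest pairs first, each atom used once.

    Returns a list of (index in a, index in b, distance).
    """
    cand = sorted((math.dist(p, q), i, j)
                  for i, p in enumerate(a)
                  for j, q in enumerate(b)
                  if math.dist(p, q) <= thres)
    used_a, used_b, pairs = set(), set(), []
    for d, i, j in cand:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j, d))
    return pairs

def rmsd(pairs):
    """Root mean square distance over the matched pairs."""
    if not pairs:
        raise ValueError("no matched pairs")
    return math.sqrt(sum(d * d for _, _, d in pairs) / len(pairs))
```

Raising the threshold generally increases N but also admits more distant pairs, which is why the comparison reports RMSD and N together rather than either value alone.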

In terms of computation cost, the major cost is in the first step, geometric hashing. In the case of proteins, the coordinate frames are generated from the amino acid Cα atoms only, so the computation cost is low. For the six pairs of target proteins, all alignment calculations finish in a time ranging from 6 seconds to 47 seconds. Table 2 shows the computation time on a Pentium-4 3GHz PC.

In the case of molecules such as RNA and DNA, each nucleotide has a carbon ring in its base, so the number of possible coordinate frames tends to be much larger than for proteins, and the computation time is correspondingly longer. In the case of RRF vs. tRNA, where tRNA has over 1000 atoms, the computation time is around 24 minutes, while in the case of EFG vs. EF-tu/tRNA complex (over 4000 atoms), the computation time can be as long as 36 hours on the same 3 GHz PC. Even so, our tool can still solve this problem, an important one known as "molecular mimicry". As far as we know, our method is the first to solve this kind of problem, because our algorithm is sequence independent and does not use knowledge of 1D sequence similarity in molecule pairs.

5. Conclusion

A novel tool has been developed to align two molecules based on 3D structural data. Compared with other algorithms, our tool takes more computation time to align two molecules; however, other tools may be restricted to aligning two proteins. The experiments are based on data from the PDB and demonstrate that the proposed tool is useful and versatile.

The first experiment is the molecular alignment problem. Given two molecules, our tool generates the rotation matrix and translation vector that optimally align them. In our experiments, the results are the same no matter how we randomly place the molecules at different locations with different orientations.

Figure 6: Alignment of two molecules using our tool for EFG vs. EF-tu/tRNA complex, where the atom number is over 4000 and the computation time is about 36 hours on a Pentium-4 3GHz PC.

Figure 7: Alignment of two molecules using our tool for RRF vs. tRNA, where the atom number is over 1000 and the computation time is about 24 minutes on a Pentium-4 3GHz PC.

Figure 8: Alignment results for a set of protein pairs in terms of RMSD of matched atom pairs and number of aligned atoms (N). (a) is from the fuzzy alignment method of Blankenbecler et al.; the results from Yale (red squares), Dali (green triangles), CE (blue circles), and the Lund method (solid lines) are also given in their paper. (b) is from our tool, for comparison. It shows that our results are better than those of the other methods.

Table 2: Computation time of alignment of six pairs of proteins, where MatchThres is the threshold used in the initial geometric hashing and the other columns are times in seconds.

MatchThres (Å)  8DFR-4DFRa  1MBD-1MBA  1TIE-4FGF  1CID-2RHE  7FABl2-1REIa  1FXIa-1UBQ
1.0             7           5          4          3          2             1
1.5             9           6          5          5          2             1
2.0             11          7          6          5          3             1
2.5             13          10         8          7          3             2
3.0             18          12         9          9          4             3
3.5             22          16         13         11         5             3
4.0             30          20         15         13         7             3
4.5             37          26         20         18         8             6
5.0             47          34         24         22         10            6

In the second experiment, several protein pairs are used to compare the results with four popular alignment tools, namely the Yale [26], Dali [27] [28], CE [17] and Lund [19] methods. Our tool performs the best in terms of RMSD and number of matched atom pairs.

6. References

[1] D.Y. Chen, X.P. Tian, Y.T. Shen, and M. Ouhyoung, “On visual similarity based 3D model retrieval”, Comput. Graph. Forum, 22(3), 2003, pp. 223-232.

[2] T. Funkhouser, P. Min, M. Kazhdan, J. Chen, A. Halderman, D. Dobkin, and D. Jacobs, “A search engine for 3D models”, ACM T. Graphics, 22(1), Jan. 2003, pp. 83-105.

[3] L. Holm and C. Sander, “Touring protein fold space with Dali/FSSP”, Nucleic Acids Res., 26, 1998, pp. 316-319.

[4] C.A. Orengo, A.D. Michie, S. Jones, D.T. Jones, M.B. Swindells, and J.M. Thornton, “CATH - a hierarchic classification of protein domain structures”, Structure, 5(8), Aug. 1997, pp. 1093-1108.

[5] A.G. Murzin, S.E. Brenner, T. Hubbard, and C. Chothia, “SCOP: a structural classification of proteins for the investigation of sequences and structures”, J. Mol. Biol., 247, 1995, pp. 536-540.

[6] L. Holm and C. Sander, “Mapping the protein universe”, Science, 273, Aug. 1996, pp. 595-602.

[7] P. Nissen, M. Kjeldgaard, and J. Nyborg, “Macromolecular mimicry”, EMBO J., 19, 2000, pp. 489-495.

[8] P. Nissen, M. Kjeldgaard, S. Thirup, G. Polekhina, L. Reshetnikova, B.F.C. Clark, and J. Nyborg, “Crystal structure of the ternary complex of Phe-tRNA^Phe, EF-Tu, and a GTP analog”, Science, 270, Dec. 1995, pp. 1464-1472.

[9] M. Selmer, S. Al-Karadaghi, G. Hirokawa, A. Kaji, and A. Liljas, “Crystal structure of Thermotoga maritima ribosome recycling factor: a tRNA mimic”, Science, 286, Dec. 1999, pp. 2349-2352.

[10] V. Cappello, A. Tramontano, and U. Koch, “Classification of proteins based on the properties of the ligand-binding site: the case of adenine-binding proteins”, Proteins, 47(2), May 2002, pp. 106-115.

[11] G.R. Smith and M.J. Sternberg, “Prediction of protein-protein interactions by docking methods”, Curr. Opin. Struct. Biol., 12(1), Feb. 2002, pp. 28-35.

[12] R.H. Lathrop, “The protein threading problem with sequence amino acid interaction preferences is NP-complete”, Protein Eng., 7, 1994, pp. 1059-1068.

[13] D. Fischer, O. Bachar, R. Nussinov, and H. Wolfson, “An efficient automated computer vision based technique for detection of three dimensional structural motifs in proteins”, J. Biomol. Struct. Dyn., 9(4), Feb. 1992, pp. 769-789.

[14] C.J. Tsai, S.L. Lin, H. Wolfson, and R. Nussinov, “Techniques for searching for structural similarities between protein cores, protein surfaces and between protein-protein interfaces”, Techniques in Protein Chemistry, VII, 1996, pp. 419-429.

[15] X. Pennec and N. Ayache, “An O(n^2) algorithm for 3D substructure matching of proteins”, Shape and Pattern Matching in Computational Biology - Proc. First Int. Workshop, 1994, pp. 25-40.

[16] X. Pennec and N. Ayache, “A geometric algorithm to find small but highly similar 3D substructures in proteins”, Bioinformatics, 14(6), 1998, pp. 516-522.

[17] I.N. Shindyalov and P.E. Bourne, “Protein structure alignment by incremental combinatorial extension (CE) of the optimal path”, Protein Eng., 11(9), Sep. 1998, pp. 739-747.

[18] A. Zemla, “LGA: A method for finding 3D similarities in protein structures”, Nucleic Acids Res., 31(13), Jul. 2003, pp. 3370-3374.

[19] R. Blankenbecler, M. Ohlsson, C. Peterson, and M. Ringner, “Matching protein structures with fuzzy alignments”, Proc. Natl. Acad. Sci. USA, 100(21), Oct. 2003, pp. 11936-11940.

[20] M. Milik, S. Szalma, and K.A. Olszewski, “Common structural cliques: a tool for protein structure and function analysis”, Protein Eng., 16(8), Aug. 2003, pp. 543-552.

[21] Y. Lamdan and H.J. Wolfson, “Geometric hashing: a general and efficient model-based recognition scheme”, Proceedings of the Second ICCV, 1988, pp. 238-249.

[22] H.J. Wolfson and I. Rigoutsos, “Geometric hashing: an overview”, IEEE Comput. Sci. Eng., 4, 1997, pp. 10-21.

[23] P.J. Besl and N.D. McKay, “A method for registration of 3-D shapes”, IEEE T. Pattern Anal., 14, 1992, pp. 239-256.

[24] Z. Zhang, “Iterative point matching for registration of free-form curves and surfaces”, Int. J. Comput. Vision, 13(2), 1994, pp. 119-152.

[25] H. Liang and L.F. Landweber, “Computational tests of molecular mimicry between tRNA and protein translation factors”, submitted, 2004.

[26] M. Gerstein and M. Levitt, “Using iterative dynamic programming to obtain accurate pairwise and multiple alignments of protein structures”, Proc. Int. Conf. Intell. Syst. Mol. Biol., 4, 1996, pp. 59-67.

[27] L. Holm and C. Sander, “Protein structure comparison by alignment of distance matrices”, J. Mol. Biol., 233, 1993, pp. 123-138.

[28] L. Holm and J. Park, “DaliLite workbench for protein structure comparison”, Bioinformatics, 16, 2000, pp. 566-567.

Iterative Protein Alignment Algorithm (IPA)

Article

May 2012

A new algorithm is proposed for protein alignment based on their 3D structures. Initial protein matching uses comparison of their secondary structure elements. Each element of secondary structure is described by a vector representing its principal axis of inertia. All possible pairs of vectors from both proteins are chosen to compute similarity function values. A number of matched vector pairs with the best values of similarity function are considered as initial matches. Then, for all initial matches Fractional Iterative Closest Point algorithm is applied to improve correspondence between proteins.

Heuristic Strategy for Geometric Hashing Based Protein Structure Comparison of Ellipsoidal Representation

Conference Paper

Full-text available

Nov 2007

Many protein structure comparison methods use secondary structure information to do fast structure similarity search for initial alignment finding and refine the results from possible optimal candidate solutions by iteratively dynamic programming to optimize the final results. In this paper, we develop a method, Ellipsoidal Model Protein Structure Comparison, based on the concept of secondary structure elements alignment followed by iteratively refinement. In order to utilize all possible structure information to obtain alternative solutions for further analysis, we use ellipsoidal model to represent not only mainly -helices and -sheets, but the remaining fragments for structural alignment. Different heuristic filters and geometric hashing based global alignment estimation are applied for quick finding better initial alignments. We also provide top-N solutions without increasing extra computational time rather than only best solution in the previous works. Now, we provide the online web service, Ballerina (http://ballerina.csie.ntu.edu.tw/), for protein structure comparison.

Molecular mimicry: Quantitative methods to study structural similarity between protein and RNA

Article

Full-text available

Sep 2005

With rapidly increasing availability of three-dimensional structures, one major challenge for the post-genome era is to infer the functions of biological molecules based on their structural similarity. While quantitative studies of structural similarity between the same type of biological molecules (e.g., protein vs. protein) have been carried out intensively, the comparable study of structural similarity between different types of biological molecules (e.g., protein vs. RNA) remains unexplored. Here we have developed a new bioinformatics approach to quantitatively study the structural similarity between two different types of biopolymers--proteins and RNA--based on the spatial distribution of conserved elements. We applied it to two previously proposed tRNA-protein mimicry pairs whose functional relatedness between two molecules has been recently determined experimentally. Our method detected the biologically meaningful signals, which are consistent with experimental evidence.

Heuristic Strategy for Geometric Hashing Based Protein Structure Comparison of Ellipsoidal Representation

Conference Paper

Dec 2007

Comparison of Non-Sequential Sets of Protein Residues

Article

Full-text available

Dec 2015

A methodology for performing sequence-free comparison of functional sites in protein structures is introduced. The method is based on a new notion of similarity among superimposed groups of amino acid residues that evaluates both geometry and physico-chemical properties. The method is specifically designed to handle disconnected and sparsely distributed sets of residues. A genetic algorithm is employed to find the superimposition of protein segments that maximizes their similarity. The method was evaluated by performing an all-to-all comparison on two separate sets of ligand-binding sites, comprising 47 protein-FAD (Flavin-Adenine Dinucleotide) and 64 protein-NAD (Nicotinamide-Adenine Dinucleotide) complexes, and comparing the results with those of an existing sequence-based structural alignment tool (TM-Align). The quality of the two methodologies is judged by the methods’ capacity to, among other, correctly predict the similarities in the protein-ligand contact patterns of each pair of binding sites. The results show that using a sequence-free method significantly improves over the sequence-based one, resulting in 23 significant binding-site homologies being detected by the new method but ignored by the sequence-based one.

EMPSC: A New Method Based on Ellipsoidal Model for Protein Structure Comparison

Article

Full-text available

Jan 2006

This paper proposes a new method EMPSC for the well-known PSC (Protein Structure Comparison) problem. The proposed method EMPSC is a protein structural alignment algorithm based on ellipsoidal model abstraction. We segment the protein 3D structure into two different kinds of structures, including Secondary Structure Elements recognized by DSSP 1 and other coil/loop structures. These SSEs will be the initial alignment center for obtaining the transformation coordinate systems. Different heuristic filters and geometric hashing based global alignment estimation are used for quick finding better initial alignments. In the refined alignment stage of analysis, a standard refinement algorithm is invoked to fine-tune the alignment outputted by the first stage. Our experimental results reveal that EMPSC generally achieves comparable accuracy and better performance in comparison with the existing PSC algorithms. Moreover, we analyzed the factors that affect the EMPSC performance and SSE-based PSC algorithms. Further investigation in multiple protein structure comparison and local structure comparison will be continued.

Common Substructure Extraction of Proteins by Geometric Invariants

Conference Paper

Nov 2007

Structure alignment can help to find shape similarities between proteins and guide structure classification and fold recognition. Common substructure detection and extraction are especially important because they can guide biologists to binding sites or active sites. We represent each segment of the alpha-carbon backbone using dihedral angles and curve moment invariants. Local and global structure alignment can then be performed with the iterative closest point algorithm. Maximum common substructures between a pair of proteins or within a single protein can be found, and active sites can also be detected by the proposed algorithm.
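To make the backbone representation mentioned in this abstract more concrete, the following is a minimal sketch of how a dihedral angle can be computed from four consecutive alpha-carbon positions. It covers only the dihedral-angle part, not the curve moment invariants, and it is not taken from the paper: the function names `dihedral` and `backbone_dihedrals` are hypothetical, and coordinates are assumed to be plain `(x, y, z)` tuples.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four consecutive points.

    Hypothetical helper: the angle between the plane through p0, p1, p2
    and the plane through p1, p2, p3, measured about the p1 -> p2 axis.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)          # normal of the first plane
    n2 = _cross(b1, b2)          # normal of the second plane
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

def backbone_dihedrals(ca_coords):
    """One dihedral per window of four consecutive C-alpha positions."""
    return [dihedral(*ca_coords[i:i + 4]) for i in range(len(ca_coords) - 3)]
```

Because a dihedral angle depends only on the relative positions of the four points, the resulting sequence of angles is unchanged by rotating or translating the whole chain, which is what makes it usable as a shape descriptor before any superposition is attempted.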

A web-based protein retrieval system by matching visual similarity

Conference Paper

Sep 2005

A web-based three-dimensional (3D) protein retrieval system is available for protein structure data, covering the entire PDB and FSSP datasets. In this system, we use a visual-based matching method to compare protein structures from multiple viewpoints. Each query takes less than three seconds, with 90% accuracy on average. Availability: The web-based query interface and downloadable files can be accessed via http://3d.csie.ntu.edu.tw/ProteinRetrieval/ Supplementary information: Further details of the proposed method are available at http://graphics.csie.ntu.edu.tw/~jsyeh/3Dprotein/

Article

Sep 2003

A large number of 3D models are created and available on the Web, since more and more 3D modelling and digitizing tools are developed for ever increasing applications. The techniques for content-based 3D model retrieval then become necessary. In this paper, a visual similarity-based 3D model retrieval system is proposed. This approach measures the similarity among 3D models by visual similarity, and the main idea is that if two 3D models are similar, they also look similar from all viewing angles. Therefore, one hundred orthogonal projections of an object, excluding symmetry, are encoded both by Zernike moments and Fourier descriptors as features for later retrieval. The visual similarity-based approach is robust against similarity transformation, noise, model degeneracy etc., and provides 42%, 94% and 25% better performance (precision-recall evaluation diagram) than three other competing approaches: (1) the spherical harmonics approach developed by Funkhouser et al., (2) the MPEG-7 Shape 3D descriptors, and (3) the MPEG-7 Multiple View Descriptor. The proposed system is on the Web for practical trial use (http://3d.csie.ntu.edu.tw), and the database contains more than 10,000 publicly available 3D models collected from WWW pages. Furthermore, a user friendly interface is provided to retrieve 3D models by drawing 2D shapes. The retrieval is fast enough on a server with Pentium IV 2.4 GHz CPU, and it takes about 2 seconds and 0.1 seconds for querying directly by a 3D model and by hand drawn 2D shapes, respectively. Categories and Subject Descriptors (according to ACM CCS): H.3.1 [Information Storage and Retrieval]: Indexing Methods

Iterative point matching for registration of free-form curves and surfaces

Article

Jan 1994
INT J COMPUT VISION

Zhengyou Zhang

An O(n^2) Algorithm for 3D Substructure Matching of Proteins

Article

Jan 1994

Published in Bioinformatics 14(6), 1998, p. 516-522

Matching protein structures with fuzzy alignments

Article

Oct 2003

Richard Blankenbecler

Iterative point matching of free-form curves and surfaces

Article

Oct 1994

Zhengyou Zhang

A heuristic method has been developed for registering two sets of 3-D curves obtained by using an edge-based stereo system, or two dense 3-D maps obtained by using a correlation-based stereo system. Geometric matching in general is a difficult unsolved problem in computer vision. Fortunately, in many practical applications, some a priori knowledge exists which considerably simplifies the problem. In visual navigation, for example, the motion between successive positions is usually approximately known. From this initial estimate, our algorithm computes observer motion with very good precision, which is required for environment modeling (e.g., building a Digital Elevation Map). Objects are represented by a set of 3-D points, which are considered as the samples of a surface. No constraint is imposed on the form of the objects. The proposed algorithm is based on iteratively matching points in one set to the closest points in the other. A statistical method based on the distance distribution is used to deal with outliers, occlusion, appearance and disappearance, which allows us to do subset-subset matching. A least-squares technique is used to estimate 3-D motion from the point correspondences, which reduces the average distance between points in the two sets. Both synthetic and real data have been used to test the algorithm, and the results show that it is efficient and robust, and yields an accurate motion estimate.
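The loop described in this abstract (match each point to its closest counterpart, discard unreliable pairs, solve a least-squares rigid motion, repeat) can be sketched as follows. This is an illustrative sketch under stated assumptions, not Zhang's implementation: the names `best_rigid_transform` and `icp` are hypothetical, nearest neighbours are found by brute force, the least-squares step uses the SVD-based Kabsch solution, and a fixed distance cut-off `max_dist` stands in for the paper's statistical outlier handling.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t ~ dst[i] (Kabsch via SVD)."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(src, dst, max_iter=50, max_dist=None, tol=1e-9):
    """Align point set src (N x 3) to dst (M x 3); returns accumulated R, t."""
    cur = src.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(max_iter):
        # Closest point in dst for every current point (brute force, O(N*M)).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        dist = np.sqrt(d2[np.arange(len(cur)), nn])
        # Simplified outlier rejection: a fixed threshold (an assumption here).
        keep = np.ones(len(cur), dtype=bool) if max_dist is None else dist <= max_dist
        if keep.sum() < 3:
            break
        R, t = best_rigid_transform(cur[keep], dst[nn[keep]])
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = dist[keep].mean()
        if abs(prev_err - err) < tol:             # stop when the error stalls
            break
        prev_err = err
    return R_total, t_total
```

As the abstract notes, the method relies on a reasonable initial estimate: with a poor starting pose the closest-point pairs are mostly wrong and the loop can settle in a local minimum. Passing `max_dist` lets only part of one set be matched to part of the other, which is the subset-subset case the abstract describes.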

Protein structure comparison by

Article

Techniques for searching for structural similarities between protein cores, protein surfaces and between protein-protein interfaces

Article

Dec 1996

This chapter discusses techniques for searching for structural similarities among protein cores, protein surfaces and protein–protein interfaces. The chapter shows that, for comparisons of protein structures, incorporating connectivity into the matching procedure further improves the geometric hashing technique. A number of protein structure comparison techniques have been developed that fall into three categories: (1) techniques based on dynamic programming, (2) techniques that match the 3D structures of fragments of contiguous amino acids, and (3) techniques derived from computer vision. Geometric hashing is a highly efficient tool for structural comparisons of proteins and for docking. Because it originates in computer vision, it matches unconnected points in space. This enables matching protein structures in a manner that is entirely independent of their amino acid sequence order, and carrying out docking of a ligand onto a receptor surface. It avoids the time-consuming search of the entire conformational space by matching the points in a transformation-invariant manner. Consequently, high-quality matches are obtained in short times.
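The transformation-invariant matching described in this abstract can be illustrated with a small sketch of basic geometric hashing for 3D point sets. It is a simplified illustration, not the chapter's method: it omits the connectivity considerations the chapter adds, the names `local_frame`, `invariant_keys`, `build_table` and `vote` are hypothetical, and the cell size of 0.5 is an arbitrary assumption.

```python
import itertools
from collections import defaultdict
import numpy as np

def local_frame(a, b, c):
    """Origin and orthonormal axes defined by three non-collinear points."""
    n = np.cross(b - a, c - a)
    if np.linalg.norm(n) < 1e-9:
        return None                        # collinear triplet: no frame
    e1 = (b - a) / np.linalg.norm(b - a)
    e3 = n / np.linalg.norm(n)
    e2 = np.cross(e3, e1)
    return a, np.stack([e1, e2, e3])       # rows are the frame axes

def invariant_keys(points, basis, cell):
    """Quantized coordinates of all non-basis points in the basis frame."""
    frame = local_frame(*points[list(basis)])
    if frame is None:
        return []
    origin, axes = frame
    keys = []
    for i, p in enumerate(points):
        if i in basis:
            continue
        local = axes @ (p - origin)        # unchanged by rotation/translation
        keys.append(tuple(int(v) for v in np.rint(local / cell)))
    return keys

def build_table(model, cell=0.5):
    """Preprocessing: hash every point under every ordered basis triplet."""
    table = defaultdict(list)
    for basis in itertools.permutations(range(len(model)), 3):
        for key in invariant_keys(model, basis, cell):
            table[key].append(basis)
    return table

def vote(table, scene, scene_basis, cell=0.5):
    """Recognition: count, per model basis, how many scene points agree."""
    votes = defaultdict(int)
    for key in invariant_keys(scene, scene_basis, cell):
        for model_basis in table.get(key, ()):
            votes[model_basis] += 1
    return dict(votes)
```

A model basis that collects many votes suggests a candidate rigid transformation (the one mapping the scene basis onto that model basis), which would then be verified and refined. Because only point coordinates relative to a local frame are hashed, the order of the points plays no role, which corresponds to the sequence-order independence stressed in the abstract. Enumerating all ordered triplets grows cubically with the number of points, so this sketch is only practical for small sets.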

SCOP: A structural classification of proteins database for the investigation of sequences and structures

Article

Apr 1995

To facilitate understanding of, and access to, the information available for protein structures, we have constructed the Structural Classification of Proteins (scop) database. This database provides a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure. It also provides, for each entry, links to co-ordinates, images of the structure, interactive viewers, sequence data and literature references. Two search facilities are available. The homology search permits users to enter a sequence and obtain a list of any structures to which it has significant levels of sequence similarity. The key word search finds, for a word entered by the user, matches from both the text of the scop database and the headers of Brookhaven Protein Databank structure files. The database is freely accessible on World Wide Web (WWW) with an entry point to URL http://scop.mrc-lmb.cam.ac.uk/scop/ scop: an old English poet or minstrel (Oxford English Dictionary); скоп: pile, accumulation (Russian Dictionary).

Geometric Hashing: A General And Efficient Model-based Recognition Scheme

Conference Paper

Jan 1988

Not Available

A Search Engine for 3D Models

Article

Jan 2003

As the number of 3D models available on the Web grows, there is an increasing need for a search engine to help people find them. Unfortunately, traditional text-based search techniques are not always effective for 3D data. In this paper, we investigate new shape-based search methods. The key challenges are to develop query methods simple enough for novice users and matching algorithms robust enough to work for arbitrary polygonal models. We present a web-based search engine system that supports queries based on 3D sketches, 2D sketches, 3D models, and/or text keywords. For the shape-based queries, we have developed a new matching algorithm that uses spherical harmonics to compute discriminating similarity measures without requiring repair of model degeneracies or alignment of orientations. It provides 46-245% better performance than related shape matching methods during precision-recall experiments, and it is fast enough to return query results from a repository of 20,000 models in under a second. The net result is a growing interactive index of 3D models available on the Web (i.e., a Google for 3D models).
