The Jalview Java Alignment Editor

March 2004
Bioinformatics 20(3):426-7

DOI:10.1093/bioinformatics/btg430

Authors:

Michele Clamp

Harvard University

Stephen M J Searle

Geoffrey J Barton

University of Dundee

BIOINFORMATICS APPLICATIONS NOTE

Vol. 20 no. 3 2004, pages 426–427

DOI: 10.1093/bioinformatics/btg430

The Jalview Java alignment editor

Michele Clamp [1,2,4,∗], James Cuff [1,2], Stephen M. Searle [1,2] and Geoffrey J. Barton [2,3,4]

The Wellcome Trust Sanger Institute and The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK, School of Life Sciences, University of Dundee, Dow St, Dundee, DD1 5EH, UK and The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK

Received on April 16, 2003; revised on August 4, 2003; accepted on August 6, 2003

Advance Access Publication January 22, 2004

ABSTRACT

Summary: Multiple sequence alignment remains a crucial method for understanding the function of groups of related nucleic acid and protein sequences. However, it is known that automatic multiple sequence alignments can often be improved by manual editing. Therefore, tools are needed to view and edit multiple sequence alignments. Due to growth in the sequence databases, multiple sequence alignments can often be large and difficult to view efficiently. The Jalview Java alignment editor is presented here, which enables fast viewing and editing of large multiple sequence alignments.

Availability: The Jar file and source code for Jalview are freely available via the World Wide Web at http://www.jalview.org. A Jalview mailing list is also available by e-mailing majordomo@sanger.ac.uk with subscribe Jalview in the body of the mail.

Contact: michele@sanger.ac.uk

INTRODUCTION

The alignment of biological sequences has a long history and the development of automatic techniques has eased the difficulty of generating alignments from unaligned sequences. However, even the best multiple sequence alignment methods only achieve <50% accuracy per position in the alignment of sequences with <20% identity (Thompson et al., 1999). Biologists can often use other information about the sequence and structure of a family of proteins to improve a multiple sequence alignment. Therefore, biologists striving for the best possible alignment will often need to manually edit an automatically generated alignment.

There exist a large number of software packages that allow the viewing of multiple sequence alignments. These include Belvu, Alscript (Barton, 1993), ClustalX (Thompson et al., 1997) and Chroma (Goodstadt and Ponting, 2001). These packages do not allow editing of multiple sequence alignments. Although alignments can be edited in word processing software, such as Microsoft Word or emacs, it is often difficult to see conserved patterns without a specific colouring of the alignment that these programs do not provide. In addition, specialized multiple sequence alignment editors can provide extra features for the user including grouping and analysis of the conservation patterns in the alignment. A small number of software packages exist that allow editing of multiple sequence alignments, such as GeneDoc (Nicholas and Nicholas, 1997, http://www.cris.com/~Ketchup/genedoc.shtml), BioEdit, Seaview (Galtier et al., 1996), MPSA (Blanchet et al., 2000), ANTHEPROT (Deleage et al., 2001) and CINEMA (Parry-Smith et al., 1998) amongst others. Of these, CINEMA has most similarities with Jalview as it is written in Java. However, Jalview provides extra functionality with the ability to calculate trees, conservation within subfamilies and on-the-fly pairwise alignments.

∗ To whom correspondence should be addressed.

The Jalview program was written with the following design goals in mind. First, it should be platform independent; second, it should be fast and capable of editing large multiple sequence alignments without significant degradation of performance; and third, it should allow multiple integrated views of the alignment and other data. These goals were addressed by coding the software in the platform-independent Java version 1.1 language.

FEATURES OF JALVIEW

Jalview has a rich functionality based on its core alignment viewing and editing options. These features are described in outline in the following section. Jalview can input and output multiple sequence alignments in a variety of common formats including MSF, aligned Fasta and Clustal format. Once loaded into Jalview the alignments are coloured by default according to the ClustalX colouring scheme (Thompson et al., 1997). A number of other colouring options are available via the edit menu including a user configurable scheme. If the user does not have a sequence alignment, a set of unaligned sequences can be aligned using ClustalW either locally or via the web at the EBI ClustalW server (Brooksbank et al., 2003).
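As an illustration of what reading the aligned Fasta format involves, here is a minimal sketch in Python. It is not Jalview's own reader (Jalview is written in Java and its parsing code is not described here); the function name `parse_aligned_fasta` and the choice to reject rows of unequal length are assumptions made for this example.

```python
def parse_aligned_fasta(text):
    """Read aligned Fasta text into a dict of {identifier: aligned row}.

    Hypothetical helper for illustration only, not part of Jalview.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The header line (minus '>') is used as the sequence identifier.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            records[name].append(line)
    alignment = {n: "".join(parts) for n, parts in records.items()}
    # Assumption: an alignment must have rows of equal length.
    if len({len(row) for row in alignment.values()}) > 1:
        raise ValueError("rows differ in length; input is not an alignment")
    return alignment


example = ">seq1\nAC-DE\nFG\n>seq2\nACQDE\nFG\n"
print(parse_aligned_fasta(example))
```

Gap characters are kept in place, so each returned row has one character per alignment column.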

Editing multiple sequence alignments in Jalview simply requires the user to drag residues to the left to remove gaps and to the right to insert gaps at the cursor position. Editing can be carried out on multiple sequences by applying group selection, found in the edit menu. Grouping sequences can speed up editing of large numbers of similar sequences. Jalview allows users to calculate UPGMA or neighbour-joining trees (Saitou and Nei, 1987). Upon selecting this option, a new window is opened to display the tree. These trees can be used to re-order the sequences in a multiple alignment as well as to select groups of sequences for group editing.
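The drag-based editing described above can be pictured as two string operations applied to one row or to a selected group of rows. The following Python sketch is an illustration only; the function names, the `-` gap character and the decision to pad unedited rows with trailing gaps are assumptions, not Jalview's implementation.

```python
GAP = "-"


def insert_gaps(row, pos, count=1):
    """Dragging right: push the residues from column pos onwards to the right."""
    return row[:pos] + GAP * count + row[pos:]


def remove_gaps(row, pos, count=1):
    """Dragging left: delete up to count gaps directly left of column pos.

    Residues are never deleted; the drag stops at the first residue.
    """
    start = pos
    while start > 0 and pos - start < count and row[start - 1] == GAP:
        start -= 1
    return row[:start] + row[pos:]


def edit_group(alignment, names, pos, delta):
    """Apply the same edit to every row in a group (delta > 0 inserts gaps)."""
    edited = dict(alignment)
    for name in names:
        if delta > 0:
            edited[name] = insert_gaps(edited[name], pos, delta)
        else:
            edited[name] = remove_gaps(edited[name], pos, -delta)
    # Assumption: rows are padded with trailing gaps to keep a common width.
    width = max(len(row) for row in edited.values())
    return {name: row.ljust(width, GAP) for name, row in edited.items()}


rows = {"a": "ACDE", "b": "ACDE", "c": "ACDE"}
print(edit_group(rows, ["a", "b"], 2, 1))
```

Applying one edit to a whole group is what makes grouping a time saver: a single drag moves every selected row.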

Sequence features on a multiple sequence alignment can be viewed in Jalview. If the sequence identifiers in the alignment are Swiss-Prot/TrEMBL identifiers Jalview can access the EBI website via SRS to download feature table elements and display them on the alignment (Brooksbank et al., 2003). By right-clicking on Swiss-Prot/TrEMBL sequence identifiers in the alignment window, the entry is retrieved from an SRS server and displayed in Jalview's lightweight web-browser. If a structure is known for one of the sequences in the alignment, this can also be downloaded from an SRS server and displayed in the Jalview structure viewer. The colour scheme from the alignment is projected on to the structure to highlight regions of conservation.

Principal component analysis (PCA) can help in understanding the relationship between sequences of an alignment. The method of clustering sequences implemented in Jalview is based on the method applied in SequenceSpace (Casari et al., 1995). When PCA is selected from the calculate menu a PCA viewer window is created that shows the sequences projected on to the first three eigenvectors. Clicking on points in the PCA window selects the corresponding sequence in the alignment window and in the tree window if it is visible.
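To make the idea of projecting sequences on to the first three eigenvectors concrete, the sketch below works through a deliberately simplified version in plain Python. It is not the SequenceSpace method or Jalview's code: it assumes a pairwise identity score as the similarity measure and uses power iteration with deflation to find the leading eigenvectors. All function names are hypothetical.

```python
import math


def identity(a, b):
    """Fraction of alignment columns where both rows carry the same residue."""
    return sum(1 for x, y in zip(a, b) if x == y and x != "-") / len(a)


def centred_similarity(seqs):
    """Pairwise identity matrix with row, column and grand means removed."""
    n = len(seqs)
    s = [[identity(p, q) for q in seqs] for p in seqs]
    means = [sum(row) / n for row in s]
    grand = sum(means) / n
    return [[s[i][j] - means[i] - means[j] + grand for j in range(n)]
            for i in range(n)]


def top_components(matrix, k=3, iterations=200):
    """Leading eigenpairs of a symmetric matrix (power iteration, deflation)."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for c in range(k):
        v = [math.sin(i + 1 + c) for i in range(n)]  # fixed start vector
        value = 0.0
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                value = 0.0
                break
            v = [x / norm for x in w]
            value = norm
        pairs.append((value, v))
        # Remove the component just found before looking for the next one.
        for i in range(n):
            for j in range(n):
                m[i][j] -= value * v[i] * v[j]
    return pairs


def project(seqs, k=3):
    """Coordinates of each sequence on the first k components."""
    pairs = top_components(centred_similarity(seqs), k)
    return [[math.sqrt(max(val, 0.0)) * vec[i] for val, vec in pairs]
            for i in range(len(seqs))]


family = ["ACDEFGHI", "ACDEFGHK", "WYWYWYWY", "WYWYWYWL"]
for row, coords in zip(family, project(family)):
    print(row, [round(c, 3) for c in coords])
```

In this toy input the first coordinate separates the two families, which is the kind of grouping a PCA viewer makes visible.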

Multiple sequence alignments often contain sub-families of sequences and applying a colour scheme across the whole alignment can make it difficult to identify these families. Jalview allows the user to define sequence groups easily by using the tree panel. Clicking on the tree defines the maximum distance by which any two sequences in a group may be separated, and the alignment is split into groups accordingly. Conservation across each group can then be calculated by considering the different amino acid properties across each column in the group (Zvelebil et al., 1987). Columns that are most conserved have the most intense colour schemes, fading to no colouring at all for unconserved columns.
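A property-based conservation score of this kind can be sketched as follows. The example counts, for each column of a group, how many physicochemical properties are shared or jointly absent across all residues. The four-property table is a deliberately small illustrative assumption, not the property set of Zvelebil et al. (1987) or the one used by Jalview, and scoring gapped columns as zero is also an assumption.

```python
# Illustrative, simplified property table (an assumption for this sketch).
PROPERTIES = {
    "hydrophobic": set("AVLIMFWC"),
    "charged": set("DEKR"),
    "aromatic": set("FWYH"),
    "small": set("AGSTCPDNV"),
}


def column_conservation(column):
    """Number of properties on which every residue in the column agrees."""
    residues = [r.upper() for r in column]
    if any(r == "-" for r in residues):
        return 0  # assumption: a gapped column counts as unconserved
    score = 0
    for members in PROPERTIES.values():
        # A property is conserved if all residues have it or all lack it.
        if len({r in members for r in residues}) == 1:
            score += 1
    return score


def group_conservation(alignment, group):
    """Per-column conservation scores for the named rows of an alignment."""
    rows = [alignment[name] for name in group]
    return [column_conservation(col) for col in zip(*rows)]


alignment = {"s1": "LKA", "s2": "IKG", "s3": "VR-"}
print(group_conservation(alignment, ["s1", "s2"]))
print(group_conservation(alignment, ["s1", "s2", "s3"]))
```

A score like this could then be mapped to colour intensity, with the highest scores drawn most strongly and zero left uncoloured.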

The Jalview software was originally written in 1997 and is now widely used with over 20 000 downloads. It has been used to produce publication quality alignment figures as well as to provide a platform independent method to view multiple sequence alignments by databases, such as Pfam (Bateman et al., 2002). Jalview is run as an applet via the Pfam web pages, for an example see http://www.sanger.ac.uk/Software/Pfam/cgi-bin/getacc.pl?PF00045. Jalview is also used by the EBI ClustalW server (Brooksbank et al., 2003) as well as in the Apollo genome annotation editor (Lewis et al., 2002). The supplementary information available at http://www.jalview.org/bioinf/supp.html includes a figure showing a screenshot of the main Jalview windows.

REFERENCES

Barton,G.J. (1993) ALSCRIPT: a tool to format multiple sequence alignments. Protein Eng., 6, 37–40.
Bateman,A., Birney,E. et al. (2002) The Pfam protein families database. Nucleic Acids Res., 30, 276–280.
Blanchet,C. et al. (2000) MPSA: integrated system for multiple protein sequence analysis with client/server capabilities. Bioinformatics, 16, 286–287.
Brooksbank,C., Camon,E. et al. (2003) The European Bioinformatics Institute's data resources. Nucleic Acids Res., 31, 43–50.
Casari,G., Sander,C. et al. (1995) A method to predict functional residues in proteins. Nat. Struct. Biol., 2, 171–178.
Deleage,G. et al. (2001) ANTHEPROT: an integrated protein sequence analysis software with client/server capabilities. Comput. Biol. Med., 31, 259–267.
Galtier,N., Gouy,M. et al. (1996) SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput. Appl. Biosci., 12, 543–548.
Goodstadt,L. and Ponting,C.P. (2001) CHROMA: consensus-based colouring of multiple alignments for publication. Bioinformatics, 17, 845–846.
Lewis,S.E., Searle,S.M. et al. (2002) Apollo: a sequence annotation editor. Genome Biol., 3, RESEARCH0082.
Nicholas,K.B. and Nicholas,H.B.Jr (1997) GeneDoc: Analysis and Visualization of Genetic Variation.
Parry-Smith,D.J., Payne,A.W. et al. (1998) CINEMA—a novel colour interactive editor for multiple alignments. Gene, 221, GC57–GC63.
Saitou,N. and Nei,M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol., 4, 406–425.
Thompson,J.D., Gibson,T.J. et al. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res., 25, 4876–4882.
Thompson,J.D., Plewniak,F. et al. (1999) A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res., 27, 2682–2690.
Zvelebil,M.J. et al. (1987) Prediction of protein secondary structure and active sites using the alignment of homologous sequences. J. Mol. Biol., 195, 957–961.
