BIOINFORMATICS APPLICATIONS NOTE
Vol. 20 no. 3 2004, pages 426–427
DOI: 10.1093/bioinformatics/btg430
The Jalview Java alignment editor
Michele Clamp (1,2,4,*), James Cuff (1,2), Stephen M. Searle (1,2) and Geoffrey J. Barton (2,3,4)
(1) The Wellcome Trust Sanger Institute and (2) The European Bioinformatics Institute,
Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK, (3) School of Life Sciences,
University of Dundee, Dow St, Dundee, DD1 5EH, UK and (4) The Wellcome Trust Centre
for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
Received on April 16, 2003; revised on August 4, 2003; accepted on August 6, 2003
Advance Access Publication January 22, 2004
ABSTRACT
Summary: Multiple sequence alignment remains a crucial
method for understanding the function of groups of related
nucleic acid and protein sequences. However, it is known
that automatic multiple sequence alignments can often be
improved by manual editing. Therefore, tools are needed to
view and edit multiple sequence alignments. Due to growth in
the sequence databases, multiple sequence alignments can
often be large and difficult to view efficiently. The Jalview Java
alignment editor is presented here, which enables fast viewing
and editing of large multiple sequence alignments.
Availability: The Jar file and source code for Jalview are freely
available via the World Wide Web at http://www.jalview.org.
A Jalview mailing list is also available by e-mailing
majordomo@sanger.ac.uk with subscribe Jalview in the body
of the mail.
Contact: michele@sanger.ac.uk
INTRODUCTION
The alignment of biological sequences has a long history and
the development of automatic techniques has eased the
difficulty of generating alignments from unaligned sequences.
However, even the best multiple sequence alignment methods
only achieve <50% accuracy per position in the alignment
of sequences with <20% identity (Thompson et al., 1999).
Biologists can often use other information about the sequence
and structure of a family of proteins to improve a multiple
sequence alignment. Therefore, biologists striving for the
best possible alignment will often need to manually edit an
automatically generated alignment.
There exist a large number of software packages that allow
the viewing of multiple sequence alignments. These include
Belvu, Alscript (Barton, 1993), ClustalX (Thompson et al.,
1997) and Chroma (Goodstadt and Ponting, 2001). These
packages do not allow editing of multiple sequence alignments.
Although alignments can be edited in word processing
software, such as Microsoft Word or emacs, it is often
difficult to see conserved patterns without a specific colouring
of the alignment that these programs do not provide. In
addition, specialized multiple sequence alignment editors
can provide extra features for the user including grouping
and analysis of the conservation patterns in the alignment.
A small number of software packages exist that allow
editing of multiple sequence alignments, such as GeneDoc
(Nicholas and Nicholas, 1997, http://www.cris.com/~Ketchup/genedoc.shtml),
BioEdit, Seaview (Galtier et al., 1996), MPSA (Blanchet et al., 2000),
ANTHEPROT (Deleage et al., 2001) and CINEMA (Parry-Smith et al., 1998)
amongst others. Of these, CINEMA has most similarities with Jalview
as it is written in Java. However, Jalview provides extra
functionality with the ability to calculate trees, conservation within
subfamilies and on-the-fly pairwise alignments.
*To whom correspondence should be addressed.
The Jalview program was written with the following design
goals in mind. First, it should be platform independent;
second, it should be fast and capable of editing large multiple
sequence alignments without significant degradation of
performance; and third, it should allow multiple integrated
views of the alignment and other data. These goals were
addressed by coding the software in the platform independent
Java version 1.1 language.
FEATURES OF JALVIEW
Jalview has a rich functionality based on its core alignment
viewing and editing options. These features are described in
outline in the following section. Jalview can input and output
multiple sequence alignments in a variety of common formats
including MSF, aligned Fasta and Clustal format. Once loaded
into Jalview the alignments are coloured by default according
to the ClustalX colouring scheme (Thompson et al., 1997). A
number of other colouring options are available via the edit
menu including a user configurable scheme. If the user does
not have a sequence alignment, a set of unaligned sequences
can be aligned using ClustalW either locally or via the web at
the EBI ClustalW server (Brooksbank et al., 2003).
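As a rough illustration of what reading one of these formats involves, the following Python sketch parses aligned Fasta text into rows of equal width. It is not Jalview's own reader (Jalview is written in Java); the function name parse_aligned_fasta and the example records are assumptions made for this sketch only.

```python
def parse_aligned_fasta(text):
    """Parse aligned FASTA text into a list of (identifier, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            # Keep only the first word of the header as the identifier.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    # An alignment needs every row to span the same number of columns.
    if len({len(seq) for _, seq in records}) > 1:
        raise ValueError("sequences differ in length; input is not aligned")
    return records


# Hypothetical two-row alignment; "-" marks a gap.
example_fasta = """>seq1 demo
MK-LV
>seq2
MKALV
"""
parsed_alignment = parse_aligned_fasta(example_fasta)
```

The length check is the only point at which an aligned file differs from an ordinary Fasta file in this sketch: unaligned input is rejected rather than padded.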
Bioinformatics 20(3) © Oxford University Press 2004; all rights reserved.
Editing multiple sequence alignments in Jalview simply
requires the user to drag residues to the left to remove gaps and
to the right to insert gaps at the cursor position. Editing can
be carried out on multiple sequences by applying group selection,
found in the edit menu. Grouping sequences can speed up
editing of large numbers of similar sequences. Jalview allows
users to calculate UPGMA or neighbour-joining trees (Saitou
and Nei, 1987). Upon selecting this option, a new window
is opened to display the tree. These trees can be used to
reorder the sequences in a multiple alignment as well as to select
groups of sequences for group editing.
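The drag-based editing described above can be pictured as two string operations applied to every sequence in the selected group. The Python sketch below is only a model of that behaviour, not Jalview's code: the names insert_gaps, remove_gaps and edit_group, the use of "-" as the gap character, and the padding of rows with trailing gaps are all assumptions of this example.

```python
GAP = "-"


def insert_gaps(seq, pos, count=1):
    """Insert `count` gaps before column `pos` (a drag to the right)."""
    return seq[:pos] + GAP * count + seq[pos:]


def remove_gaps(seq, pos, count=1):
    """Remove up to `count` gaps directly left of column `pos` (a drag to the left)."""
    removed = 0
    while removed < count and pos - removed > 0 and seq[pos - removed - 1] == GAP:
        removed += 1
    return seq[:pos - removed] + seq[pos:]


def edit_group(alignment, group, pos, count):
    """Apply one edit to every sequence named in `group`.

    A positive `count` inserts gaps and a negative `count` removes them.
    Rows are then padded with trailing gaps so the alignment stays rectangular.
    """
    edited = dict(alignment)
    for name in group:
        if count >= 0:
            edited[name] = insert_gaps(alignment[name], pos, count)
        else:
            edited[name] = remove_gaps(alignment[name], pos, -count)
    width = max(len(seq) for seq in edited.values())
    return {name: seq.ljust(width, GAP) for name, seq in edited.items()}


# Hypothetical alignment: close the gap in rows "a" and "b" with a single edit.
demo_alignment = {"a": "MK-LV", "b": "MK-LV", "c": "MKALV"}
closed = edit_group(demo_alignment, ["a", "b"], 3, -1)
```

Because the same call edits every member of the group, one drag replaces one edit per sequence, which is the speed-up that grouping provides.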
Sequence features on a multiple sequence alignment can
be viewed in Jalview. If the sequence identifiers in the alignment
are Swiss-Prot/TrEMBL identifiers, Jalview can access
the EBI website via SRS to download feature table elements
and display them on the alignment (Brooksbank et al., 2003).
By right-clicking on Swiss-Prot/TrEMBL sequence identifiers
in the alignment window, the entry is retrieved from an SRS
server and displayed in Jalview’s lightweight web-browser. If
a structure is known for one of the sequences in the alignment,
this can also be downloaded from an SRS server and displayed
in the Jalview structure viewer. The colour scheme from the
alignment is projected on to the structure to highlight regions
of conservation.
Principal component analysis (PCA) can help in understanding
the relationship between sequences of an alignment.
The method of clustering sequences implemented in Jalview is
based on the method applied in SequenceSpace (Casari et al.,
1995). When PCA is selected from the calculate menu a PCA
viewer window is created that shows the sequences projected
on to the first three eigenvectors. Clicking on points in
the PCA window selects the corresponding sequence in the
alignment window and in the tree window if it is visible.
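To give a feel for how sequences can be placed along a principal axis, the Python sketch below uses a simplified stand-in for the SequenceSpace approach. It scores each pair of rows by fractional identity, double-centres the resulting matrix, and finds the leading eigenvector by power iteration. The similarity measure, the function names and the restriction to a single axis (the viewer described above shows three) are assumptions of this example, not a description of Jalview's implementation.

```python
import math


def identity(a, b):
    """Fraction of alignment columns in which two equal-length rows agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def centred_similarity(seqs):
    """Pairwise identity matrix with row, column and grand means removed."""
    n = len(seqs)
    s = [[identity(a, b) for b in seqs] for a in seqs]
    row = [sum(r) / n for r in s]
    total = sum(row) / n
    # The matrix is symmetric, so row means double as column means.
    return [[s[i][j] - row[i] - row[j] + total for j in range(n)] for i in range(n)]


def leading_component(matrix, iterations=200):
    """Approximate the eigenvector of the largest eigenvalue by power iteration."""
    n = len(matrix)
    # Non-uniform start: a constant vector is annihilated by a centred matrix.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


# Hypothetical alignment containing two obvious sub-families.
families = ["MKLVA", "MKLVG", "QRTWA", "QRTWG"]
first_axis = leading_component(centred_similarity(families))
```

In this toy alignment the first two rows receive coordinates of one sign and the last two the opposite sign, so the two sub-families fall on opposite sides of the axis.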
Multiple sequence alignments often contain sub-families of
sequences and applying a colour scheme across the whole
alignment can make it difficult to identify these families.
Jalview allows the user to define sequence groups easily by
using the tree panel. Clicking on the tree defines the maximum
distance by which any two sequences in a group may be separated,
and the alignment is split into groups accordingly. Conservation
across each group can then be calculated by considering
the different amino acid properties across each column in the
group (Zvelebil et al., 1987). Columns that are most conserved
have the most intense colouring, fading to no colouring
at all for unconserved columns.
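A property-based conservation score of this kind can be sketched in Python as follows. For each column, the score counts the properties on which all residues agree, meaning that every residue has the property or every residue lacks it. The five property sets below are a deliberately small, illustrative table chosen for this example; they are not the property table of Zvelebil et al. (1987) or of Jalview, and the function names are likewise hypothetical.

```python
# Illustrative property sets only (an assumption of this sketch).
PROPERTIES = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYHKRDE"),
    "small": set("AGSTCPDNV"),
    "charged": set("DEKRH"),
    "aromatic": set("FWYH"),
}


def column_conservation(column):
    """Count the properties that all non-gap residues in a column share or all lack."""
    residues = [r for r in column.upper() if r != "-"]
    if not residues:
        return 0
    score = 0
    for members in PROPERTIES.values():
        # One distinct flag means every residue agrees on this property.
        if len({r in members for r in residues}) == 1:
            score += 1
    return score


def group_conservation(rows):
    """Score every column of a group of equal-length aligned rows."""
    return [column_conservation("".join(col)) for col in zip(*rows)]


# Hypothetical three-sequence group.
scores = group_conservation(["MKLV", "MKIV", "MRGD"])
# Map scores to a 0..1 colour intensity; 0 would mean no colouring at all.
intensity = [s / len(PROPERTIES) for s in scores]
```

Computing the score per group rather than over the whole alignment is what lets a column that is conserved within one sub-family stand out even when it varies between sub-families.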
The Jalview software was originally written in 1997 and
is now widely used with over 20 000 downloads. It has
been used to produce publication quality alignment figures
as well as to provide a platform independent method to view
multiple sequence alignments by databases, such as Pfam
(Bateman et al., 2002). Jalview is run as an applet via the
Pfam web pages; for an example see
http://www.sanger.ac.uk/Software/Pfam/cgi-bin/getacc.pl?PF00045.
Jalview is also used by the EBI ClustalW server (Brooksbank et al.,
2003) as well as in the Apollo genome annotation editor
(Lewis et al., 2002). The supplementary information available
at http://www.jalview.org/bioinf/supp.html includes a figure
showing a screenshot of the main Jalview windows.
REFERENCES
Barton,G.J. (1993) ALSCRIPT: a tool to format multiple sequence
alignments. Protein Eng., 6, 37–40.
Bateman,A., Birney,E. et al. (2002) The Pfam protein families
database. Nucleic Acids Res., 30, 276–280.
Blanchet,M. et al. (2000) MPSA: integrated system for multiple
protein sequence analysis with client/server capabilities.
Bioinformatics, 16, 286–287.
Brooksbank,C., Camon,E. et al. (2003) The European
Bioinformatics Institute’s data resources. Nucleic Acids
Res., 31, 43–50.
Casari,G., Sander,C. et al. (1995) A method to predict functional
residues in proteins. Nat. Struct. Biol., 2, 171–178.
Galtier,N., Gouy,M. et al. (1996) SEAVIEW and PHYLO_WIN: two
graphic tools for sequence alignment and molecular phylogeny.
Comput. Appl. Biosci., 12, 543–548.
Goodstadt,L. and Ponting,C.P. (2001) CHROMA: consensus-based
colouring of multiple alignments for publication. Bioinformatics,
17, 845–846.
Deleage,G. et al. (2001) ANTHEPROT: an integrated protein
sequence analysis software with client/server capabilities.
Comput. Biol. Med., 31, 259–267.
Lewis,S.E., Searle,S.M. et al. (2002) Apollo: a sequence annotation
editor. Genome Biol., 3, RESEARCH0082.
Nicholas,K.B. and Nicholas,H.B.Jr (1997) GeneDoc: Analysis and
Visualization of Genetic Variation.
Parry-Smith,D.J., Payne,A.W. et al. (1998). CINEMA—a novel
colour interactive editor for multiple alignments. Gene, 221,
GC57–GC63.
Saitou,N. and Nei,M. (1987) The neighbor-joining method: a new
method for reconstructing phylogenetic trees. Mol. Biol. Evol., 4,
406–425.
Thompson,J.D., Gibson,T.J. et al. (1997) The CLUSTAL_X
windows interface: flexible strategies for multiple sequence
alignment aided by quality analysis tools. Nucleic Acids Res.,
25, 4876–4882.
Thompson,J.D., Plewniak,F. et al. (1999) A comprehensive
comparison of multiple sequence alignment programs. Nucleic
Acids Res., 27, 2682–2690.
Zvelebil,M.J., Barton,G.J. et al. (1987) Prediction of protein secondary structure and
active sites using the alignment of homologous sequences. J. Mol.
Biol., 195, 957–961.