Fig. 2. Unrooted phylogenetic tree

Source publication
Article
Full-text available
We present in this paper the detailed field-programmable gate-array (FPGA) design of the Maximum Parsimony method for molecular-based phylogenetic analysis and its implementation on the nodes of an FPGA supercomputer called Maxwell. This is the first FPGA implementation of this method for nucleotide sequence data reported in the literature. The har...

Contexts in source publication

Context 1
... tree topologies grows exponentially with the number of species under consideration. For instance, it takes over 30 hours to construct the phylogenetic tree for 12 species. Hence, it is mandatory to utilize faster computing platforms such as Field Programmable Gate Arrays (FPGAs). These have indeed recently been proposed as an efficacious and efficient implementation platform for phylogenetic analysis due to their flexible computing and memory architecture, which gives them ASIC-like performance with the added benefit of programmability [3]-[11]. We chose FPGAs over ASICs because of their reconfigurability and shorter development time, which result in lower non-recurring engineering (NRE) costs. There are various phylogenetic tree construction and phylogenetic analysis methods using different strategies. In this paper, we concentrate on the Maximum Parsimony (MP) method, which is one of the most widely used and most accurate tree construction methods [2]. The design and implementation of the FPGA core for parsimony analysis employing Sankoff's dynamic programming algorithm is presented in this paper.

A systolic array architecture was selected due to its several benefits for our design. First of all, systolic structures have inherently massive, local parallelism potential at both coarse and fine-grain levels: coarse-grain parallelism is obtained through the number of parallel processing elements, whereas fine-grain parallelism is achieved within each processing element. This is the main reason behind the high speed-up values achieved. Furthermore, since only the processing element at the border of the array communicates with the host, communications in the architecture are mostly local (i.e. between and within the processing elements). Hence, communication paths have short delays, resulting in high clock frequencies and, consequently, high throughput. Moreover, systolic architectures can be easily implemented on FPGAs, as demonstrated in the literature. A real hardware implementation of the designed core was achieved on the nodes of an FPGA supercomputer, named Maxwell, which consists of 64 Virtex-4 FPGA chips. To our knowledge, this is the first FPGA implementation of this method for nucleotide sequence data ever reported in the literature. FPGA implementations of other phylogenetic analysis methods and of other types of molecular data have, however, been reported in the past, as described in more detail in Section III.

The remainder of this paper will first present essential background information on phylogenetic analysis and then discuss related prior work in the literature. Following this, the Maximum Parsimony (MP) method for molecular-based phylogenetic tree construction will be detailed. After that, the architecture of the Maxwell FPGA supercomputer will be illustrated. Then, the design and implementation of our FPGA core for the MP method will be elaborated. Following this, implementation results are presented and evaluated comparatively against equivalent software implementations running on a desktop computer. Finally, conclusions are laid out with plans for future work.

Evolution and relationships among organisms can be investigated in different ways. Although morphology is the classic method of estimating relationships, continuously growing molecular information such as nucleotide or amino acid sequences can also be utilized to infer evolutionary relatedness. 
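The excerpt above names Sankoff's dynamic programming algorithm as the parsimony kernel mapped onto the systolic FPGA core. Purely for orientation, the following minimal Python sketch shows what that per-column recurrence computes on a fixed rooted binary tree; the tree, unit-cost substitution matrix and observed states are hypothetical examples, and the sketch makes no attempt to reflect the systolic hardware design described in the paper.

```python
import math

# Minimal, purely illustrative sketch of Sankoff's dynamic programming recurrence
# for the parsimony cost of ONE alignment column on a fixed rooted binary tree.
# Not the authors' FPGA design; tree, costs and states below are hypothetical.

STATES = "ACGT"
# Unit-cost substitution matrix: any change costs 1, identity costs 0.
COST = {a: {b: 0 if a == b else 1 for b in STATES} for a in STATES}

def sankoff(node, leaf_state):
    """Return a dict: state at `node` -> minimum subtree cost for this column."""
    if isinstance(node, str):                       # leaf (OTU): observed state is fixed
        return {s: (0 if s == leaf_state[node] else math.inf) for s in STATES}
    left, right = node                              # internal node (HTU)
    sl, sr = sankoff(left, leaf_state), sankoff(right, leaf_state)
    return {
        a: min(COST[a][b] + sl[b] for b in STATES)
         + min(COST[a][b] + sr[b] for b in STATES)
        for a in STATES
    }

# Hypothetical 4-taxon rooted tree and one alignment column.
tree = (("A", "B"), ("C", "D"))
column = {"A": "G", "B": "G", "C": "T", "D": "G"}

root_costs = sankoff(tree, column)
print(min(root_costs.values()))   # parsimony cost of this column (here: 1 substitution)
```

Summing this cost over all alignment columns gives the parsimony score of one candidate topology; the exhaustive MP search described in the paper evaluates this score for every possible topology and keeps the minimum.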
Molecular-based phylogenetic analysis estimates the relationship between species by inferring the common history of their genes through comparing homologous sites with each other. For this reason, the sequences under investigation are multiply aligned by specific algorithms so that homologous sites form columns in the alignment. These alignments are used to construct phylogenetic trees, which illustrate evolutionary relationships among genes and organisms. Diagrams depicting the relationships of species resemble the structure of a tree; hence, they are called phylogenetic trees.

There are two types of phylogenetic tree: rooted and unrooted. Rooted phylogenetic trees are drawn with a root to the left. Fig. 1 shows an example rooted phylogenetic tree where the root node is indicated. It can be seen that phylogenetic trees are strictly bifurcating (binary). Phylogenetic trees have a number of external (terminal) nodes, which are often called operational taxonomic units (OTUs). OTUs represent existing taxa (i.e. groups of one or more organisms). For instance, B, D, E, A and C are all terminal nodes in the phylogenetic tree shown in Fig. 1. Phylogenetic trees also have a number of internal nodes, which are called hypothetical taxonomic units (HTUs). HTUs represent hypothetical ancestors of the OTUs. Nodes other than the root and terminal nodes are internal nodes in the phylogenetic tree shown in Fig. 1. Furthermore, the lines between the nodes are branches, and the branching pattern is called the topology of the tree. Fig. 2 shows an example unrooted phylogenetic tree. An unrooted phylogenetic tree does not indicate the direction of the evolutionary process, as seen in Fig. 2, since it is not known which node represents the ancestor of all OTUs. In a rooted tree, however, there is a root node which leads to the common ancestor of all OTUs in the tree. In Fig. 1, for instance, arrows indicate the direction of evolution from the root to terminal node E. Note that an unrooted phylogenetic tree can be rooted with a method named outgroup rooting if a set of the most distantly related OTUs (i.e. an outgroup) can be formed; otherwise, the midpoint rooting method can be utilized. Both of these methods are described in detail in [1].

There are various methods for generating phylogenetic trees from nucleic acid sequence alignments in molecular-data-based phylogenetic analysis. All of these methods make certain evolutionary assumptions; if these assumptions apply to the data set, the methods perform well. The methods can be grouped in one way according to whether they use discrete character states or pairwise distance matrices. Character-state methods regard each position in the aligned sequences as a character and the nucleotides or amino acids at that position as states. All characters are compared separately and independently from each other. One advantage of these methods is that they can reconstruct the character states of the internal nodes, which represent ancestral taxa. Distance-matrix methods, on the other hand, produce a pairwise distance matrix and then infer the relationships of the OTUs from that matrix. Although distance-matrix methods cannot reconstruct the character states of ancestral nodes like character-state methods can, they are much less compute-intensive, and hence faster. Molecular-based phylogenetic analysis methods can also be grouped according to whether they consider all possible trees or cluster OTUs stepwise to obtain a single best tree. 
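To make the OTU/HTU terminology of the excerpt concrete, here is a small rooted, strictly bifurcating tree encoded as nested tuples, where strings are terminal nodes (OTUs) and 2-tuples are internal nodes (HTUs). The topology is a hypothetical example and is not the one drawn in Fig. 1 of the source paper.

```python
# A rooted, strictly bifurcating tree as nested tuples: leaves (strings) are OTUs,
# 2-tuples are HTUs (hypothetical ancestors). Hypothetical topology, not Fig. 1.
tree = ((("A", "B"), "C"), ("D", "E"))

def count_nodes(t):
    """Return (number of OTUs, number of internal nodes including the root)."""
    if isinstance(t, str):              # terminal node (OTU)
        return 1, 0
    left, right = t
    lo, li = count_nodes(left)
    ro, ri = count_nodes(right)
    return lo + ro, li + ri + 1         # +1 for this HTU

print(count_nodes(tree))                # (5, 4)
```

Counting the nodes also shows a standard consequence of strict bifurcation: a rooted binary tree with n terminal nodes always has n - 1 internal nodes.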
Exhaustive-search methods evaluate all theoretically possible tree topologies for a given number of OTUs using a certain criterion and choose the best one as the true phylogeny. One advantage of these methods is that it is possible to assess the confidence in the best tree obtained by comparing it with the second-best tree. However, the number of possible trees grows exponentially as the number of taxa increases; hence, these methods require very high computing power. Stepwise-clustering methods, on the other hand, construct a single tree by following specific clustering algorithms, and can therefore cope with large numbers of OTUs. However, there is no way to estimate the confidence in the correctness of the tree obtained, since only one tree is produced by these methods. Table I below lists phylogenetic tree construction and phylogenetic analysis methods classified according to the strategy they use. Note that most of the distance-matrix methods utilize stepwise clustering to construct the best tree, whereas all character-state methods search the tree space exhaustively to find the best tree.

In this work, a discrete-character method widely used in molecular phylogenetic analysis, namely the maximum parsimony (MP) method, was employed to find the best phylogenetic tree for a given number of taxa, where all theoretically possible tree topologies are evaluated. There are faster heuristic approaches to this method which attempt to find optimal solutions to the best-tree-topology problem approximately [1]. Although these approaches have shorter run times in software, they do not guarantee finding the best tree topology. With faster implementation platforms, however, this compromise need not take place, and that is why we have chosen to accelerate the MP method with exhaustive search on FPGA hardware in this work.

Although an FPGA implementation of maximum parsimony (MP) phylogenetic tree construction for nucleotide sequence data has never been reported in the literature, there exist some papers discussing hardware implementations of other phylogenetic analysis methods for different types of molecular data. For instance, [3], [4] and [5] describe the design of an FPGA-based coprocessor architecture to accelerate the reconstruction of MP phylogenies for gene-arrangement data. The design performs a parallelized version of the breakpoint median computation, which is the most time-consuming component of the reconstruction. Reference [3] reports that the breakpoint median hardware core achieves a 1005x speed-up over the related desktop software for the computation alone and a 417x speed-up when the architecture is used to accelerate the entire reconstruction procedure. Moreover, [6], [7] and [8] present high-performance FPGA implementations tackling the tree evaluation process for nucleotide sequences under the Maximum Likelihood (ML) criterion in order to speed up the tree reconstruction. Reference [8] proposes a Hardware/Software (HW/SW) system for solving the tree reconstruction problem using the Genetic algorithm for Maximum Likelihood (GAML) approach, which yields speed-ups of 30x to 100x compared ...
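The exhaustive-search cost discussed in the excerpt can be made concrete with a standard combinatorial result: the number of distinct unrooted, strictly bifurcating topologies on n labeled taxa is the double factorial (2n - 5)!!. The sketch below is illustrative only and is independent of the source implementation.

```python
# Number of distinct unrooted binary tree topologies on n labeled taxa.
# Standard combinatorial result: (2n - 5)!! for n >= 3.
def num_unrooted_topologies(n: int) -> int:
    count = 1
    for k in range(3, n + 1):          # adding the k-th taxon multiplies the count by (2k - 5)
        count *= 2 * k - 5
    return count

if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, num_unrooted_topologies(n))
    # 4 -> 3, 8 -> 10395, 12 -> 654729075
```

For 12 taxa this is already about 6.5 x 10^8 candidate topologies, which helps explain the 30-hour software run time quoted in the excerpt and the 12-taxon limit noted by several of the citing works below.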

Similar publications

Conference Paper
Full-text available
A dramatic improvement in energy efficiency is mandatory for sustainable supercomputing and has been identified as a major challenge. Affordable energy solutions continue to be of great concern in the development of the next generation of supercomputers. Low-power processors, dynamic control of processor frequency and heterogeneous systems are bein...
Conference Paper
Full-text available
The PSC has developed a prototype distributed file system infrastructure that vastly accelerates aggregated write bandwidth on large compute platforms. Write bandwidth, more than read bandwidth, is the dominant bottleneck in HPC I/O scenarios due to writing checkpoint data, visualization data and post-processing (multi-stage) data. We have prototyp...
Conference Paper
Full-text available
Fourier Domain Optical Coherence Tomography (FD-OCT) is an emerging biomedical imaging technology featuring ultra-high resolution and fast imaging speed. Due to the complexity of the FD-OCT algorithm, real time FD-OCT imaging demands high performance computing platforms. However, the scaling of real-time FD-OCT processing for increasing data acquis...

Citations

... Genes encoding proteins have some regions that are very well conserved, because of their structure or molecular functions, while other regions evolve faster in terms of nucleotide substitutions and insertions or deletions (Watson et al., 2005). To understand the role of proteins within groups of organisms, phylogenetic studies can help to clarify questions about how proteins are related in different species, and whether they may have evolved from a common ancestor (Kasap et al., 2010; Andrade et al., 2011). Phylogenetic results are currently continuously improving due to the increasing availability of a large amount of biological data and new approaches and methods of analysis (Kasap et al., 2010; Andrade et al., 2011). ...
... Protein-based phylogenetic reconstructions have been widely used to elucidate the role of proteins within clusters of organisms. Phylogenetic studies may help to clarify questions about how proteins are related in different species, and whether they may have evolved from a common ancestor (Kasap et al., 2010; Andrade et al., 2011; Van Holle et al., 2017; Moraes Filho et al., 2016). Phylogenies based on some lectin families demonstrate similarity with the phylogeny of angiosperms (Van Holle et al., 2017). ...
Article
Full-text available
The Allium genus stands out for its uses in human food and also for its medicinal properties. Many representatives of the Amaryllidaceae family are known for producing mannose binding lectins (MBL). In plants, lectins act as reserves of proteins that can be used for plant growth and development and also in defense against herbivores and pathogens, being toxic to some aphids and sucking insects. We examined physicochemical characteristics, such as isoelectric points and hydropathicity, of 22 sequences of MBL protein from Allium species and from other representatives of the Amaryllidaceae family present in public databases. Phylogenetic analysis, identification of functional domains and 3D homology modeling were also performed. We found two conserved functional motifs in the MBL sequences. It was observed that for all species the MBL had a hydrophilic character and great variation in isoelectric points. The phylogenetic analysis was not consistent with the taxonomic classification of the species evaluated at the infrageneric level. However, the methods proved efficient for the separation up to the level of tribes within the Amaryllidaceae family. The generated 3D models also provide a better understanding of their tertiary structures and molecular functions.
... Hence, FPGA hardware approaches have been proposed to reduce the execution time. For example, in some previous works [8]-[11], accelerations for the maximum parsimony problem were proposed. However, in this previous work [8], the approach is based on the evaluation of all possible trees, and is limited to a number of only 12 taxa. In this other previous work [9], the approach is not restricted by the number of taxa, but it only addresses the parsimony function, not the whole search algorithm. ...
Article
In this paper, we present an FPGA hardware implementation of phylogenetic tree reconstruction with a maximum parsimony algorithm. We base our approach on a particular stochastic local search algorithm that uses the Progressive Neighborhood and the Indirect Calculation of Tree Lengths method. This method is widely used for the acceleration of the phylogenetic tree reconstruction algorithm in software. In our implementation, we define a tree structure and accelerate the search by parallel and pipeline processing. We show results for eight real-world biological datasets. We compare execution times against our previous hardware approach, and against TNT, the fastest available parsimony program, which is also accelerated by the Indirect Calculation of Tree Lengths method. Acceleration rates between 34 and 45 per rearrangement, and between 2 and 6 for the whole search, are obtained against our previous hardware approach. Acceleration rates between 2 and 36 per rearrangement, and between 18 and 112 for the whole search, are obtained against TNT. © 2017 The Institute of Electronics, Information and Communication Engineers.
... Reconfigurable computing has been used to solve the MP problem exactly (Kasap and Benkrid, 2011), but it supports instances of at most 12 taxa. The exact methods are limited to a number of taxa which is relatively small. ...
Conference Paper
Full-text available
Phylogenetic reconstruction is considered a central underpinning of diverse fields of biology, such as ecology, molecular biology and physiology; the main example is modeling patterns and processes of evolution. Maximum Parsimony (MP) is an important approach to phylogenetic reconstruction that minimizes the total number of genetic transformations; under this approach, different metaheuristics such as tabu search, genetic and memetic algorithms have been implemented to cope with the combinatorial nature of the problem. In this paper we review different strategies that could be added to existing implementations to improve their efficiency and accuracy. First we present two different techniques to evaluate the objective function using CPU and GPU technology, then we show a Path-Relinking implementation to compare tree topologies, and finally we introduce the application of these techniques in a Simulated Annealing algorithm searching for an optimal solution.
... Finding the optimal solution to the maximum parsimony problem is NP-hard (Graham, 1982). The research on exact maximum parsimony searches has resulted in two methods (Kasap & Benkrid, 2011; White & Holland, 2011). Although these methods make use of parallel hardware, they are applicable only on small datasets (<40 taxa). ...
Article
Full-text available
Phylogenetic reconstruction is vital to analyzing the evolutionary relationship of genes within and across populations of different species. Nowadays, with next generation sequencing technologies producing sets comprising thousands of sequences, robust identification of the tree topology, which is optimal according to standard criteria such as maximum parsimony, maximum likelihood or posterior probability, with phylogenetic inference methods is a computationally very demanding task. Here, we describe a stochastic search method for a maximum parsimony tree, implemented in a software package we named PTree. Our method is based on a new pattern-based technique that enables us to infer intermediate sequences efficiently where the incorporation of these sequences in the current tree topology yields a phylogenetic tree with a lower cost. Evaluation across multiple datasets showed that our method is comparable to the algorithms implemented in PAUP* or TNT, which are widely used by the bioinformatics community, in terms of topological accuracy and runtime. We show that our method can process large-scale datasets of 1,000-8,000 sequences. We believe that our novel pattern-based method enriches the current set of tools and methods for phylogenetic tree inference. The software is available under: http://algbio.cs.uni-duesseldorf.de/webapps/wa-download/.
... FPGAs have been successfully applied in the acceleration of many bioinformatics applications such as bio-sequence alignment, phylogenetic analysis, and molecular dynamics simulation [26]-[27] and [44]-[46]. The reader is advised to consult the aforementioned references for details about those applications. ...
Thesis
Full-text available
Bioinformatics and Computational Biology (BCB) is a multidisciplinary field that has emerged due to the computational demands of current state-of-the-art biotechnology. BCB deals with the storage, organization, retrieval, and analysis of biological datasets, which have grown in size and complexity in recent years, especially after the completion of the human genome project. The advent of Microarray technology in the 1990s resulted in the new concept of the high-throughput experiment, a biotechnology that measures the gene expression profiles of thousands of genes simultaneously. As such, Microarray analysis requires high computational power to extract the biological relevance from its high-dimensional data. Current general purpose processors (GPPs) have been unable to keep up with the increasing computational demands of Microarrays and have reached a limit in terms of clock speed. Consequently, Field Programmable Gate Arrays (FPGAs) have been proposed as a viable low-power solution to overcome the computational limitations of GPPs and other methods. The research presented in this thesis harnesses current state-of-the-art FPGAs and tools to accelerate some of the most widely used data mining methods for the analysis of Microarray data, in an effort to investigate the viability of the technology as an efficient, low-power, and economic solution for this analysis. Three widely used methods have been selected for the FPGA implementations: one is the unsupervised K-means clustering algorithm, while the other two are supervised classification methods, namely the K-Nearest Neighbour (K-NN) and Support Vector Machine (SVM) classifiers. These methods are thought to benefit from parallel implementation. This thesis presents detailed designs and implementations of these three BCB applications on FPGA, captured in Verilog HDL, whose performance is compared with equivalent implementations running on GPPs. In addition to acceleration, the benefits of the dynamic partial reconfiguration (DPR) capability of modern Xilinx FPGAs are investigated with reference to the aforementioned data mining methods. Implementing K-means clustering on FPGA using a non-DPR design flow outperformed equivalent GPP and GPU implementations in terms of speed-up by two orders and one order of magnitude, respectively, while being eight times more power efficient than the GPP implementation and four times more than the GPU implementation. As for energy efficiency, the FPGA implementation was 615 times more energy efficient than GPPs, and 31 times more than GPUs. Over and above, the FPGA implementation increasingly outperformed the GPP and GPU implementations in terms of speed-up as the dimensionality of the Microarray data increased. Additionally, the DPR implementations of K-means clustering have shown speed-ups in partial reconfiguration time of ~5x and ~17x over full-chip reconfiguration for single-core and eight-core implementations, respectively. Two architectures of the K-NN classifier have been implemented on FPGA, namely A1 and A2. The K-NN implementation based on the A1 architecture achieved a speed-up of ~76x over an equivalent GPP implementation, whereas the A2 architecture achieved a ~68x speed-up. Furthermore, the FPGA implementation outperformed the equivalent GPP implementation as the dimensionality of the data was increased. 
In addition, the DPR implementations of the K-NN classifier achieved speed-ups in reconfiguration time of ~4x to ~10x over full-chip reconfiguration when reconfiguring a portion of the classifier or the complete classifier. Similar to K-NN, two architectures of the SVM classifier were implemented on FPGA, whereby the former outperformed an equivalent GPP implementation by ~61x and the latter by ~49x. As for the DPR implementation of the SVM classifier, it showed a speed-up of ~8x in reconfiguration time when reconfiguring the complete core or when exchanging it with a K-NN core to form a multi-classifier. The aforementioned implementations clearly show FPGAs to be an efficacious, efficient and economic solution for bioinformatics Microarray data analysis.
... Kasap and Benkrid [14], [15] recently presented the (to the best of our knowledge) first reconfigurable architecture for the parsimony kernel and assessed performance on an FPGA supercomputer by exploiting fine-grain and coarse-grain parallelism. The implementation is limited to trees with a maximum of 12 organisms, which are very small by today's standards; the largest published parsimony-based tree has 73,060 taxa [1]. ...
... Parsimony-based programs for large datasets deploy heuristic search strategies (e.g., Subtree Pruning and ReGrafting (SPR) or Tree Bisection and Reconnection (TBR)). These search strategies (as implemented, for instance, in TNT, parsimonator (our code), or PAUP*) do not require a de-novo computation of the parsimony score based on a full post-order tree traversal as implemented in [14], [15]. Instead, they only require the update of a comparatively small fraction of ancestral parsimony vectors. ...
Conference Paper
Full-text available
The phylogenetic parsimony function is a popular, discrete criterion for reconstructing evolutionary trees based on molecular sequence data. Parsimony strives to find the phylogenetic tree that explains the evolutionary history of organisms by the least number of mutations. Because parsimony is a discrete function, it should fit well onto FPGAs. We present a versatile FPGA implementation of the parsimony function and compare its performance to a highly optimized SSE3- and AVX-vectorized software implementation. We find that, because of a particular constellation in our lab, the speedups that can be achieved by using an FPGA are substantially less impressive than usually reported in papers on FPGA acceleration of bioinformatics kernels. We conclude that a competitive spirit between SW and HW application developers can contribute toward obtaining more objective performance comparisons.
... Therefore, with some exceptions like [4] or [5], most of the application-specific implementations remain more focused on computations while on-chip memory is only used for lookup data or to stream data through simple FIFOs. This is why, if we look at various FPGA-based implementations of web applications [3], [4], sequence alignment algorithms [6], [7], signal processing kernels [2], [8], [9], [10], [5], [11] and many others, we will observe almost no harmony between the memory layouts used for each implementation. ...
Conference Paper
Full-text available
FPGA devices are mostly utilized for customized application designs with heavily pipelined and aggressively parallel computations. However, little focus is normally given to the FPGA memory organizations to efficiently use the data fetched into the FPGA. This work presents a Front End Memory (FEM) layout based on BRAMs and Distributed RAM for FPGA-based accelerators. The presented memory layout serves as a template for various data organizations which is in fact a step towards the standardization of a methodology for FPGA-based memory management inside an accelerator. We present example application kernels implemented as specializations of the template memory layout. Further, the presented layout can be used for Spatially Mapped-Shared Memory multi-kernel applications targeting FPGAs. This fact is evaluated by mapping two applications, an Acoustic Wave Equation code and an N-Body method, to three multi-kernel execution models on a Virtex-4 LX200 device. The results show that the shared memory model for the Acoustic Wave Equation code outperforms the local and runtime reconfigured models by 1.3-1.5×, respectively. For the N-Body method the shared model is slightly more efficient with a small number of bodies, but for larger systems the runtime reconfigured model shows a 3× speedup over the other two models.
Article
The COVID-19 pandemic brought Bioinformatics into the spotlight, revealing that several existing methods, algorithms and tools were not well prepared to handle large amounts of genomic data efficiently. This led to prohibitively long execution times and the need to reduce the extent of analyses in order to obtain results in a reasonable amount of time. In this survey, we review available high-performance computing and hardware-accelerated systems based on FPGA and GPU technology. Optimized and hardware accelerated systems can conduct more thorough analyses considerably faster than pure software implementations, allowing to reach important conclusions in a timely manner to drive scientific discoveries. We discuss the reasons that are currently hindering high-performance solutions from being widely deployed in real-world biological analyses, and describe a research direction that can pave the way to enable this.