Figure 5 - uploaded by Ronald S. Indeck
Prototype search engine.

Source publication
Conference Paper
Data mining is an application that is commonly executed on massively parallel systems, often using clusters with hundreds of processors. With a disk-based data store, however, the data must first be delivered to the processors before effective mining can take place. Here, we describe the prototype of an experimental system that moves processi...

Contexts in source publication

Context 1
... FPX-based search engine is tapped into an off-the-shelf ATA hard drive as shown in Figure 5. When the hard drive is sending data to the host system, the FPX gets a copy at the same time through a custom printed circuit board (providing termination and voltage conversion) and the test pins of the FPX board. ...
Context 2
... we compare the performance of the software and hardware versions of the search engine. Essentially, we will be using measurements on the experimental system of Figure 5 to gain insight into the performance of the target system of Figure 2. Figure 7 shows a simplified time line for reading a single hard disk. When the disk controller receives the read command from the host, and the data to be retrieved are not in the controller cache, it instructs the magnetic head to move to the right data track (a seek) and waits for a rotational latency for the data sector to rotate under the head. ...
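The time line described here (seek, then rotational latency, then transfer) lends itself to a simple back-of-the-envelope model. The sketch below uses illustrative parameter values only; none of the numbers come from the paper's measurements.

```python
# Minimal model of servicing one uncached disk read, following the
# seek -> rotational latency -> transfer time line described above.
# All parameter values are illustrative assumptions, not measurements.

def disk_read_time_ms(seek_ms, rpm, bytes_to_read, transfer_mb_s):
    """Estimate the time (ms) to service a single uncached read."""
    # On average the target sector is half a revolution away.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    transfer_ms = bytes_to_read / (transfer_mb_s * 1e6) * 1e3
    return seek_ms + rotational_latency_ms + transfer_ms

# Example: 8 ms average seek, 7200 rpm spindle, 64 KiB read at 60 MB/s.
total_ms = disk_read_time_ms(8.0, 7200, 64 * 1024, 60.0)
```

For small reads the seek and rotational terms dominate the total, which is exactly why streaming large sequential blocks past the search engine is attractive.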

Citations

... feature extraction [2], [3]), and string matching (e.g., in the context of network intrusion detection [4]-[7], deoxyribonucleic acid sequence matching [8], [9], database searching [10], and network packet routing [11], [12]). The common feature in these tasks is that they can be efficiently parallelized, and that the same basic operation is performed numerous times using one set of fixed data known in advance (which are allowed to change infrequently), such as a filter template in image processing or a keyword in string matching, along with streaming input data. ...
... Note that using R_pass = (R_pass)_max in (9) assumes that the area of the minimum-size transistor is 25 F²_CMOS. For the dynamic logic, the delay is estimated as τ ≈ 2(2MC_wire + C_gate)R_ON (10), where 2MC_wire is the capacitance of two nanowire segments, while C_gate is the total capacitance of the CMOS circuitry at the input of the DFF, including its gate capacitance and the drain capacitances of the configuration and pull-down pass gates. The additional factor of 2 accounts for both the precharging and evaluation phases, which is a rather conservative assumption given that precharging currents are not limited by the R_ON value. ...
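Under the definitions above, the delay estimate of Eq. (10) is a simple RC-style product; the following sketch just evaluates it numerically. The parameter values are placeholders for illustration, not the cited article's 22-nm figures.

```python
# tau ≈ 2 * (2*M*C_wire + C_gate) * R_ON  -- Eq. (10) above.
# The factor of 2 covers both the precharge and evaluation phases.

def dynamic_logic_delay(M, C_wire, C_gate, R_on):
    """Delay (s) of the dynamic-logic stage per Eq. (10)."""
    return 2.0 * (2.0 * M * C_wire + C_gate) * R_on

# Placeholder values: M = 10 segments, 0.1 fF per nanowire segment,
# 0.5 fF total gate-side capacitance, 1 MOhm ON-resistance.
tau = dynamic_logic_delay(10, 0.1e-15, 0.5e-15, 1e6)
```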
Article
In this paper, we propose a novel CMOS+MOLecular (CMOL) field-programmable gate array (FPGA) circuit architecture to perform massively parallel, high-throughput computations, which is especially useful for pattern matching tasks and multidimensional associative searches. In the new architecture, patterns are stored as resistive states of emerging nonvolatile memory nanodevices, while the analyzed data are streamed via the CMOS subsystem. The main improvements over prior work offered by the proposed circuits are increased nanodevice utilization and, as a result, substantially higher throughput, which is demonstrated by a detailed analysis of the implementation of a pattern matching task on the new architecture. For example, our estimates show that the proposed CMOL FPGA circuits, based on 22-nm CMOS technology and one crossbar layer with 22-nm nanowire half-pitch, allow up to 12.5% average nanodevice utilization, i.e., the fraction of the devices turned to the high conductive state, as compared to a typical ~0.1% for the original CMOL FPGA circuits. This in turn enables throughput close to 7.1x10¹⁶ bits/s/cm² at ~1 fJ/bit energy efficiency, for matching of ~10⁷ 250-bit patterns stored locally on a 1 cm² chip. These numbers represent at least two orders of magnitude better throughput than other state-of-the-art FPGA methods, and begin to approach ternary content-addressable memory-like performance at similar CMOS technology nodes. More generally, we argue that the proposed concept combines the versatility of reconfigurable architectures with the density of associative memories. It can be viewed as a very tight symbiotic integration of memory and logic functions for high-performance logic-in-memory computing.
... String matching is a class of fundamental problems in computer science, and among them, approximate string matching plays an irreplaceable role. Approximate string matching is studied extensively in the areas of bioinformatics [6,20,22], data mining [11,23], pattern recognition [3,4], information retrieval [7,16,24], etc. The study of effective approximate string matching algorithms therefore has great significance in both theory and practice. ...
Article
Approximate string matching over suffix tree with depth-first search (ASM_ST_DFS), a classical algorithm in the field of approximate string matching, was originally proposed by Ricardo A. Baeza-Yates and Gaston H. Gonnet in 1990. The algorithm is one of the best algorithms for approximate string matching when combined with other indexing techniques. However, its time complexity is sensitive to the length of the pattern string because it searches O(m) characters on each path from the root before backtracking. In this paper, we propose an efficient pruning strategy to solve this problem. We prove its correctness and efficiency in theory. In particular, we prove that if the pruning strategy is adopted, the algorithm searches O(k) characters on average on each path before backtracking instead of O(m). Considering that each internal node of a suffix tree has multiple branches, the pruning strategy should work very well. We also show experimentally that when k is much smaller than m, the efficiency improves hundreds of times, and when k is not much smaller than m, it is still several times faster. This is the first paper that tries to solve the backtracking problem of ASM_ST_DFS in both theory and practice.
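The cut-off idea behind this line of work can be illustrated with a generic sketch: depth-first search over a trie (standing in here for a suffix tree), extending one edit-distance DP row per character and abandoning a branch as soon as every cell of the row exceeds k. This is the classic Baeza-Yates/Gonnet-style approach, not the cited authors' specific pruning strategy.

```python
# Hedged sketch: DFS over a trie with per-character edit-distance rows
# and branch abandonment once no cell is within k (the pruning idea).

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = {}  # end-of-word marker
    return root

def search(trie, pattern, k):
    m = len(pattern)
    results = []

    def dfs(node, prefix, prev_row):
        if "$" in node and prev_row[m] <= k:
            results.append(prefix)
        for ch, child in node.items():
            if ch == "$":
                continue
            row = [prev_row[0] + 1]
            for j in range(1, m + 1):
                cost = 0 if pattern[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,           # insertion
                               prev_row[j] + 1,          # deletion
                               prev_row[j - 1] + cost))  # substitution
            # Pruning: if every cell exceeds k, no extension can match.
            if min(row) <= k:
                dfs(child, prefix + ch, row)

    dfs(trie, "", list(range(m + 1)))
    return results
```

For example, search(build_trie(["cart", "card", "care", "dog"]), "cart", 1) finds "cart", "card", and "care", while the "dog" branch is abandoned after two characters.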
... String matching algorithms [1,2] have become a necessary tool for many applications, such as Intrusion Detection Systems [4,5], Plagiarism Detection [6,7], Text Mining [8,9], and Bioinformatics [10,11]. In all of these, we have to find a pattern in the database. ...
... Hardware has been investigated in applications such as data mining that require full-text searching. For example, the Mercury System [4] is a prototype data mining engine that uses a shift-and-add [5,6] algorithm for exact string matching, extended to handle mismatches. Hardware-based text search or string matching has also been used by the FPGA community, particularly with respect to network intrusion detection. ...
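The shift-and-add family mentioned here builds on the bit-parallel Shift-And technique. The sketch below is a generic software rendition of the exact-match core only (not the Mercury System's hardware unit, and without the mismatch extension).

```python
# Hedged sketch of bit-parallel Shift-And exact string matching.
# One bit of 'state' per pattern position; a set bit i means the last
# i+1 text characters match pattern[0..i]. Assumes a non-empty pattern.

def shift_and(text, pattern):
    """Return starting indices of exact occurrences of pattern in text."""
    m = len(pattern)
    # Per-character bitmasks: bit i is set iff pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Advance every partial match by one position, start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits

# e.g. shift_and("abracadabra", "abra") -> [0, 7]
```

In hardware the per-character AND and shift map naturally onto a register pipeline, which is what makes the technique attractive for streaming data off a disk.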
Article
String matching is a key problem in many network processing applications. Current implementations of this process using software are time consuming and cannot meet gigabit bandwidth requirements. Implementing this process in hardware improves the search time considerably and has several other advantages. This paper presents an array based hardware implementation of this time consuming process for network intrusion detection and directory lookup applications using reconfigurable hardware. These designs are coded in VHDL targeting a Xilinx Virtex-II Pro FPGA and are evaluated in terms of the speed and resource utilization.
... Here, data files are read from the disk subsystem and delivered to the FPGA for searching. The approximate match engine [8] is capable of a maximum throughput of 875 MB/s (limited by the FPGA's I/O capability and measured using the synthetically generated data set). The regular expression engine [9] is capable of a maximum throughput of 650 MB/s (limited by the internal processing capability of the FPGA and measured using the synthetically generated data set). ...
Conference Paper
Direct-attached storage has historically had the reputation of being less capable than equivalently sized SAN installations. Here, we empirically demonstrate the performance achievable in multiple-terabyte, direct-attached disk subsystems. A number of parameters are explored, including file system, number of logical drives, and RAID configuration.
... The application set that is well matched to the Mercury system architecture is a pipeline that consumes a high data volume at its input, reduces that data volume to a smaller set, and performs higher-level processing on this smaller set. Our previous work has illustrated the use of the system for a number of text search applications [6,7,8,14,36,38]. BLASTN has properties that fit well with the Mercury system's capabilities. ...
... While Figure 1 illustrates our vision of the system architecture, our prototyping work has so far been limited to a series of implementations that are progressively closer to, but do not yet exactly match, the architecture depicted in the figure. Our earliest prototypes used ATA drives [36,38] and were severely speed-limited by the disks. Our most recent prototypes are built using a set of 15,000 rpm Ultra320 SCSI drives organized in a RAID-0 configuration. ...
Article
Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described.
... Currently, our initial paper ([5]) on a systolic array architecture for data mining is the only recent work in this area. However, research in hardware implementations of related data mining algorithms has been published [9, 13, 16]. In [9] and [13] the k-means clustering algorithm is implemented as an example of a special reconfigurable fabric in the form of a cellular array connected to a host processor. ...
... However, k-means adds the distance computation and significantly changes how the sets are built up. In [16] a system is implemented which attempts to mitigate the high cost of data transfers for large data sets. Common databases can easily extend beyond the capacity of physical memory, and slow tertiary storage, e.g., hard drives, is brought into the datapath. ...
Conference Paper
The Apriori algorithm is a fundamental correlation-based data mining kernel used in a variety of fields. The innovation in this paper is a highly parallel custom architecture implemented on a reconfigurable computing system. Using this "bitmapped CAM," the time and area required for executing the subset operations fundamental to data mining can be significantly reduced. The bitmapped CAM architecture implementation on an FPGA-accelerated high performance workstation provides a performance acceleration of orders of magnitude over software-based systems. The bitmapped CAM utilizes redundancy within the candidate data to efficiently store and process many subset operations simultaneously. The efficiency of this operation allows 140 units to process about 2,240 subset operations simultaneously. Using industry-standard benchmarking databases, we have tested the bitmapped CAM architecture and shown that the platform provides a minimum of a 24x (and often much higher) performance advantage over the fastest software Apriori implementations.
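The subset operation at the heart of Apriori can be expressed with item bitmaps, in the spirit of the bitmapped-CAM idea: a candidate itemset is contained in a transaction iff a single AND-compare succeeds. The sketch below is a software analogy of that operation, not the FPGA architecture itself.

```python
# Hedged sketch: Apriori support counting via bitmap subset tests.
# Each itemset becomes an integer bitmap (one bit per known item).

def to_bitmap(itemset, item_index):
    """Encode an itemset as an integer bitmap using item_index positions."""
    bits = 0
    for item in itemset:
        bits |= 1 << item_index[item]
    return bits

def count_support(candidates, transactions, item_index):
    """For each candidate itemset, count the transactions containing it."""
    cand_bits = [to_bitmap(c, item_index) for c in candidates]
    counts = [0] * len(candidates)
    for t in transactions:
        t_bits = to_bitmap(t, item_index)
        for i, c in enumerate(cand_bits):
            # Subset test as a single AND-compare -- the operation the
            # hardware replicates across many units in parallel.
            if c & t_bits == c:
                counts[i] += 1
    return counts
```

In software this inner loop is the bottleneck for large candidate sets; the hardware gains come from evaluating many such AND-compares per cycle.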
... If the data is partitioned appropriately over a multiple RAID system, then multiple computational pipelines can operate in parallel across the data, providing for even higher performance. This general approach has been presented in [3, 10, 17, 23]. Scientific Data Collection. ...
Conference Paper
Auto-Pipe is a tool that aids in the design, evaluation and implementation of applications that can be executed on computational pipelines (and other topologies) using a set of heterogeneous devices including multiple processors and FPGAs. It has been developed to meet the needs arising in the domains of communications, computation on large datasets, and real time streaming data applications. This paper introduces the Auto-Pipe design flow and the X design language, and presents sample applications. The applications include the Triple-DES encryption standard, a subset of the signal-processing pipeline for VERITAS, a high-energy gamma-ray astrophysics experiment. These applications are discussed and their description in X is presented. From X, simulations of alternative system designs and stage-to-device assignments are obtained and analyzed. The complete system permits production of executable code and bit maps that may be downloaded onto real devices. Future work required to complete the Auto-Pipe design tool is discussed.
... As far as we know, the Apriori algorithm has not been studied in any significant way for efficient hardware implementation. However, research in hardware implementations of related data mining algorithms has been done [6, 12, 20, 21]. In [6] and [20] the k-means clustering algorithm is implemented as an example of a special reconfigurable fabric in the form of a cellular array connected to a host processor. ...
... By avoiding global connections that violate the principles of systolic design, we can increase overall system clock frequency and ease routing problems. In [21] a system is implemented which attempts to mitigate the high cost of data transfers for large data sets. Common databases can easily extend beyond the capacity of physical memory, and slow tertiary storage, e.g., hard drives, is brought into the datapath. ...
... As far as we know, the Apriori algorithm has not been studied in any significant way for efficient hardware implementation. However, research in hardware implementations of related data mining algorithms has been done [6, 17, 18]. In [6] and [17] the k-means clustering algorithm is implemented as an example of a special reconfigurable fabric in the form of a cellular array connected to a host processor. ...
Conference Paper
The Apriori algorithm is a popular correlation-based data mining kernel. However, it is a computationally expensive algorithm and the running times can stretch up to days for large databases, as database sizes can extend to Gigabytes. Through the use of a new extension to the systolic array architecture, time required for processing can be significantly reduced. Our array architecture implementation on a Xilinx Virtex-II Pro 100 provides a performance improvement that can be orders of magnitude faster than the state-of-the-art software implementations. The system is easily scalable and introduces an efficient "systolic injection" method for intelligently reporting unpredictably generated mid-array results to a controller without any chance of collision or excessive stalling.
... In our previous work, we have demonstrated high-throughput I/O performance from the data store to the FPGA, as well as a set of applications that includes exact text search, approximate text search, biosequence similarity search, etc. [1,2,3,4,5]. Measured performance gains for these applications range from one to two orders of magnitude over state-of-the-art commodity processors. In this paper, we will compare and contrast our techniques with commonly used approaches for extracting information from large, unstructured data stores. ...
Conference Paper
While improvements in the density of semiconductor circuitry have been dramatic, the density improvements in magnetic storage have been even greater. We now store much more data than we have time to process, implying that techniques for processing these data need to be significantly altered. This paper describes a new architectural approach that enables the processing of very large data sets, yielding a two orders of magnitude performance gain over conventional approaches.