Figure 5 - uploaded by Ronald S. Indeck
Prototype search engine.

Source publication
Conference Paper
Data mining is an application that is commonly executed on massively parallel systems, often using clusters with hundreds of processors. With a disk-based data store, however, the data must first be delivered to the processors before effective mining can take place. Here, we describe the prototype of an experimental system that moves processi...

Contexts in source publication

Context 1
... FPX-based search engine is tapped into an off-the-shelf ATA hard drive as shown in Figure 5. When the hard drive is sending data to the host system, the FPX gets a copy at the same time through a custom printed circuit board (providing termination and voltage conversion) and the test pins of the FPX board. ...
Context 2
... we compare the performance of the software and hardware versions of the search engine. Essentially, we will be using measurements on the experimental system of Figure 5 to gain insight into the performance of the target system of Figure 2. Figure 7 shows a simplified time line for reading a single hard disk. When the disk controller receives the read command from the host, and the data to be retrieved are not in the controller cache, it instructs the magnetic head to move to the right data track (a seek) and waits for a rotational latency for the data sector to rotate under the head. ...
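The time line described here (seek, then rotational latency, then transfer) lends itself to a simple back-of-the-envelope model. The sketch below uses illustrative parameter values only; none of the numbers come from the paper's measurements.

```python
# Minimal model of servicing one uncached disk read, following the
# seek -> rotational latency -> transfer time line described above.
# All parameter values are illustrative assumptions, not measurements.

def disk_read_time_ms(seek_ms, rpm, bytes_to_read, transfer_mb_s):
    """Estimate the time (ms) to service a single uncached read."""
    # On average the target sector is half a revolution away.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    transfer_ms = bytes_to_read / (transfer_mb_s * 1e6) * 1e3
    return seek_ms + rotational_latency_ms + transfer_ms

# Example: 8 ms average seek, 7200 rpm spindle, 64 KiB read at 60 MB/s.
total_ms = disk_read_time_ms(8.0, 7200, 64 * 1024, 60.0)
```

For small reads the seek and rotational terms dominate the total, which is exactly why streaming large sequential blocks past the search engine is attractive.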

Citations

... feature extraction [2], [3]), and string matching (e.g., in the context of network intrusion detection [4]-[7], deoxyribonucleic acid sequence matching [8], [9], database searching [10], and network packet routing [11], [12]). The common feature in these tasks is that they can be efficiently parallelized, and that the same basic operation is performed numerous times using one set of fixed data known in advance (which are allowed to change infrequently), such as a filter template in image processing or a keyword in string matching, along with streaming input data. ...
... Note that using R_pass = (R_pass)_max in (9) assumes that the area of the minimum-size transistor is 25 F²_CMOS. For the dynamic logic, the delay is estimated as τ ≈ 2(2MC_wire + C_gate)R_ON (10), where 2MC_wire is the capacitance of two nanowire segments, while C_gate is the total capacitance of the CMOS circuitry at the input of the DFF, including its gate capacitance and the drain capacitances of the configuration and pull-down pass gates. The additional factor of 2 accounts for both the precharging and evaluation phases, which is a rather conservative assumption given that precharging currents are not limited by the R_ON value. ...
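Under the definitions above, the delay estimate of Eq. (10) is a simple RC-style product; the following sketch just evaluates it numerically. The parameter values are placeholders for illustration, not the cited article's 22-nm figures.

```python
# tau ≈ 2 * (2*M*C_wire + C_gate) * R_ON  -- Eq. (10) above.
# The factor of 2 covers both the precharge and evaluation phases.

def dynamic_logic_delay(M, C_wire, C_gate, R_on):
    """Delay (s) of the dynamic-logic stage per Eq. (10)."""
    return 2.0 * (2.0 * M * C_wire + C_gate) * R_on

# Placeholder values: M = 10 segments, 0.1 fF per nanowire segment,
# 0.5 fF total gate-side capacitance, 1 MOhm ON-resistance.
tau = dynamic_logic_delay(10, 0.1e-15, 0.5e-15, 1e6)
```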
Article
In this paper, we propose a novel CMOS+MOLecular (CMOL) field-programmable gate array (FPGA) circuit architecture to perform massively parallel, high-throughput computations, which is especially useful for pattern matching tasks and multidimensional associative searches. In the new architecture, patterns are stored as resistive states of emerging nonvolatile memory nanodevices, while the analyzed data are streamed via the CMOS subsystem. The main improvements over prior work offered by the proposed circuits are increased nanodevice utilization and, as a result, substantially higher throughput, which is demonstrated by a detailed analysis of the implementation of a pattern matching task on the new architecture. For example, our estimates show that the proposed CMOL FPGA circuits, based on 22-nm CMOS technology and one crossbar layer with 22-nm nanowire half-pitch, allow up to 12.5% average nanodevice utilization, i.e., the fraction of the devices turned to the high conductive state, as compared to a typical ~0.1% for the original CMOL FPGA circuits. This in turn enables throughput close to 7.1x10¹⁶ bits/s/cm² at ~1 fJ/bit energy efficiency, for matching of ~10⁷ 250-bit patterns stored locally on a 1 cm² chip. These numbers represent at least two orders of magnitude better throughput than other state-of-the-art FPGA methods, and begin to approach ternary content-addressable memory-like performance at similar CMOS technology nodes. More generally, we argue that the proposed concept combines the versatility of reconfigurable architectures with the density of associative memories. It can be viewed as a very tight symbiotic integration of memory and logic functions for high-performance logic-in-memory computing.
... String matching is a class of fundamental problems in computer science, and among them, approximate string matching plays an irreplaceable role. Approximate string matching is studied extensively in the areas of bioinformatics [6,20,22], data mining [11,23], pattern recognition [3,4], information retrieval [7,16,24], etc. The study of effective approximate string matching algorithms therefore has great significance in both theory and practice. ...
Article
Approximate string matching over suffix tree with depth-first search (ASM_ST_DFS), a classical algorithm in the field of approximate string matching, was originally proposed by Ricardo A. Baeza-Yates and Gaston H. Gonnet in 1990. The algorithm is one of the best algorithms for approximate string matching when combined with other indexing techniques. However, its time complexity is sensitive to the length of the pattern string because it searches O(m) characters on each path from the root before backtracking. In this paper, we propose an efficient pruning strategy to solve this problem. We prove its correctness and efficiency in theory. In particular, we prove that if the pruning strategy is adopted, the algorithm searches O(k) characters on average on each path before backtracking instead of O(m). Considering that each internal node of a suffix tree has multiple branches, the pruning strategy should work very well. We also show experimentally that when k is much smaller than m, the efficiency improves hundreds of times, and when k is not much smaller than m, it is still several times faster. This is the first paper that tries to solve the backtracking problem of ASM_ST_DFS in both theory and practice.
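The cut-off idea behind this line of work can be illustrated with a generic sketch: depth-first search over a trie (standing in here for a suffix tree), extending one edit-distance DP row per character and abandoning a branch as soon as every cell of the row exceeds k. This is the classic Baeza-Yates/Gonnet-style approach, not the cited authors' specific pruning strategy.

```python
# Hedged sketch: DFS over a trie with per-character edit-distance rows
# and branch abandonment once no cell is within k (the pruning idea).

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = {}  # end-of-word marker
    return root

def search(trie, pattern, k):
    m = len(pattern)
    results = []

    def dfs(node, prefix, prev_row):
        if "$" in node and prev_row[m] <= k:
            results.append(prefix)
        for ch, child in node.items():
            if ch == "$":
                continue
            row = [prev_row[0] + 1]
            for j in range(1, m + 1):
                cost = 0 if pattern[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,           # insertion
                               prev_row[j] + 1,          # deletion
                               prev_row[j - 1] + cost))  # substitution
            # Pruning: if every cell exceeds k, no extension can match.
            if min(row) <= k:
                dfs(child, prefix + ch, row)

    dfs(trie, "", list(range(m + 1)))
    return results
```

For example, search(build_trie(["cart", "card", "care", "dog"]), "cart", 1) finds "cart", "card", and "care", while the "dog" branch is abandoned after two characters.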
... String matching algorithms [1,2] have become a necessary tool for many applications, such as Intrusion Detection Systems [4,5], Plagiarism Detection [6,7], Text Mining [8,9], and Bioinformatics [10,11]. In all of these, we have to find a pattern in the database. ...
... Hardware has been investigated in applications such as data mining that require full-text searching. For example, the Mercury System [4] is a prototype data mining engine that uses a shift-and-add [5,6] algorithm for exact string matching, extended to handle mismatches. Hardware-based text search or string matching has also been used by the FPGA community, particularly with respect to network intrusion detection. ...
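The shift-and-add family mentioned here builds on the bit-parallel Shift-And technique. The sketch below is a generic software rendition of the exact-match core only (not the Mercury System's hardware unit, and without the mismatch extension).

```python
# Hedged sketch of bit-parallel Shift-And exact string matching.
# One bit of 'state' per pattern position; a set bit i means the last
# i+1 text characters match pattern[0..i]. Assumes a non-empty pattern.

def shift_and(text, pattern):
    """Return starting indices of exact occurrences of pattern in text."""
    m = len(pattern)
    # Per-character bitmasks: bit i is set iff pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Advance every partial match by one position, start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits

# e.g. shift_and("abracadabra", "abra") -> [0, 7]
```

In hardware the per-character AND and shift map naturally onto a register pipeline, which is what makes the technique attractive for streaming data off a disk.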
Article
String matching is a key problem in many network processing applications. Current implementations of this process using software are time consuming and cannot meet gigabit bandwidth requirements. Implementing this process in hardware improves the search time considerably and has several other advantages. This paper presents an array based hardware implementation of this time consuming process for network intrusion detection and directory lookup applications using reconfigurable hardware. These designs are coded in VHDL targeting a Xilinx Virtex-II Pro FPGA and are evaluated in terms of the speed and resource utilization.
... Here, data files are read from the disk subsystem and delivered to the FPGA for searching. The approximate match engine [8] is capable of a maximum throughput of 875 MB/s (limited by the FPGA's I/O capability and measured using the synthetically generated data set). The regular expression engine [9] is capable of a maximum throughput of 650 MB/s (limited by the internal processing capability of the FPGA and measured using the synthetically generated data set). ...
Conference Paper
Direct-attached storage has historically had the reputation of being less capable than equivalently sized SAN installations. Here, we empirically demonstrate the performance achievable in multiple-terabyte, direct-attached disk subsystems. A number of parameters are explored, including file system, number of logical drives, and RAID configuration.
... The application set that is well matched to the Mercury system architecture is a pipeline that consumes a high data volume at its input, reduces that data volume to a smaller set, and performs higher-level processing on this smaller set. Our previous work has illustrated the use of the system for a number of text search applications [6,7,8,14,36,38]. BLASTN has properties that fit well with the Mercury system's capabilities. ...
... While Figure 1 illustrates our vision of the system architecture, our prototyping work has so far been limited to a series of implementations that are progressively closer to, but do not yet exactly match, the architecture depicted in the figure. Our earliest prototypes used ATA drives [36,38] and were severely speed-limited by the disks. Our most recent prototypes are built using a set of 15,000 rpm Ultra320 SCSI drives organized in a RAID-0 configuration. ...
Article
Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described.
... Currently, our initial paper ([5]) on a systolic array architecture for data mining is the only recent work in this area. However, research in hardware implementations of related data mining algorithms has been published [9, 13, 16]. In [9] and [13] the k-means clustering algorithm is implemented as an example of a special reconfigurable fabric in the form of a cellular array connected to a host processor. ...
... However, k-means adds the distance computation and significantly changes how the sets are built up. In [16] a system is implemented which attempts to mitigate the high cost of data transfers for large data sets. Common databases can easily extend beyond the capacity of physical memory, and slow tertiary storage, e.g., hard drives, is brought into the datapath. ...
Conference Paper
The Apriori algorithm is a fundamental correlation-based data mining kernel used in a variety of fields. The innovation in this paper is a highly parallel custom architecture implemented on a reconfigurable computing system. Using this "bitmapped CAM," the time and area required for executing the subset operations fundamental to data mining can be significantly reduced. The bitmapped CAM architecture implementation on an FPGA-accelerated high performance workstation provides a performance acceleration of orders of magnitude over software-based systems. The bitmapped CAM utilizes redundancy within the candidate data to efficiently store and process many subset operations simultaneously. The efficiency of this operation allows 140 units to process about 2,240 subset operations simultaneously. Using industry-standard benchmarking databases, we have tested the bitmapped CAM architecture and shown that the platform provides a minimum of a 24x (and often much higher) performance advantage over the fastest software Apriori implementations.
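The subset operation at the heart of Apriori can be expressed with item bitmaps, in the spirit of the bitmapped-CAM idea: a candidate itemset is contained in a transaction iff a single AND-compare succeeds. The sketch below is a software analogy of that operation, not the FPGA architecture itself.

```python
# Hedged sketch: Apriori support counting via bitmap subset tests.
# Each itemset becomes an integer bitmap (one bit per known item).

def to_bitmap(itemset, item_index):
    """Encode an itemset as an integer bitmap using item_index positions."""
    bits = 0
    for item in itemset:
        bits |= 1 << item_index[item]
    return bits

def count_support(candidates, transactions, item_index):
    """For each candidate itemset, count the transactions containing it."""
    cand_bits = [to_bitmap(c, item_index) for c in candidates]
    counts = [0] * len(candidates)
    for t in transactions:
        t_bits = to_bitmap(t, item_index)
        for i, c in enumerate(cand_bits):
            # Subset test as a single AND-compare -- the operation the
            # hardware replicates across many units in parallel.
            if c & t_bits == c:
                counts[i] += 1
    return counts
```

In software this inner loop is the bottleneck for large candidate sets; the hardware gains come from evaluating many such AND-compares per cycle.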
... If the data is partitioned appropriately over a multiple RAID system, then multiple computational pipelines can operate in parallel across the data, providing for even higher performance. This general approach has been presented in [3, 10, 17, 23]. Scientific Data Collection. ...
Conference Paper
Auto-Pipe is a tool that aids in the design, evaluation and implementation of applications that can be executed on computational pipelines (and other topologies) using a set of heterogeneous devices including multiple processors and FPGAs. It has been developed to meet the needs arising in the domains of communications, computation on large datasets, and real time streaming data applications. This paper introduces the Auto-Pipe design flow and the X design language, and presents sample applications. The applications include the Triple-DES encryption standard, a subset of the signal-processing pipeline for VERITAS, a high-energy gamma-ray astrophysics experiment. These applications are discussed and their description in X is presented. From X, simulations of alternative system designs and stage-to-device assignments are obtained and analyzed. The complete system permits production of executable code and bit maps that may be downloaded onto real devices. Future work required to complete the Auto-Pipe design tool is discussed.
... As far as we know, the Apriori algorithm has not been studied in any significant way for efficient hardware implementation. However, research in hardware implementations of related data mining algorithms has been done [6, 12, 20, 21]. In [6] and [20] the k-means clustering algorithm is implemented as an example of a special reconfigurable fabric in the form of a cellular array connected to a host processor. ...
... By avoiding global connections that violate the principles of systolic design, we can increase overall system clock frequency and ease routing problems. In [21] a system is implemented which attempts to mitigate the high cost of data transfers for large data sets. Common databases can easily extend beyond the capacity of physical memory, and slow tertiary storage, e.g., hard drives, is brought into the datapath. ...
... As far as we know, the Apriori algorithm has not been studied in any significant way for efficient hardware implementation. However, research in hardware implementations of related data mining algorithms has been done [6, 17, 18]. In [6] and [17] the k-means clustering algorithm is implemented as an example of a special reconfigurable fabric in the form of a cellular array connected to a host processor. ...
Conference Paper
The Apriori algorithm is a popular correlation-based data mining kernel. However, it is a computationally expensive algorithm and the running times can stretch up to days for large databases, as database sizes can extend to Gigabytes. Through the use of a new extension to the systolic array architecture, time required for processing can be significantly reduced. Our array architecture implementation on a Xilinx Virtex-II Pro 100 provides a performance improvement that can be orders of magnitude faster than the state-of-the-art software implementations. The system is easily scalable and introduces an efficient "systolic injection" method for intelligently reporting unpredictably generated mid-array results to a controller without any chance of collision or excessive stalling.
... In our previous work, we have demonstrated high-throughput I/O performance from the data store to the FPGA, as well as a set of applications that includes exact text search, approximate text search, biosequence similarity search, etc. [1,2,3,4,5]. Measured performance gains for these applications range from one to two orders of magnitude over state-of-the-art commodity processors. In this paper, we will compare and contrast our techniques with commonly used approaches for extracting information from large, unstructured data stores. ...
Conference Paper
While improvements in the density of semiconductor circuitry have been dramatic, the density improvements in magnetic storage have been even greater. We now store much more data than we have time to process, implying that techniques for processing these data need to be significantly altered. This paper describes a new architectural approach that enables the processing of very large data sets, yielding a two orders of magnitude performance gain over conventional approaches.