Fig 2 - uploaded by Jonathan S Turner
IP lookup table represented as a binary trie. Stored prefixes are denoted by shaded nodes. Next hops are found by traversing the trie.  

Source publication
Article
Full-text available
Continuing growth in optical link speeds places increasing demands on the performance of Internet routers, while deployment of embedded and distributed network services imposes new demands for flexibility and programmability. IP address lookup has become a significant performance bottleneck for the highest performance routers. Amid the vast array o...

Similar publications

Conference Paper
Full-text available
In this paper we present an asynchronous VLSI neuromorphic architecture comprising an array of integrate and fire neurons and dynamic synapse circuits with programmable weights. To store synaptic weight values, we designed a novel asynchronous SRAM block, integrated it on chip and connected it to the dynamic synapse circuits, via a fast current-mod...
Article
Full-text available
Polysilicon (poly-Si) grain size control is a critical issue with scaling of MOS transistors in integrated circuit design, more so in embedded non-volatile memory (NVM) technology. This paper investigates an approach to suppress poly-Si grain growth under necessary additional thermal budget for 40 nm embedded NVM technology. Our studies reveal that...
Article
Full-text available
Encryption is an important step for secure data transmission, and a true random number generator (TRNG) is a key building block in many encryption algorithms. Static random-access memory (SRAM) chips can be easily available sources of true random numbers, benefiting from noisy SRAM cells whose start-up values flip between different power-on cycles....
Article
Full-text available
Scratch-pad memory (SPM), a small, fast, software-managed on-chip SRAM (Static Random Access Memory) is widely used in embedded systems. With the ever-widening performance gap between processors and main memory, it is very important to reduce the serious off-chip memory access overheads caused by transferring data between SPM and off-chip memory. I...
Article
Full-text available
Low-power consumption and stability in static random access memories (SRAMs) is essential for embedded applications. This study presents a novel design flow for power minimisation of nano-complementary metal-oxide semiconductor SRAMs, while maintaining stability. A 32 nm high-k/metal-gate SRAM has been used as an example circuit. The baseline circu...

Citations

... Furthermore, routing tables undergo frequent updates. It has been reported that backbone routers may experience up to a hundred thousand updates per second [2,11,12,13,14,10]. ...
... The combination of path compression with a bitmap representation, where a subtree is treated as a compound node, may reduce a tree's height [35,13]. Bitmaps indicate stored prefixes by bit position (i.e., prefixes are represented as numbers in an interval) on a tree level [36,37]. ...
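The path-compression idea quoted above can be sketched as follows; the node layout and names (`PCNode`, `lookup`) are illustrative assumptions, not code from any cited paper. Each edge stores an entire non-branching bit string, so chains of nodes without branches or stored prefixes collapse into a single node:

```python
class PCNode:
    def __init__(self, label=""):
        self.label = label        # bit string consumed on the edge into this node
        self.child = {}           # first bit of the edge label -> PCNode
        self.next_hop = None      # set only on nodes that store a prefix

def lookup(root, addr):
    """Longest-prefix match on a path-compressed trie."""
    node, pos, best = root, 0, root.next_hop
    while pos < len(addr):
        nxt = node.child.get(addr[pos])
        # the whole edge label must match; only prefix-free chains are compressed
        if nxt is None or not addr.startswith(nxt.label, pos):
            break
        pos += len(nxt.label)
        node = nxt
        if node.next_hop is not None:
            best = node.next_hop   # a longer stored prefix was matched
    return best
```

For example, storing 1011* and 101100* needs only two nodes below the root here, whereas a plain binary trie would need six; this is the height reduction the snippet refers to.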
Article
Full-text available
In this paper, we propose an IP lookup scheme, called Helix, that performs parallel prefix matching at the different prefix lengths and uses the helicoidal properties of binary trees to reduce tree height. The reduction of the tree height is achieved without performing any prefix modification. Helix minimizes the amount of memory used to store long and numerous prefixes and achieves IP lookup and route updates in a single memory access. We evaluated the performance of Helix in terms of the number of memory accesses and amount of memory required for storing large IPv4 and IPv6 routing tables with up to 512,104 IPv4 and 389,956 IPv6 prefixes, respectively. In all the tested routing tables, Helix performs lookup in a single memory access while using very small memory amounts. We also show that Helix can be implemented on a single field-programmable gate array (FPGA) chip with on-chip memory for the IPv4 and IPv6 tables considered herein, without requiring external memory. Specifically, Helix uses up to 72% of the resources of an FPGA to accommodate the most demanding routing table, without performance penalties. The implementation shows that Helix may achieve lookup speeds beyond 1.2 billion packets per second (Gpps).
... However, searching an ordinary binary tree is not fast enough, so many techniques have been proposed to improve the lookup speed. Some of these techniques are path compression, multibit trees [3], level compression [8], bitmap techniques [1] [3] [4] [5], leaf pushing [5] [9], the priority-tree technique [6], hashing [7], etc. Path compression is a technique in which paths in a tree are compressed if they have no branching. ...
Conference Paper
The pool of available IPv4 addresses is nearly depleted, with less than 10% of all IPv4 addresses remaining. At the same time, the bit rates at which packets are transmitted are increasing, and the IP lookup speed must be increased as well. Consequently, IP lookup algorithms are in the research focus again, because the existing solutions were designed for IPv4 addresses and are not sufficiently scalable. In this paper, we compare FPGA implementations of the balanced parallelized frugal lookup (BPFL) algorithm and the parallel optimized linear pipeline (POLP) lookup algorithm, which use memory efficiently and achieve the highest speeds.
I. INTRODUCTION. The Internet is still a fast-growing network. The number of hosts keeps increasing, and the IPv4 address space is almost exhausted. Moreover, the Internet of "things" is being developed to include a tremendous number of sensors that may be attached to various machines and appliances. As a result of this development, the transition to longer IPv6 addresses is inevitable. Packets generated by the increasing number of things on the Internet will be directed through routers based on their IPv6 addresses. The output port (i.e., next hop) of each packet is determined from its IP address using the information in the lookup table, according to the specified IP lookup algorithm. The lookup table contains forwarding information for the network addresses that a router has learned from other routers in the network. As the Internet grows, the lookup tables are getting larger. Classless network addresses are aggregated in these tables in order to consume a minimal amount of memory, and the longest prefix match of a given IP address must be found. The lookup table is typically split between internal and external memories. The internal (on-chip) memory is on the same chip as the lookup logic. On-chip memory has a large throughput, which allows the parallelization and pipelining that provide high lookup speeds.
However, the on-chip memory is very limited and should therefore be used carefully. As IP addresses get longer, the internal memory requirements of the lookup table can become a bottleneck, so the available internal memory should be used in a way that maximizes the lookup speed for the largest IP lookup tables. In the multibit tree technique, one node has 2^m children, as m bits are used to determine the child at the next level; this reduces the tree depth. The level compression technique replaces the parts of the binary tree that are populated above some threshold with multibit subtrees, efficiently reducing the depth of the tree. The bitmap technique uses a compact binary representation of some parts of the tree (a subtree structure is represented by a bitmap vector whose positions correspond to the nodes in the subtree). It is usually combined with a technique that reduces the number of pointers in a multibit tree: only one pointer is kept in a node, pointing to the first element of the vector of pointers to the node's children. Leaf pushing pushes the next-hop information from the internal nodes of the tree to its leaves. The priority-tree technique fills empty nodes in the early levels of a binary tree with longer prefixes, reducing the total number of nodes. Hashing is a popular technique used to reduce the number of memory accesses and increase the lookup speed. As one can see, many techniques can be used to achieve higher lookup speeds, but they usually come at a price. For example, some internal nodes with next-hop information can be masked by the leaf pushing technique or by multibit trees, so updates become more complicated than in an ordinary binary tree.
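The multibit tree technique described above can be sketched roughly as follows. The stride, the names (`MultibitNode`, `insert`, `lookup`) and the leaf-pushed per-slot next-hop array are illustrative assumptions; in this simplified version of controlled prefix expansion, shorter prefixes must be inserted before longer ones so that longer matches overwrite them:

```python
STRIDE = 4                        # m bits consumed per node, so 2**m children

class MultibitNode:
    def __init__(self):
        self.child = [None] * (1 << STRIDE)
        self.next_hop = [None] * (1 << STRIDE)  # leaf-pushed next hop per slot

def insert(root, prefix_bits, next_hop):
    """Controlled prefix expansion: pad the prefix to a stride boundary by
    enumerating all completions of the missing low bits."""
    node = root
    while len(prefix_bits) > STRIDE:            # descend full strides
        idx = int(prefix_bits[:STRIDE], 2)
        if node.child[idx] is None:
            node.child[idx] = MultibitNode()
        node = node.child[idx]
        prefix_bits = prefix_bits[STRIDE:]
    pad = STRIDE - len(prefix_bits)
    base = int(prefix_bits, 2) << pad
    for i in range(1 << pad):                   # expanded prefixes fill 2**pad slots
        node.next_hop[base + i] = next_hop

def lookup(root, addr_bits):
    """Walk the trie one stride at a time (address length a multiple of STRIDE)."""
    node, best, pos = root, None, 0
    while node is not None and pos + STRIDE <= len(addr_bits):
        idx = int(addr_bits[pos:pos + STRIDE], 2)
        if node.next_hop[idx] is not None:
            best = node.next_hop[idx]
        node = node.child[idx]
        pos += STRIDE
    return best
```

With a stride of 4, a 32-bit IPv4 lookup takes at most 8 node visits instead of 32, which is the depth reduction the snippet describes.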
... Several software and hardware approaches to the IP address lookup problem have recently been proposed [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22], and [28][29][30][31][32]. The binary trie, which is the basis of many existing schemes [4], is a binary tree that is searched starting from the root and recursively descending to the left or right child depending on the current bit of the searched address. ...
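A minimal sketch of that binary-trie search, with illustrative names (`TrieNode`, `insert`, `lookup`) that are not taken from any cited scheme:

```python
class TrieNode:
    def __init__(self):
        self.child = [None, None]   # 0-branch and 1-branch
        self.next_hop = None        # set only on nodes that store a prefix

def insert(root, prefix_bits, next_hop):
    """Store a prefix given as a bit string (e.g. '1011') with its next hop."""
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.child[i] is None:
            node.child[i] = TrieNode()
        node = node.child[i]
    node.next_hop = next_hop

def lookup(root, addr_bits):
    """Walk the trie bit by bit, remembering the last stored prefix seen,
    which yields the longest matching prefix."""
    node, best = root, root.next_hop
    for b in addr_bits:
        node = node.child[int(b)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best
```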
... Our architecture and other approaches (number of prefixes; lookup rate in million lookups per second):
  Our architecture                  226,094   193
  Reconfigurable BDD [21]           33,796    157.7
  Partitioned table [15]            18,684    68
  Reconfigurable FSM [20]           38,367    22.22
  LogW-Elevators (software) [31]    33,796    20.5
  FIPL (parallel trie bitmap) [30]  16,564    9.09
(1) A frame is a column of logic resources inside the FPGA.
Table 6. Memory usage and the number of memory accesses per lookup in our proposed architecture and other approaches (memory in KB; number of prefixes; bytes per prefix; memory accesses per lookup):
  [11]                   2000      29,584   67.6     1
  DIR-24-8 [22]          264,000   --       --       1-2
  Trie bitmap [28]       20,000    41,811   478.34   1-5
  Indirect lookup [29]   3600      40,000   90       1-3
  Parallel hashing [17]  1512      37,000   40.86    1-5
  Multi-way search [7]   5600      30,000   186.67   1-9
  DTBM [32]              36,000    85,987   418.67   1-12
  EnBiT (software) [12]  2413      41,000   58.86
parameters. ...
Article
With the complexity of today's networks, routers on backbone links must be able to handle millions of packets per second on each of their ports. Determining the corresponding output interface for each incoming packet based on its destination address requires a longest-matching-prefix search on the IP address. Therefore, IP address lookup is one of the most challenging problems for backbone routers. In this paper, an IP routing lookup architecture based on a reconfigurable hardware platform is proposed. Experimental results show that a rate of 193 million lookups per second is achieved using our architecture, while prefixes can be updated at a rate of 3 million updates per second. Furthermore, it was shown that our reconfigurable architecture results in rare update failures due to resource limitations.
... Although TCAM-based engines can retrieve IP lookup results in just one clock cycle, their throughput is limited by the relatively low speed of TCAMs. They are expensive and offer little flexibility for adapting to new addressing and routing protocols [4]. As shown in Table I, SRAM outperforms TCAM with respect to speed, density and power consumption. ...
Conference Paper
Full-text available
Continuous growth in network link rates poses a strong demand for high-speed IP lookup engines. While Ternary Content Addressable Memory (TCAM) based solutions serve most of today's high-end routers, they do not scale well for the next generation. On the other hand, pipelined SRAM-based algorithmic solutions are becoming attractive. Intuitively, multiple pipelines can be utilized in parallel to have a multiplicative effect on the throughput. However, several challenges must be addressed for such solutions to realize high throughput. First, the memory distribution across the different stages of each pipeline, as well as across different pipelines, must be balanced. Second, the traffic on the various pipelines should be balanced. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for terabit IP lookup. To balance the memory requirement over the stages, a two-level mapping scheme is presented. By trie partitioning and subtrie-to-pipeline mapping, we ensure that each pipeline contains an approximately equal number of trie nodes. Then, within each pipeline, a fine-grained node-to-stage mapping is used to achieve evenly distributed memory across the stages. To balance the traffic on different pipelines, both pipelined prefix caching and dynamic subtrie-to-pipeline remapping are employed. Simulation using real-life data shows that the proposed architecture with 8 pipelines can store a core routing table with over 200 K unique routing prefixes using 3.5 MB of memory. It achieves a throughput of up to 3.2 billion packets per second, i.e. 1 Tbps for minimum-size (40 byte) packets.
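The subtrie-to-pipeline mapping idea can be illustrated with a simple greedy balancer; the paper's actual two-level scheme is more involved, and the function name and largest-first rule used here are assumptions:

```python
# Sketch: split the trie into subtries, then assign each subtrie to the
# currently least-loaded pipeline so that node counts stay roughly balanced.
import heapq

def map_subtries(subtrie_sizes, num_pipelines):
    """Return {subtrie id -> pipeline id}, balancing total node counts."""
    heap = [(0, p) for p in range(num_pipelines)]   # (current load, pipeline id)
    heapq.heapify(heap)
    assignment = {}
    # assigning largest subtries first tightens the balance
    for sid, size in sorted(enumerate(subtrie_sizes), key=lambda x: -x[1]):
        load, p = heapq.heappop(heap)               # least-loaded pipeline
        assignment[sid] = p
        heapq.heappush(heap, (load + size, p))
    return assignment
```

The fine-grained node-to-stage mapping within a pipeline and the dynamic remapping for traffic balance are separate mechanisms not shown here.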
... 12 is not sufficient. Two hardware-based solutions are available: those that use a CAM and those that use a synchronous LUT pipeline (on ASIC or FPGA) [21], [22]. The CAM can itself be implemented by a LUT cascade [23]. ...
Article
Full-text available
The paper addresses software and firmware implementation of multiple-output Boolean functions based on cascades of Look-Up Tables (LUTs). A LUT cascade is described as a means of compact representation of a large class of sparse Boolean functions, evaluation of which then reduces to multiple indirect memory accesses. The method is compared to a technique of direct PLA emulation and is illustrated on examples. A specialized micro-engine is proposed for even faster evaluation than is possible with universal microprocessors. The presented method is flexible in making trade-offs between performance and memory footprint and may be useful for embedded applications where the processing speed is not critical. Evaluation may run on various CPUs and DSP cores or slightly faster on FPGA-based micro-programmed controllers.
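The "multiple indirect memory accesses" evaluation style can be illustrated with a toy cascade that computes the parity of six input bits in three stages; the table layout, one-bit rail width and names are hypothetical, not the paper's micro-engine:

```python
# Each stage is a small look-up table indexed by two fresh input bits
# concatenated with the "rail" value passed on from the previous stage,
# so evaluating the whole function is just a chain of table reads.

def make_parity_stage():
    table = []
    for idx in range(8):                 # index = (rail << 2) | two input bits
        rail, b = idx >> 2, idx & 3
        table.append(rail ^ (bin(b).count("1") & 1))  # new rail = running parity
    return table

def eval_cascade(stages, inputs):
    """inputs: one 2-bit integer per stage; returns the final rail value."""
    rail = 0
    for table, b in zip(stages, inputs):
        rail = table[(rail << 2) | b]    # one indirect memory access per stage
    return rail
```

A real cascade would carry a wider rail and decompose an arbitrary sparse function, but the access pattern (index, read, repeat) is the same.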
... Such in-transit data manipulation services have been addressed by a large body of work, starting from active networking research [58]. Research on active networks has explored the possibilities of extending and customizing the behaviour of the network infrastructure to meet application needs [4,6,52]. Furthermore, the Active System Area Networks (ASAN) work [24,38] focuses on the use of NPs close to or even within the leaf nodes of a network. ...
... There are numerous interface designs for closely coupling network devices, such as programmable network processors or line cards, to host nodes using OS-controlled mappings [52,53,8]. The motivation is to take advantage of the network-near nature of these devices vs. hosts, and to implement transaction services, synchronization functions, and service- or system-specific optimization of protocol stacks and of the data movement and buffering associated with message communication. ...
Conference Paper
Full-text available
Our research addresses "information appliances' used in modern large-scale distributed systems to: (1) virtualize their data flows by applying actions such as filtering, format translation, etc., and (2) separate such actions from enterprise applications' business logic, to make it easier for future service-oriented codes to inter-operate in diverse and dynamic environments. Our specific contribution is the enrichment of runtimes of these appliances with methods for QoS-awareness, thereby giving them the ability to deliver desired levels of QoS even under sudden requirement changes - IQ-appliances. For experimental evaluation, we prototype an IQ-appliance. Measurements demonstrate the feasibility and utility of the approach.
... Accesses to SRAM and SDRAM can take a significant amount of time which can increase the overall processing cost of TSA. We have used a multibit trie implementation [15] to reduce the number of memory accesses necessary to traverse the anonymization tree. Our current prototype runs on an Intel IXP 2400 network processor [5] and requires 4 accesses to the anonymization data structures instead of 26. ...
Article
Passive network measurement and packet header trace collection are vital tools for network operation and research. To protect a user's privacy, it is necessary to anonymize header fields, particularly IP addresses. To preserve the correlation between IP addresses, prefix-preserving anonymization has been proposed. The limitations of this approach for a high-performance measurement system are the need for complex cryptographic computations and potentially large amounts of memory. We propose a new prefix-preserving anonymization algorithm, top-hash subtree-replicated anonymization (TSA), that features three novel improvements: precomputation, replicated subtrees, and top hashing. TSA makes anonymization practical to implement on network processors or dedicated logic at Gigabit rates. The performance of TSA is compared with a conventional cryptography-based prefix-preserving anonymization scheme which utilizes caching. TSA performs better as it requires no online cryptographic computation and only a small number of memory lookups per packet. Our analytic comparison of the susceptibility to attacks between conventional anonymization and our approach shows that TSA performs better for small-scale attacks and comparably for medium-scale attacks. The processing cost for TSA is reduced by two orders of magnitude and the memory requirements are a few Megabytes. The ability to tune the memory requirements and security level makes TSA ideal for a broad range of network systems with different capabilities.
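The prefix-preserving property itself (not the TSA algorithm) can be illustrated with a toy construction: each output bit is the input bit XORed with a pseudorandom function of the preceding prefix, so addresses sharing k leading bits anonymize to addresses sharing k leading bits. The key and hash choice here are illustrative and cryptographically naive:

```python
import hashlib

def anonymize(addr_bits, key=b"demo-key"):
    """Prefix-preserving mapping of a bit-string address (toy sketch)."""
    out = []
    for i, b in enumerate(addr_bits):
        # the flip decision for bit i depends only on the first i bits,
        # so equal prefixes produce equal anonymized prefixes
        h = hashlib.sha256(key + addr_bits[:i].encode()).digest()
        out.append(str(int(b) ^ (h[0] & 1)))
    return "".join(out)
```

Computing a hash per bit is exactly the online cryptographic cost that TSA's precomputation and replicated subtrees are designed to avoid.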
... Our proposed approach is twofold. Firstly, it is grounded in the programmable router architecture [2, 3, 4, 5], wherein participating routers can collaboratively collect traffic statistics for a victim site. The statistical information is forwarded to the victim site (or any processing node in a distributed system). ...
... To illustrate the notion of correctness, Figure 3 shows a timing diagram with two routers R_i and R_j, where R_j is an immediate upstream router of R_i. The black rectangle in the figure represents the time instant at which the outgoing traffic counter of router R_i is read, and we let this time instant be t_{i,k}, where k ∈ {1, 2} represents whether the reading is taken for the first or the second time. We assume that Figure 3 illustrates the reading of the outgoing traffic counter for the k-th time. ...
... In this subsection, we illustrate the DDoS traceback algorithm through an example using the network topology shown in Figure 7, which illustrates how the algorithm works. A black rectangle represents the time instant at which a router R_i records the value of its outgoing traffic counter, and we denote this time instant as t_{i,k}, where k ∈ {1, 2} represents the k-th instance of the snapshot algorithm. In addition, the corresponding value of the outgoing traffic counter is shown beside the black rectangle. ...
Article
Distributed denial-of-service attack is one of the most pressing security problems that the Internet community needs to address. Two major requirements for effective traceback are (i) to quickly and accurately locate potential attackers and (ii) to filter attack packets so that a host can resume the normal service to legitimate clients. Most of the existing IP traceback techniques focus on tracking the location of attackers after-the-fact. In this work, we provide an efficient methodology for locating potential attackers who employ the flood-based attack. We propose a distributed algorithm so that a set of routers can correctly (in a distributed sense) gather statistics in a coordinated fashion and that a victim site can deduce the local traffic intensities of all these participating routers. We prove the correctness of our distributed algorithm, and given the collected statistics, we provide a method for the victim site to locate attackers who sent out dominating flows of packets. The proposed distributed traceback methodology can also complement and leverage on the existing ICMP traceback so that a more efficient and accurate traceback can be obtained. We carry out simulations to illustrate that the proposed methodology can locate the attackers in a short period of time. Moreover, the applications as well as the limitations of the proposed methodology are covered. We believe this work also provides the theoretical foundation on how to correctly and accurately perform distributed measurement and traffic estimation on the Internet.
... Our algorithm uses a built-in bit-manipulation instruction to calculate the number of bits set, and thus it is much more efficient than Lulea. Ternary Content Addressable Memory (TCAM) based schemes [19], the reconfigurable fast IP lookup engine [29], and Binary Decision Diagrams [24] are hardware-based IP lookup schemes. They achieve high lookup speed at the cost of high power consumption and complicated prefix updates. ...
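The bit-counting trick mentioned above can be sketched as follows: a node stores a bitmap of which of its children exist plus a single base pointer into a packed child array, and a child's offset within that array is the popcount of the bitmap bits below its position (one bit-manipulation instruction on modern CPUs or in hardware). The names here are illustrative:

```python
def popcount(x):
    """Number of set bits; stands in for the hardware popcount instruction."""
    return bin(x).count("1")

def child_offset(bitmap, slot):
    """Offset of child `slot` in the packed child array, or None if absent.
    Bit i of `bitmap` (LSB first) says whether child i exists."""
    if not (bitmap >> slot) & 1:
        return None                              # no such child stored
    return popcount(bitmap & ((1 << slot) - 1))  # set bits below this slot
```

Because absent children consume no pointers, a node with a 2^m-wide fan-out needs only one base pointer and an m-bit-indexed bitmap, which is the memory saving behind the Lulea-style schemes cited above.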
Conference Paper
IP forwarding is one of the main bottlenecks in Internet backbone routers, as it requires performing the longest-prefix match at 10Gbps speed or higher. IPv6 forwarding further exacerbates the situation because its search space is quadrupled. We propose a high-performance IPv6 forwarding algorithm TrieC, and implement it efficiently on the Intel IXP2800 network processor (NPU). Programming the multi-core and multithreaded NPU is a daunting task. We study the interaction between the parallel algorithm design and the architecture mapping to facilitate efficient algorithm implementation. We experiment with an architecture-aware design principle to guarantee the high performance of the resulting algorithm. This paper investigates the main software design issues that have dramatic performance impacts on any NPU based implementation: memory space reduction, instruction selection, data allocation, task partitioning, latency hiding, and thread synchronization. In the paper, we provide insight on how to design an NPU-aware algorithm for high-performance networking applications. Based on the detailed performance analysis of the TrieC algorithm, we provide guidance on developing high-performance networking applications for the multi-core and multithreaded architecture.
... However, in the worst case it needs approximately 15 memory accesses for an IPv6 address lookup because of its O(log_k 2N + W/M) search time. Additionally, TCAM-based schemes, CPU caching [5], the reconfigurable fast IP lookup engine [14], binary decision diagrams [20], etc. are all hardware-based IP lookup schemes. Their advantage is high lookup speed, whereas disadvantages such as the need for specific hardware support, high power consumption, complicated prefix updates and high cost limit their application to a certain degree. ...
Conference Paper
Address lookup is one of the main bottlenecks in Internet backbone routers, as it requires the router to perform a longest-prefix match when searching the routing table for a next hop. Ever-increasing Internet bandwidth, continuously growing prefix table sizes and the inevitable migration to the IPv6 address architecture further exacerbate this situation. In recent years, a variety of high-speed address lookup algorithms have been proposed; however, most of them are inappropriate for IPv6 lookup. This paper proposes a high-speed IPv6 lookup algorithm, TrieC, which achieves the goals of high-speed address lookup, fast incremental prefix updates, high scalability and reasonable memory requirements by taking great advantage of the network processor architecture. The performance of TrieC is carefully evaluated with several IPv6 routing tables of different sizes and different prefix length distributions on the Intel IXP2800 network processor (NPU). Simulation shows that TrieC can support IPv6 lookup at the OC-192 line rate. Furthermore, if TrieC is pipelined in hardware, it can achieve one IPv6 lookup per memory access.