Network-on-chip: Current issues and challenges

June 2015

DOI:10.1109/ISVDAT.2015.7208160

Conference: 2015 19th International Symposium on VLSI Design and Test (VDAT)

Authors:

Manoj Singh Gaur

Vijay Laxmi

Malaviya National Institute of Technology Jaipur

Mark Zwolinski

University of Southampton

Manoj Kumar

Tata Motors

Network-on-chip: Current Issues and Challenges

Manoj Singh Gaur
Professor at Computer Engineering, Malaviya National Institute of Technology (MNIT), Jaipur, India
Email: gaurms@gmail.com

Manoj Kumar
Malaviya National Institute of Technology (MNIT), Jaipur, India

Vijay Laxmi
Associate Professor at Computer Engineering, Malaviya National Institute of Technology (MNIT), Jaipur, India
Email: vlaxmi@mnit.ac.in

Niyati Gupta
Malaviya National Institute of Technology (MNIT), Jaipur, India

Mark Zwolinski
Professor in the Electronic Systems Design Group, University of Southampton, Highfield, Southampton SO17 1BJ
Email: mz@ecs.soton.ac.uk

Ashish
Malaviya National Institute of Technology (MNIT), Jaipur, India

ABSTRACT

As transistor sizes shrink, the density of ICs doubles roughly every two years, as predicted by Moore’s law. These advances in VLSI integration density towards the nanoscale era have brought a paradigm shift from computation-centric designs to communication-centric designs that incorporate a very large number of simple cores. Traditional interconnect schemes such as point-to-point links, buses and crossbars are adequate for interconnecting a small number of cores. Point-to-point schemes achieve fast and efficient communication, but their wire density is a barrier to adopting them in many-core architectures. Buses are simpler in design, but they suffer from scalability and arbitration issues, along with a bandwidth bottleneck, as the number of cores increases. Similarly, the area and power requirements of a crossbar limit its applicability. Hence, many-core architectures such as Chip Multiprocessors (CMPs) and Multiprocessor Systems-on-Chip (MPSoCs) need an efficient communication infrastructure, because traditional solutions fail to handle their communication challenges.
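To make the scaling argument above concrete, the following minimal sketch counts the structural resources each scheme needs as the core count grows. The function name `interconnect_cost` and the restriction to square k x k meshes are illustrative assumptions, not part of the tutorial. The counts are purely combinatorial and ignore wire length, link width and router cost.

```python
import math


def interconnect_cost(n_cores: int) -> dict:
    """Count links or crosspoints needed to connect n_cores (illustrative only)."""
    k = math.isqrt(n_cores)
    if k * k != n_cores:
        # Assumption of this sketch: cores are laid out as a square k x k mesh.
        raise ValueError("n_cores must be a perfect square")
    return {
        # One dedicated bidirectional link for every pair of cores.
        "point_to_point_links": n_cores * (n_cores - 1) // 2,
        # An n x n crossbar needs one crosspoint per input/output pair.
        "crossbar_crosspoints": n_cores * n_cores,
        # k rows and k columns, each with k - 1 links between adjacent routers.
        "mesh_links": 2 * k * (k - 1),
    }


for n in (16, 64, 256):
    print(n, interconnect_cost(n))
```

For 64 cores this gives 2016 point-to-point links and 4096 crosspoints, against 112 mesh links. The first two grow quadratically with the number of cores, while the mesh grows roughly linearly.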

Network-on-Chip (NoC), a scalable and modular design approach, has been proposed as a promising alternative to traditional bus-based architectures for inter-core communication. NoCs have also been adopted in industry, for example in Tilera’s TILE-Gx72 and TILE64™ [1] processors and Intel’s terascale processor [2]. NoCs are an attractive alternative to traditional shared buses or dedicated wires for several reasons. First, NoCs are a scalable solution for on-chip communication, because they provide scalable bandwidth at low power and area overheads. Second, NoCs use wiring very efficiently by multiplexing many traffic flows on the same channels, which provides quality of service and higher bandwidth. Finally, on-chip networks with regular topologies have short interconnects that can be optimized and reused as regular iterative blocks, which makes the verification process easier. For on-chip networks, the two-dimensional (2D) mesh is the most preferred topology because of its regularity, scalability, and good fit to the physical layout of an actual chip. This tutorial focuses on NoC routing algorithms, their implementations and the associated issues. The main network parameters affected by the routing algorithm include fault tolerance, quality of service, communication performance (throughput and latency) and power consumption. The main objectives of this tutorial are as follows:

• Introduction to NoC [3]: In this part, we briefly discuss the main design parameters of a NoC, such as topology, switching, flow control and routing, and compare NoCs with existing interconnect mechanisms.

• Routing Taxonomy [4]: In this part, we present a classification of routing algorithms.

• Deadlock and Livelock freedom in Routing: One current issue in NoC routing is that requiring an acyclic channel dependency graph (ACDG) for deadlock freedom prohibits certain routing turns, and therefore reduces the degree of adaptiveness. In this section, we discuss various turn models [5] and how these turn models can be improved to increase adaptivity while maintaining deadlock freedom.

• Routing Implementations for NoC: Denser integration makes chips more prone to failures (deep sub-micron effects, manufacturing effects, etc.). These failures may disrupt the regularity of a 2D mesh, leaving an irregular topology derived from the regular mesh, on which solutions designed for regular 2D meshes may no longer work. In this section, we discuss state-of-the-art routing implementation techniques [6]–[8] for irregular 2D meshes under different failures.

• Learning methods to handle congestion in Routing: Reinforcement Learning (RL) is a machine learning paradigm that has been widely applied in many areas. Q-learning has been used in NoCs to learn the network traffic and make routing decisions accordingly. At each node, a table stores values that represent the congestion level of each link, and these values are updated after every packet transfer. Although Q-learning has improved network performance, many challenges remain, which we discuss in this section.

• A brief hands-on session with a tool chain for NoC simulation will also be provided towards the end.
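The per-node table described in the Q-learning item above can be sketched as follows. This is a minimal illustration that assumes the commonly used Q-routing formulation, in which each entry estimates the delivery time to a destination through a given output port. The class name `QRouter`, the port labels, the learning rate `alpha` and the zero initial values are all hypothetical choices, and the schemes covered in the tutorial may differ.

```python
from collections import defaultdict

PORTS = ("N", "E", "S", "W")  # assumed output ports of a 2D mesh router


class QRouter:
    """Per-node table: q[dest][port] estimates delivery time to dest via port."""

    def __init__(self, alpha=0.5, initial=0.0):
        self.alpha = alpha  # learning rate (assumed value)
        self.q = defaultdict(lambda: {p: initial for p in PORTS})

    def best_estimate(self, dest):
        """Smallest estimate this node currently holds for dest."""
        return min(self.q[dest].values())

    def choose_port(self, dest, allowed=PORTS):
        """Pick the allowed port with the lowest estimated delivery time."""
        return min(allowed, key=lambda p: self.q[dest][p])

    def update(self, dest, port, link_delay, neighbour_estimate):
        """Update one entry after a packet transfer.

        link_delay is the queueing plus transmission delay seen on this hop,
        and neighbour_estimate is the downstream node's best_estimate(dest).
        """
        old = self.q[dest][port]
        target = link_delay + neighbour_estimate
        self.q[dest][port] = old + self.alpha * (target - old)
        return self.q[dest][port]


router = QRouter()
dest = (3, 3)
# A slow transfer through the east port raises that port's estimate...
router.update(dest, "E", link_delay=6.0, neighbour_estimate=2.0)
# ...so later packets to the same destination prefer the south port.
print(router.choose_port(dest, allowed=("E", "S")))
```

The `allowed` argument is where a deadlock-free turn model would restrict the candidate ports. The learning rule only ranks the ports that the routing function already permits.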

REFERENCES

[1] S. Bell et al., “TILE64 processor: A 64-core SoC with mesh interconnect,” in Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International, Feb 2008, pp. 88–598.

[2] S. Vangal et al., “An 80-tile sub-100-W teraflops processor in 65-nm CMOS,” Solid-State Circuits, IEEE Journal of, vol. 43, no. 1, pp. 29–41, Jan 2008.

[3] M. Palesi and M. Daneshtalab, Routing Algorithms in Networks-on-Chip. Springer Publishing Company, Incorporated, 2013.

[4] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2003.

[5] M. Kumar, V. Laxmi, M. Gaur, M. Daneshtalab, and M. Zwolinski, “A novel non-minimal turn model for highly adaptive routing in 2D NoCs,” in Very Large Scale Integration (VLSI-SoC), 2014 22nd International Conference on, Oct 2014, pp. 1–6.

[6] R. Bishnoi, V. Laxmi, M. Gaur, R. Bin Ramlee, and M. Zwolinski, “CERI: Cost-effective routing implementation technique for network-on-chip,” in VLSI Design (VLSID), 2015 28th International Conference on, Jan 2015, pp. 59–64.

[7] J. Flich and J. Duato, “Logic-based distributed routing for NoCs,” Computer Architecture Letters, vol. 7, no. 1, pp. 13–16, 2008.

[8] S. Rodrigo et al., “Cost-efficient on-chip routing implementations for CMP and MPSoC systems,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 30, no. 4, pp. 534–547, 2011.
