BALANCING COST AND PERFORMANCE IN VLSI SYSTEMS USING RMSPROP ALGORITHM-ASSISTED DESIGN SPACE EXPLORATION

August 2023
ICTACT Journal on Microelectronics 9(2):1580-1584

DOI:10.21917/ijme.2023.0274

Authors:

Pradeep Mullangi

Shri Vishnu Engineering College for Women

Udutha Rajender

Vaageswari Colleges

Pravin Adivarekar

University of Mumbai

Sumit Kumar Gupta

St. Wilfred's PG College

As Very Large Scale Integration (VLSI) technology advances, the need to efficiently balance cost and performance in VLSI systems becomes paramount. To address this challenge, we propose a novel approach that leverages the RMSPROP algorithm for assisted design space exploration. The RMSPROP algorithm, which has proven effective in the field of deep learning optimization, is adapted to navigate the complex design space of VLSI systems. By integrating RMSPROP into the design space exploration process, we can intelligently search for optimal trade-offs between cost and performance, leading to highly efficient VLSI designs. Our experimental results demonstrate the effectiveness of the RMSPROP algorithm-assisted design space exploration, showcasing significant improvements in cost-performance trade-offs compared to traditional design methodologies. This research opens new avenues for designing VLSI systems with improved efficiency, enabling the realization of high-performance yet cost-effective integrated circuits.

M. Pradeep1, Udutha Rajender2, Pravin Prakash Adivarekar3 and Sumit Kumar Gupta4

1Department of Electronics and Communication Engineering, Shri Vishnu Engineering College for Women, India

2Department of Electronics and Communication Engineering, Vaageswari College of Engineering, India

3Department of Computer Engineering, A.P.Shah Institute of Technology, India

4Department of Physics, St. Wilfred’s PG College, India

Keywords:

VLSI Systems, RMSPROP Algorithm, Design Space Exploration, Cost-

Performance Trade-Offs

1. INTRODUCTION

In recent years, the demand for high-performance and cost-

effective Very Large Scale Integration (VLSI) systems has

surged, driven by the rapid growth of modern technologies such

as artificial intelligence, Internet of Things (IoT), and cloud

computing. VLSI systems, comprising intricate integrated circuits

(ICs), form the backbone of numerous electronic devices and play

a pivotal role in shaping technological advancements [1].

However, designing VLSI systems that strike an optimal balance

between performance and cost remains a significant challenge,

given the increasing complexity and scale of these systems [2].

Traditional VLSI design methodologies often rely on manual

exploration of the design space to identify suitable trade-offs

between performance metrics, such as speed and power

consumption, and the cost of fabrication. As the design space

becomes increasingly vast and intricate, manual exploration

becomes impractical and time-consuming, hindering the

discovery of optimal solutions [3]. Consequently, there is a

pressing need for innovative approaches that can efficiently

navigate the design space, leading to the development of VLSI

systems that maximize performance while minimizing costs [4].

We propose an approach that uses the RMSPROP algorithm to guide design space exploration for VLSI systems. RMSPROP, originally developed for optimizing deep learning models, scales the step taken for each parameter by a running average of its squared gradients, which keeps the updates stable when gradient magnitudes vary widely. By adapting RMSPROP to the VLSI domain, we aim to address the challenge of balancing cost and performance in VLSI systems.

The primary objective of this research is to present a

systematic and effective framework that combines the RMSPROP

algorithm with design space exploration techniques to efficiently

explore the vast solution space of VLSI systems. Through this

integration, we seek to uncover a range of design options that

offer favorable trade-offs between performance metrics and

fabrication costs, thus enabling the development of high-quality

VLSI systems with enhanced efficiency and cost-effectiveness.

2. RELATED WORKS

Various research efforts have been dedicated to exploring

design space exploration techniques for VLSI systems. These

studies often focus on optimization algorithms, evolutionary

strategies, and multi-objective optimization methods to efficiently

search for optimal design points in the vast solution space. While

these approaches have shown promising results, there is still a

need for novel methodologies that can handle the increasing

complexity and size of modern VLSI systems [5].

In the field of deep learning, optimization algorithms like

RMSPROP, Adam, and stochastic gradient descent (SGD) have

been extensively studied to train complex neural networks

efficiently. These algorithms address challenges such as

convergence speed, handling large-scale datasets, and alleviating

the problem of vanishing or exploding gradients. Drawing

inspiration from the success of these algorithms, researchers have

started exploring their adaptability to other domains, including

VLSI design [6].

The trade-offs between cost and performance are critical

considerations in VLSI system design. Researchers have

examined various techniques to optimize power consumption,

chip area, clock frequency, and other performance metrics, while

still adhering to stringent cost constraints. These studies often

utilize analytical models, heuristics, or machine learning

approaches to find the best compromise between performance and

cost [7].

Machine learning techniques have been increasingly

employed in VLSI design to automate various tasks, such as

layout generation, optimization, and synthesis. Reinforcement

learning, genetic algorithms, and neural architecture search are

some of the machine learning-based methods used for exploring

the design space and improving the efficiency of VLSI [8].

Several research efforts have focused on developing

methodologies that streamline the VLSI design process, reduce

design time, and enhance overall productivity. These

methodologies often incorporate intelligent algorithms and automation techniques to achieve better performance with fewer resources [9].

ISSN: 2395-1680 (ONLINE) ICTACT JOURNAL ON MICROELECTRONICS, JULY 2023, VOLUME: 09, ISSUE: 02

The hardware-software co-design aims to optimize the

interaction between hardware and software components in a

system. Researchers have explored co-design techniques that

allow better utilization of hardware resources while achieving

high-performance results, ultimately contributing to cost-efficient

VLSI systems [10].

Studies examining the most recent advancements in VLSI

systems and integrated circuits are essential for understanding the

current state of the art. These works often serve as benchmarks

for evaluating the efficacy of new design methodologies,

including the proposed RMSPROP algorithm-assisted design

space exploration approach [11].

By studying these related works, we can gain insights into the

existing challenges and solutions in VLSI design, paving the way

for our novel approach to balance cost and performance using the

RMSPROP algorithm for design space exploration.

3. PROPOSED COST-EFFECTIVE DESIGN

The proposed methodology aims to address the challenge of

balancing cost and performance in VLSI systems by leveraging

the RMSPROP algorithm for assisted design space exploration.

This innovative approach combines principles from the fields of

deep learning optimization and VLSI design to efficiently

navigate the complex design space and identify optimal trade-offs

between various performance metrics and fabrication costs. The

key components of the proposed methodology are as follows:

3.1 DESIGN SPACE EXPLORATION USING

RMSPROP

The heart of the methodology lies in the adaptation of the RMSPROP algorithm, originally designed for deep learning optimization, to the context of VLSI design. The goal is to iteratively update the design parameters to find optimal trade-offs between various performance metrics and fabrication costs. The core idea behind RMSPROP is to scale the learning rate for each parameter by a moving average of its recent squared gradients, which damps the steps along directions with large or noisy gradients and allows faster convergence along flatter ones. The update of RMSPROP for a parameter θ at iteration t is given as follows:

\[ v_t = \beta \, v_{t-1} + (1 - \beta) \, g_t^2 \]

\[ \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{v_t} + \epsilon} \, g_t \]

where:

\(v_t\) is the moving average of squared gradients for parameter θ at iteration t.

\(\beta\) is the decay rate (usually set to a value close to 1, e.g., 0.9) that controls how much the algorithm remembers past gradients.

\(g_t\) is the gradient of the loss function with respect to parameter θ at iteration t.

\(\eta\) is the learning rate, which determines the step size in the parameter space.

\(\epsilon\) is a small constant (e.g., 1e-8) added to the denominator for numerical stability.
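As a quick numerical check of the update rule, consider a single step with illustrative values that are assumptions chosen for easy arithmetic, not settings from the experiments: a decay rate of 0.9, a learning rate of 0.01, a zero initial moving average, and a first gradient of 2. The stability constant is neglected because it is many orders of magnitude smaller than the other terms.

```latex
\begin{aligned}
v_1 &= 0.9 \cdot 0 + (1 - 0.9) \cdot 2^2 = 0.4 \\
\theta_1 &= \theta_0 - \frac{0.01}{\sqrt{0.4}} \cdot 2 \approx \theta_0 - 0.0316
\end{aligned}
```

Because the moving average starts at zero, the size of this first step equals the learning rate divided by the square root of one minus the decay rate (0.01 / √0.1 ≈ 0.0316), whatever the magnitude of the gradient. This shows how the normalization decouples the step size from the gradient scale.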

RMSPROP is applied to design space exploration in VLSI as follows:

Step 1. Initialization: Start with initial design parameters θ and set the moving average of squared gradients \(v_0\) to zero.

Step 2. Iterative Update: At each iteration t, evaluate the performance metrics and cost constraints for the current design point determined by θ.

Step 3. Compute Gradients: Calculate the gradients \(g_t\) of the objective functions (performance metrics and cost constraints) with respect to each design parameter θ.

Step 4. Update Moving Average: Update the moving average of squared gradients \(v_t\) using the decay rate \(\beta\) and the current squared gradient \(g_t^2\).

Step 5. Compute Step Size: Calculate the step size \(\Delta\theta_t\) for each parameter θ from the learning rate \(\eta\), the moving average \(v_t\), and the current gradient \(g_t\).

Step 6. Update Design Parameters: Update the design parameters θ using the computed step size \(\Delta\theta_t\) to explore the design space efficiently.

Step 7. Convergence Check: Check the convergence criteria. If the desired convergence level is reached or the maximum number of iterations is exceeded, terminate the exploration process.

Step 8. Output: Return the final set of design parameters θ that represents a Pareto-optimal solution, providing a trade-off between performance metrics and fabrication costs.

By iteratively updating the design parameters using the

RMSPROP algorithm, the proposed methodology intelligently

navigates the design space to efficiently discover high-quality

VLSI system designs that strike an optimal balance between cost

and performance.
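The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `rmsprop_explore`, the callable `grad_fn`, and the quadratic toy loss standing in for a simulated cost-performance objective are all hypothetical, and the default hyperparameters are only common choices.

```python
import math


def rmsprop_explore(grad_fn, theta0, lr=0.01, decay=0.9, eps=1e-8,
                    max_iters=5000, tol=1e-6):
    """Minimize a scalar objective with RMSProp, given a function returning its gradient."""
    theta = list(theta0)
    v = [0.0] * len(theta)  # Step 1: moving average of squared gradients starts at zero
    for _ in range(max_iters):
        g = grad_fn(theta)  # Steps 2-3: evaluate the design point and its gradients
        # Step 7: stop once every gradient component is negligible
        # (checked before the update so the returned point is the one tested)
        if max(abs(gi) for gi in g) < tol:
            break
        # Step 4: update the moving average of squared gradients
        v = [decay * vi + (1.0 - decay) * gi * gi for vi, gi in zip(v, g)]
        # Steps 5-6: per-parameter step, then parameter update
        theta = [ti - lr * gi / (math.sqrt(vi) + eps)
                 for ti, gi, vi in zip(theta, g, v)]
    return theta  # Step 8: final design parameters


def toy_grad(theta):
    """Gradient of a hypothetical loss (theta[0] - 3)^2 + 10 * (theta[1] + 1)^2."""
    return [2.0 * (theta[0] - 3.0), 20.0 * (theta[1] + 1.0)]


best = rmsprop_explore(toy_grad, [0.0, 0.0])
```

With a constant learning rate the iterate does not settle exactly on the minimizer; it ends up hovering within a neighborhood whose size is on the order of the learning rate, so `best` lands close to (3, -1) rather than on it. In practice the learning rate would be reduced over time or the tolerance chosen accordingly.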

3.2 COST CONSTRAINTS

In VLSI design, there are several performance metrics to

consider, such as clock frequency, power consumption, chip area,

and propagation delay. Additionally, there are strict cost

constraints associated with fabrication, including mask costs,

wafer costs, and testing expenses. The proposed methodology

incorporates these metrics and constraints as objectives and

bounds to guide the exploration process. The goal is to find

designs that achieve the desired performance metrics while

adhering to cost limitations.

Cost constraints in VLSI design refer to the limitations

imposed on the total cost of fabricating the integrated circuit (IC).

These costs encompass various factors, including mask costs,

wafer costs, packaging expenses, and testing expenditures.

Designers need to ensure that the final IC design satisfies these

cost constraints to ensure the economic viability of the product.

In design space exploration using the RMSPROP algorithm, cost constraints can be represented as inequalities. Consider a simplified scenario with two cost constraints, C1 and C2.

3.2.1 Mask Cost Constraint (C1):

The mask cost (C1) represents the cost associated with creating

the masks required for the fabrication process. It is typically

proportional to the chip area. Let \(A\) be the area of the chip, and

\(A_{\max}\) be the maximum allowable chip area based on the cost

constraint. Then, the mask cost constraint can be expressed as:

\[ A \le A_{\max} \]

3.2.2 Total Cost Constraint (C2):

The total cost (C2) encompasses all fabrication costs,

including mask costs, wafer costs, packaging, and testing. Let \(T\)

be the total cost of the IC design, and \(T_{\max}\) be the maximum

acceptable total cost based on the cost constraint. The total cost

constraint can be formulated as:

\[ T \le T_{\max} \]

The above constraints represent the economic limitations

imposed on the design space exploration process. During the

exploration, the RMSPROP algorithm iteratively updates the

design parameters while ensuring that the constraints \(A \le A_{\max}\) and

\(T \le T_{\max}\) are satisfied at each step. The goal is to find the set of

design parameters that achieve desirable performance metrics

while staying within the prescribed cost bounds.
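RMSPROP by itself is an unconstrained optimizer, and the description above leaves open how the two bounds are enforced during the updates. One common option, shown here purely as an assumption, is to add a quadratic penalty to the objective whenever the area or total cost exceeds its limit. The function names, the penalty weight `mu`, and the example numbers are hypothetical.

```python
def constraint_violation(value, limit):
    """Amount by which `value` exceeds `limit`; 0.0 when the constraint holds."""
    return max(0.0, value - limit)


def penalized_objective(perf_loss, area, total_cost, a_max, t_max, mu=10.0):
    """Performance loss plus a quadratic penalty for violating
    area <= a_max or total_cost <= t_max (penalty method, an assumption)."""
    penalty = (constraint_violation(area, a_max) ** 2
               + constraint_violation(total_cost, t_max) ** 2)
    return perf_loss + mu * penalty


# A feasible design is scored by its performance loss alone ...
feasible_score = penalized_objective(1.0, area=90.0, total_cost=150.0,
                                     a_max=100.0, t_max=200.0)
# ... while a design that is 2 units over the area limit pays mu * 2^2 extra.
infeasible_score = penalized_objective(1.0, area=102.0, total_cost=150.0,
                                       a_max=100.0, t_max=200.0)
```

A penalty only discourages violations; it does not guarantee feasibility at every step. If strict feasibility is required, each update can instead be followed by a projection back into the feasible region, or infeasible candidates can be discarded after the search.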

There may be multiple cost constraints, and the actual

constraints can be more complex, depending on the specific cost

models and factors considered during VLSI design. The design

space exploration process, guided by these cost constraints,

enables designers to identify Pareto-optimal solutions that offer a

balanced trade-off between various performance metrics and

fabrication costs.

3.2.3 Algorithm-Driven Search:

Unlike traditional manual exploration, where designers

iteratively adjust design parameters and assess the outcomes, the

proposed methodology employs the RMSPROP algorithm to

efficiently traverse the design space. The algorithm-driven search

intelligently updates the design parameters based on gradients,

historical information, and learning rates, facilitating faster

convergence to promising design points.

3.3 PARETO OPTIMIZATION

Since cost and performance metrics often conflict with each

other, the proposed methodology adopts a Pareto optimization

approach. This means that the algorithm identifies a set of Pareto-

optimal solutions, where improving one metric would result in a

degradation of another. The designer can then choose from this

set, depending on their specific requirements and priorities.

Pareto optimization, also known as multi-objective

optimization, is a powerful technique used to find solutions that

represent the trade-offs between multiple conflicting objectives.

In the context of VLSI design space exploration, Pareto

optimization helps identify a set of solutions that cannot be

improved in one objective without sacrificing performance in

another objective. These solutions are known as Pareto-optimal

solutions or non-dominated solutions.

Let us consider a simplified scenario with two performance

metrics: P1 and P2, and the corresponding cost metrics: C1 and C2.

The goal is to optimize both performance metrics while adhering

to cost constraints. We assume that higher values of P1 and P2

represent better performance, and lower values of C1 and C2

indicate lower costs.

The objectives can be mathematically represented as follows:

\(P_1(\theta)\): The performance metric 1 as a function of parameters \(\theta\)

\(P_2(\theta)\): The performance metric 2 as a function of parameters \(\theta\)

\(C_1(\theta)\): The cost metric 1 as a function of parameters \(\theta\)

\(C_2(\theta)\): The cost metric 2 as a function of parameters \(\theta\)

The Pareto front is the set of Pareto-optimal solutions that

represents the trade-offs between P1 and P2 while satisfying the

cost constraints C1 and C2.

Mathematically, a solution \(\theta_1\) is said to dominate another solution \(\theta_2\) if and only if:

\[ P_1(\theta_1) \ge P_1(\theta_2) \quad \text{and} \quad P_2(\theta_1) \ge P_2(\theta_2) \]

with at least one of the inequalities being strict (i.e., \(>\) instead of \(\ge\)). In other words, \(\theta_1\) dominates \(\theta_2\) if it performs at least as well as \(\theta_2\) in both objectives and strictly better in at least one objective.

The Pareto front is obtained by finding all the non-dominated

solutions, i.e., solutions that are not dominated by any other

solution in the design space. These solutions represent the optimal

trade-offs between P1 and P2 while satisfying the cost constraints

C1 and C2.
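The dominance test and the extraction of the non-dominated set translate directly into code. The sketch below assumes each candidate has already passed the cost constraints and is summarized by a tuple (P1, P2) with higher values being better, as in the scenario above; the function names and sample points are illustrative only.

```python
def dominates(a, b):
    """True if design `a` dominates design `b`.

    Each design is a tuple of performance values where higher is better:
    `a` must be at least as good everywhere and strictly better somewhere.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points):
    """Return the non-dominated points, preserving their input order."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
front = pareto_front(candidates)
```

Here `front` contains (1, 5), (2, 4), and (3, 3); the remaining three points are each dominated by (3, 3). The pairwise comparison costs quadratic time in the number of candidates, which is adequate for small candidate sets.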

Fig.1. Pareto Optimized Solution vs. Non-Pareto Solution

Fig.2. Cost Constraints

In the proposed methodology using RMSPROP for design space exploration, the algorithm will aim to find a set of Pareto-optimal solutions by intelligently adjusting the design parameters and evaluating the objectives \(P_1(\theta)\) and \(P_2(\theta)\) while ensuring that the cost constraints \(C_1(\theta)\) and \(C_2(\theta)\) are satisfied. The output of the exploration process will consist of these Pareto-optimal solutions, providing designers with a range of options representing different trade-offs between performance metrics and fabrication costs.
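A single RMSPROP run converges to one design point, so obtaining a set of trade-off candidates needs some outer loop around it. One simple possibility, assumed here for illustration and not prescribed by the methodology, is weighted-sum scalarization: the two objectives are written as losses to be minimized, combined with a weight w, and the optimization is repeated for several values of w. The toy losses f1(x) = x² and f2(x) = (x − 2)² and all function names are hypothetical.

```python
import math


def rmsprop_1d(grad, x0, lr=0.01, decay=0.9, eps=1e-8, iters=2000):
    """Plain RMSProp on a single scalar parameter."""
    x, v = x0, 0.0
    for _ in range(iters):
        g = grad(x)
        v = decay * v + (1.0 - decay) * g * g
        x -= lr * g / (math.sqrt(v) + eps)
    return x


def sweep_weights(weights, x0=0.0):
    """Minimize w * f1 + (1 - w) * f2 for each weight w; returns (w, x) pairs."""
    results = []
    for w in weights:
        # Gradient of the scalarized toy loss w * x^2 + (1 - w) * (x - 2)^2
        def grad(x, w=w):
            return w * 2.0 * x + (1.0 - w) * 2.0 * (x - 2.0)
        results.append((w, rmsprop_1d(grad, x0)))
    return results


candidates = dict(sweep_weights([0.0, 0.5, 1.0]))
```

For these toy losses the exact minimizer of the scalarized problem is x = 2(1 − w), so the three runs end near 2, 1, and 0 respectively. A weighted sum can only reach points on the convex part of the Pareto front, which is a known limitation of this scalarization.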

To evaluate the solutions generated during the exploration

process, the proposed methodology employs fast and accurate

simulation tools. These tools enable rapid assessment of the

performance metrics and cost estimates associated with each

design point, allowing the algorithm to efficiently navigate the

design space. The methodology is designed to seamlessly

integrate into the existing VLSI design flow. It can be used as a

complementary step after the initial design phase or as part of an

iterative refinement process. By integrating with existing design

tools and methodologies, the proposed approach can be readily

adopted by VLSI designers. To validate the effectiveness of the

proposed methodology, comprehensive experiments are

conducted on benchmark VLSI designs. A comparison with

traditional design space exploration techniques is performed to

showcase the advantages of using the RMSPROP algorithm-

assisted approach. The experimental results demonstrate the

superior cost-performance trade-offs achieved through the

proposed methodology.
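When the performance and cost values come from a simulator rather than a closed-form model, the gradients required by RMSPROP are not directly available. One way to obtain them, assumed here rather than taken from the experiments, is a central finite difference that perturbs one design parameter at a time. The function name, the step `h`, and the stand-in objective are hypothetical.

```python
def finite_difference_gradient(objective, theta, h=1e-4):
    """Central-difference estimate of the gradient of `objective` at `theta`.

    `objective` maps a list of design parameters to a scalar; it is called
    twice per parameter, so one gradient costs 2 * len(theta) evaluations.
    """
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        grad.append((objective(up) - objective(down)) / (2.0 * h))
    return grad


def stand_in_objective(theta):
    """Stand-in for a simulator call: theta[0]^2 + 3 * theta[1]."""
    return theta[0] ** 2 + 3.0 * theta[1]


estimate = finite_difference_gradient(stand_in_objective, [1.0, 2.0])
```

For this stand-in the analytic gradient at (1, 2) is (2, 3), which the estimate reproduces up to rounding error. With a real simulator the step `h` has to be large enough to rise above simulation noise, and the number of evaluations per gradient grows linearly with the number of design parameters.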

The proposed methodology presents a novel and efficient

approach to balance cost and performance in VLSI systems using

the RMSPROP algorithm-assisted design space exploration. By

combining principles from deep learning optimization with VLSI

design considerations, this approach opens new avenues for

creating highly efficient and cost-effective integrated circuits to

meet the demands of modern technology.

4. PERFORMANCE EVALUATION

We have a simplified scenario with two performance metrics

(P1 and P2) and two cost metrics (C1 and C2) for SPEC CPU2017

benchmarks evaluated over different hardware platforms

(Standard Cells, Gate-arrays, FPGAs, CPLD).

SPEC CPU2017 benchmarks are prepared with different configurations for each hardware platform. For each benchmark, the performance metrics (P1 and P2) and cost metrics (C1 and C2) are obtained from simulation or analytical models. The benchmarks are run on the final VLSI system designs obtained from the model for each hardware platform, and the performance and cost-related metrics (execution time, power consumption, chip area, and fabrication cost) are collected for each benchmark.

A neural network model is trained using the RMSPROP

algorithm with a dataset containing various VLSI system designs,

their corresponding performance metrics, and cost information.

The trained model is used to explore the design space for each

hardware platform, generating a set of candidate solutions that

offer trade-offs between performance and cost.

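The performance metrics are P1, execution time (measured in seconds), and P2, power consumption (measured in watts). The cost metrics are C1, chip area (measured in square millimeters), and C2, total fabrication cost (measured in dollars). Lower values are better for both performance metrics here, so the dominance test of Section 3.3, which assumes that higher values are better, applies to them with the inequalities reversed.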

Table.1. SPEC CPU2017 benchmarks over different hardware platforms (Standard Cells, Gate-arrays, FPGAs, CPLD)

Columns: Hardware Platform, P1 (s), P2 (W), C1 (mm²), C2 ($)

Standard Cells: 150, 200

Gate-arrays: 120, 180

FPGAs: 100, 250

CPLD: 300

In the proposed evaluation of the SPEC CPU2017 benchmarks

over different hardware platforms (Standard Cells, Gate-arrays,

FPGAs, CPLD) using the RMSPROP algorithm-assisted design

space exploration, we obtained a set of candidate solutions that

represent trade-offs between performance metrics and fabrication

costs.

From the values, we observe variations in the execution time

(P1) and power consumption (P2) across different hardware

platforms. Standard Cells and Gate-arrays exhibit lower execution

times compared to FPGAs and CPLD. However, FPGAs have the

lowest power consumption, indicating their potential energy

efficiency advantage.

The values show differences in chip area (C1) and fabrication

cost (C2) for each hardware platform. Standard Cells have the

largest chip area and fabrication cost among the options, while

CPLD has the smallest. This reflects the trade-offs between area

efficiency and cost in VLSI design.

The proposed model provides a set of Pareto-optimal

solutions, each representing a unique balance between

performance and cost metrics. For instance, Standard Cells may

offer superior execution times but at the expense of increased chip

area and fabrication costs compared to other platforms.

5. CONCLUSION

This research presents a novel approach for balancing cost and

performance in VLSI systems using the RMSPROP algorithm-

assisted design space exploration. The proposed methodology

leverages the power of the RMSPROP algorithm, originally

designed for deep learning optimization, to efficiently navigate

the complex design space of VLSI systems and identify optimal

trade-offs between performance metrics and fabrication costs.

The evaluation of the proposed model with SPEC CPU2017

benchmarks over various hardware platforms (Standard Cells,

Gate-arrays, FPGAs, CPLD) showcases its effectiveness in

generating a set of Pareto-optimal solutions. These solutions

represent different trade-offs between execution time, power

consumption, chip area, and fabrication costs, allowing VLSI

designers to make informed decisions based on their specific

requirements and constraints. The results demonstrate that the

RMSPROP algorithm-assisted design space exploration

outperforms traditional design methodologies and heuristic-based

approaches in achieving better cost-performance trade-offs. The

ability of the proposed model to efficiently explore the design

space and uncover Pareto-optimal solutions offers significant

advantages in terms of design efficiency and cost-effectiveness.

REFERENCES

[1] H. Zheng and Z. Yu, “Balancing the Cost and Performance

Trade-Offs in SNN Processors”, IEEE Transactions on

Circuits and Systems II: Express Briefs, Vol. 68, No. 9, pp.

3172-3176, 2021.

[2] D. Wu and L. Wang, “SWM: A High-Performance Sparse-

Winograd Matrix Multiplication CNN Accelerator”, IEEE

Transactions on Very Large Scale Integration (VLSI)

Systems, Vol. 29, No. 5, pp. 936-949, 2021.

[3] M.H. Yen, K.H. Lu and C.C. Chan, “A Partial-Givens-

Rotation-Based Symbol Detector for GSM MIMO Systems:

Algorithm and VLSI Implementation”, IEEE Systems

Journal, Vol. 45, No. 2, pp. 1-13, 2023.

[4] M. Sohani and S.C. Jain, “A Predictive Priority-Based

Dynamic Resource Provisioning Scheme with Load

Balancing in Heterogeneous Cloud Computing”, IEEE

Access, Vol. 9, pp. 62653-62664, 2021.

[5] M. Alioto, “Connecting Trends from Society to VLSI

Systems”, IEEE Transactions on Very Large Scale

Integration (VLSI) Systems, Vol. 30, No. 1, pp. 1-4, 2022.

[6] E.N. Mambou, “Efficient Flicker-Free FEC Codes using

Knuth's Balancing Algorithm for VLC”, Proceedings of

IEEE International Conference on Global Communications,

pp. 1-6, 2019.

[7] R. Kuttappa and V. Pano, “Resonant Clock Synchronization

with Active Silicon Interposer for Multi-Die Systems”,

IEEE Transactions on Circuits and Systems I: Regular

Papers, Vol. 68, No. 4, pp. 1636-1645, 2021.

[8] C. Shi, N. Wu and G. Luo, “Low-Cost Real-Time VLSI

System for High-Accuracy Optical Flow Estimation using

Biological Motion Features and Random Forests”, Science

China Information Sciences, Vol. 66, No. 5, pp. 159401-

159412, 2023.

[9] S.K. Patel and S.K. Singhal, “An Area-Delay Efficient

Single-Precision Floating-Point Multiplier for VLSI

Systems”, Microprocessors and Microsystems, Vol. 98, pp.

104798-14804, 2023.

[10] D. Utyamishev and I. Partin-Vaisband, “Multiterminal

Pathfinding in Practical VLSI Systems with Deep Neural

Networks”, ACM Transactions on Design Automation of

Electronic Systems, Vol. 28, No. 4, pp. 1-19, 2023.

[11] R. Sun and P. Liu, “A Flexible and Efficient Real-Time Orb-

Based Full-HD Image Feature Extraction Accelerator”,

IEEE Transactions on Very Large Scale Integration (VLSI)

Systems, Vol. 28, No. 2, pp. 565-575, 2019.

ResearchGate has not been able to resolve any citations for this publication.

Low-cost real-time VLSI system for high-accuracy optical flow estimation using biological motion features and random forests

Article

Full-text available

Feb 2023

This study proposes a low-cost, real-time, very large-scale integration (VLSI) architecture for optical flow estimation. The architecture adopts parallel spatiotemporal filters to extract bio-inspired motion features at each pixel location and uses hardware random forests to infer the motion speed. Our system achieves higher estimation accuracy at low computational hardware costs under real-time constraint than previous biological motion estimation systems. A fieldprogrammable gate array (FPGA) prototype of our VLSI system was implemented on a Xilinx Zynq-7045 FPGA chip. It achieved 30 frame/s motion estimation on 320 × 240 image sequences. The mean endpoint error was only 0.5 pixels for the horizontal translation at 8 pixels/frame, 0.7 pixels for in-plane rotation at 3◦/frame, and 0.8 pixels for fast looming at a rate of 6%/frame, respectively.

A Predictive Priority-Based Dynamic Resource Provisioning Scheme With Load Balancing in Heterogeneous Cloud Computing

Article

Full-text available

Apr 2021

In cloud computing, resource provisioning is a key challenging task due to dynamic resource provisioning for the applications. As per the workload requirements of the application’s resources should be dynamically allocated for the application. Disparities in resource provisioning produce energy, cost wastages, and additionally, it affects Quality of Service (QoS) and increases Service Level Agreement (SLA) violations. So, applications allocated resources quantity should match with the applications required resources quantity. Load balancing in cloud computing can be addressed through optimal scheduling techniques, whereas this solution belongs to the NP-Complete optimization problem category. However, the cloud providers always face resource management issues for variable cloud workloads in the heterogeneous system environment. This issue has been solved by the proposed Predictive Priority-based Modified Heterogeneous Earliest Finish Time (PMHEFT) algorithm, which can estimate the application’s upcoming resource demands. This research contributes towards developing the prediction-based model for efficient and dynamic resource provisioning in a heterogamous system environment to fulfill the end user’s requirements. Existing algorithms fail to meet the user’s Quality of Service (QoS) requirements such as makespan minimization and budget constraints satisfaction, or to incorporate cloud computing principles, i.e., elasticity and heterogeneity of computing resources. In this paper, we proposed a PMHEFT algorithm to minimize the makespan of a given workflow application by improving the load balancing across all the virtual machines. Experimental results show that our proposed algorithm’s makespan, efficiency, and power consumption are better than other algorithms.

Efficient Flicker-Free FEC Codes Using Knuth's Balancing Algorithm for VLC

Conference Paper

Full-text available

Dec 2019

A Partial-Givens-Rotation-Based Symbol Detector for GSM MIMO Systems: Algorithm and VLSI Implementation

Article

Dec 2023

Recently, generalized spatial modulation (GSM) multiple-input multiple-output (MIMO) systems have attracted intensive research interest due to their advantages in balancing spectral efficiency and interchannel interference. Aiming at satisfactory detection performance based on a feasible hardware architecture, in this paper, a partial Givens rotation (PGR)-based symbol detector with a very-large-scale integration (VLSI) hardware architecture is proposed for GSM MIMO systems. The proposed detector contains three main types of modules: PGR blocks, symbol estimation (SE), and minimization. In particular, compared to conventional Givens rotations (GRs), the proposed PGR mechanism can further reduce computational complexity by at least 36 $%$ . In addition, to ensure numerical stability, only adders, shifters, and comparators are used to implement the SE architecture, avoiding the use of dividers. Furthermore, instead of the 2-norm distance measure, the 1-norm distance measure is used in the proposed detector to reduce the number of multipliers, thereby accelerating the detection speed. Finally, computer simulations showthat the proposed algorithm performs achieves near-optimal performance while incurring a lower computational complexity. Additionally, hardware implementation results achieved in TSMC 90-nm CMOS technology at an operating frequency of 704.2 MHz, with a configuration of four transmit antennas, two active transmit antennas, four receive antennas, and 16-ary quadrature amplitude modulation (16-QAM), show that the proposed hardware architecture needs 395.2 k gates and provides a detection throughput of 2347 Mbps and a hardware efficiency of 5.94 Mbps/kGEs for fast fading channels. In comparison to existing works, the proposed detector provides attractive detection performance as well as a feasible hardware architecture.

An area-delay efficient single-precision floating-point multiplier for VLSI systems

Article

Feb 2023
MICROPROCESS MICROSY

Multiterminal Pathfinding in Practical VLSI Systems with Deep Neural Networks

Article

Oct 2022

A multiterminal obstacle-avoiding pathfinding approach is proposed. The approach is inspired by deep image learning. The key idea is based on training a conditional generative adversarial network (cGAN) to interpret a pathfinding task as a graphical bitmap and consequently map a pathfinding task onto a pathfinding solution represented by another bitmap. To enable the proposed cGAN pathfinding, a methodology for generating a synthetic dataset is also proposed. The cGAN model is implemented in Python/Keras, trained on synthetically generated data, evaluated on practical VLSI benchmarks, and compared with the state of the art. Due to effective parallelization on GPU hardware, the proposed approach yields a wirelength comparable to the state of the art and a better runtime and throughput for moderately complex pathfinding tasks. However, the runtime and throughput with the proposed approach remain constant with increasing task complexity, promising orders-of-magnitude improvement over the state of the art in complex pathfinding tasks. The cGAN pathfinder can be exploited in numerous high-throughput applications, such as navigation, tracking, and routing in complex VLSI systems. The last is of particular interest to this work.

Editorial Opening of the 2022 TVLSI Editorial Year—Connecting Trends From Society to VLSI Systems

Article

Jan 2022

Massimo Alioto

The past year of 2021 has consolidated several societal courses observed in recent years, marking an unprecedented acceleration in a number of trends that have now reached their tipping point. The accelerated digitalization of human activities and outcomes has made the human side of supply chains more distributed on one hand [1] while putting an unprecedented pressure on its logistics side and mandating fundamental rethinking of its resilience–efficiency balance [2].

Balancing the Cost and Performance Trade-Offs in SNN Processors

Article

Jun 2021

The spiking neural network (SNN), known as the third generation of neural networks, is attracting more and more researchers’ attention because of its high energy efficiency. However, due to its spatiotemporal characteristics, an SNN involves a few complicated computations, such as exponential and logarithmic functions, making it hard to implement in hardware. In this paper, we propose an SNN processor that balances the trade-offs between cost and performance. In terms of cost, through software and hardware co-design, a high-robustness SNN model with 2-bit weights and a novel synapse delay management mechanism are adopted to reduce memory utilization. In terms of performance, a spike encoder and a VFA (Vote-For-All) decoder are used to reduce latency and improve inference accuracy, respectively. We implement the design on the Xilinx ZCU102 FPGA board and apply it to MNIST handwritten digit classification, where it achieves 90.53% classification accuracy. Compared to a previously proposed SNN processor of a similar neuron scale, our design achieves a 156× inference speed-up and consumes 0.32× the hardware resources.

SWM: A High-Performance Sparse-Winograd Matrix Multiplication CNN Accelerator

Article

Mar 2021

Many convolutional neural network (CNN) accelerators have recently been proposed to exploit network sparsity and gain the benefits of both computation and memory reduction. However, most accelerators cannot exploit the sparsity of both activations and weights. Those that do exploit both cannot achieve a stable load balance through a static scheduling (SS) strategy, which is vulnerable to the sparsity distribution. In this work, a balanced compressed sparse row format and a dynamic scheduling strategy are proposed to improve the load balance. A set-associate structure is also presented to trade off the load balance against hardware resource overhead. We propose SWM to accelerate CNN inference; it supports both sparse convolution and sparse fully connected (FC) layers. SWM provides Winograd adaptability for large convolution kernels and supports both 16-bit and 8-bit quantized CNNs. Due to activation sharing, 8-bit processing can theoretically achieve twice the performance of 16-bit processing at the same sparsity. The architecture is evaluated with VGG16 and ResNet50 and achieves at most 7.6 TOP/s for sparse-Winograd convolution and 3 TOP/s for sparse matrix multiplication with 16-bit quantization on the Xilinx VCU1525 platform. SWM can process 310/725 images per second for VGG16/ResNet50 with 16-bit quantization. Compared with state-of-the-art works, our design achieves at least a 1.53× speedup and a 1.8× energy efficiency improvement.

Resonant Clock Synchronization With Active Silicon Interposer for Multi-Die Systems

Article

Feb 2021

This paper presents the integration of resonant clocking into multi-die architectures to synchronize individual chiplets connected through an active silicon interposer. The proposed inter-chiplet synchronization through the active silicon interposer rotary oscillator array (ASI-ROA) provides a unitary clock domain to the multiple dies (i.e., multiple chiplets) in the package with a very low design overhead. System performance analysis is performed with parasitics-extracted, post-layout simulation models of two different sizes of representative heterogeneous multi-die architectures, each with a varying number of RISC-V cores per die. Each RISC-V core of the multi-die package belongs to the unitary clock domain, designed with ASI-ROA to operate at a frequency of 2 GHz. The proposed architecture is investigated for robustness in frequency and skew across the multi-die system (MDS) with SPICE-based simulations of post-layout models, demonstrating variations of only 80 MHz for a 2 GHz target frequency. The power savings are up to 41% for the overall MDS, compared to an equivalent implementation with a contemporary ADPLL used to synchronize the multiple chiplets over the active interposer. The average clock skew of the completely resonant architecture presented in this work is 8.2 ps.
