BALANCING COST AND PERFORMANCE IN VLSI SYSTEMS USING RMSPROP ALGORITHM-ASSISTED DESIGN SPACE EXPLORATION

Abstract

As Very Large Scale Integration (VLSI) technology advances, the need to efficiently balance cost and performance in VLSI systems becomes paramount. To address this challenge, we propose a novel approach that leverages the RMSPROP algorithm for assisted design space exploration. The RMSPROP algorithm, which has proven effective in the field of deep learning optimization, is adapted to navigate the complex design space of VLSI systems. By integrating RMSPROP into the design space exploration process, we can intelligently search for optimal trade-offs between cost and performance, leading to highly efficient VLSI designs. Our experimental results demonstrate the effectiveness of the RMSPROP algorithm-assisted design space exploration, showcasing significant improvements in cost-performance trade-offs compared to traditional design methodologies. This research opens new avenues for designing VLSI systems with improved efficiency, enabling the realization of high-performance yet cost-effective integrated circuits.
M PRADEEP et al.: BALANCING COST AND PERFORMANCE IN VLSI SYSTEMS USING RMSPROP ALGORITHM-ASSISTED DESIGN SPACE EXPLORATION
DOI: 10.21917/ijme.2023.0274
M. Pradeep1, Udutha Rajender2, Pravin Prakash Adivarekar3 and Sumit Kumar Gupta4
1Department of Electronics and Communication Engineering, Shri Vishnu Engineering College for Women, India
2Department of Electronics and Communication Engineering, Vaageswari College of Engineering, India
3Department of Computer Engineering, A.P.Shah Institute of Technology, India
4Department of Physics, St. Wilfred’s PG College, India
Keywords:
VLSI Systems, RMSPROP Algorithm, Design Space Exploration, Cost-
Performance Trade-Offs
1. INTRODUCTION
In recent years, the demand for high-performance and cost-
effective Very Large Scale Integration (VLSI) systems has
surged, driven by the rapid growth of modern technologies such
as artificial intelligence, Internet of Things (IoT), and cloud
computing. VLSI systems, comprising intricate integrated circuits
(ICs), form the backbone of numerous electronic devices and play
a pivotal role in shaping technological advancements [1].
However, designing VLSI systems that strike an optimal balance
between performance and cost remains a significant challenge,
given the increasing complexity and scale of these systems [2].
Traditional VLSI design methodologies often rely on manual
exploration of the design space to identify suitable trade-offs
between performance metrics, such as speed and power
consumption, and the cost of fabrication. As the design space
becomes increasingly vast and intricate, manual exploration
becomes impractical and time-consuming, hindering the
discovery of optimal solutions [3]. Consequently, there is a
pressing need for innovative approaches that can efficiently
navigate the design space, leading to the development of VLSI
systems that maximize performance while minimizing costs [4].
We propose an approach that uses the RMSPROP algorithm to
drive design space exploration for VLSI systems. RMSPROP was
originally developed for training deep learning models; it scales
each parameter's step by a running average of recent squared
gradients, which keeps updates stable when gradient magnitudes
vary widely across parameters. By adapting RMSPROP to the VLSI
domain, we aim to address the challenge of balancing cost and
performance in VLSI systems.
The primary objective of this research is to present a
systematic and effective framework that combines the RMSPROP
algorithm with design space exploration techniques to efficiently
explore the vast solution space of VLSI systems. Through this
integration, we seek to uncover a range of design options that
offer favorable trade-offs between performance metrics and
fabrication costs, thus enabling the development of high-quality
VLSI systems with enhanced efficiency and cost-effectiveness.
2. RELATED WORKS
Various research efforts have been dedicated to exploring
design space exploration techniques for VLSI systems. These
studies often focus on optimization algorithms, evolutionary
strategies, and multi-objective optimization methods to efficiently
search for optimal design points in the vast solution space. While
these approaches have shown promising results, there is still a
need for novel methodologies that can handle the increasing
complexity and size of modern VLSI systems [5].
In the field of deep learning, optimization algorithms like
RMSPROP, Adam, and stochastic gradient descent (SGD) have
been extensively studied to train complex neural networks
efficiently. These algorithms address challenges such as
convergence speed, handling large-scale datasets, and alleviating
the problem of vanishing or exploding gradients. Drawing
inspiration from the success of these algorithms, researchers have
started exploring their adaptability to other domains, including
VLSI design [6].
The trade-offs between cost and performance are critical
considerations in VLSI system design. Researchers have
examined various techniques to optimize power consumption,
chip area, clock frequency, and other performance metrics, while
still adhering to stringent cost constraints. These studies often
utilize analytical models, heuristics, or machine learning
approaches to find the best compromise between performance and
cost [7].
Machine learning techniques have been increasingly
employed in VLSI design to automate various tasks, such as
layout generation, optimization, and synthesis. Reinforcement
learning, genetic algorithms, and neural architecture search are
some of the machine learning-based methods used for exploring
the design space and improving the efficiency of VLSI designs [8].
Several research efforts have focused on developing
methodologies that streamline the VLSI design process, reduce
design time, and enhance overall productivity. These
methodologies often incorporate intelligent algorithms and
automation techniques to achieve better performance with fewer
resources [9].
ISSN: 2395-1680 (ONLINE) ICTACT JOURNAL ON MICROELECTRONICS, JULY 2023, VOLUME: 09, ISSUE: 02
Hardware-software co-design aims to optimize the
interaction between hardware and software components in a
system. Researchers have explored co-design techniques that
allow better utilization of hardware resources while achieving
high-performance results, ultimately contributing to cost-efficient
VLSI systems [10].
Studies examining the most recent advancements in VLSI
systems and integrated circuits are essential for understanding the
current state of the art. These works often serve as benchmarks
for evaluating the efficacy of new design methodologies,
including the proposed RMSPROP algorithm-assisted design
space exploration approach [11].
By studying these related works, we can gain insights into the
existing challenges and solutions in VLSI design, paving the way
for our novel approach to balance cost and performance using the
RMSPROP algorithm for design space exploration.
3. PROPOSED COST-EFFECTIVE DESIGN
The proposed methodology aims to address the challenge of
balancing cost and performance in VLSI systems by leveraging
the RMSPROP algorithm for assisted design space exploration.
This innovative approach combines principles from the fields of
deep learning optimization and VLSI design to efficiently
navigate the complex design space and identify optimal trade-offs
between various performance metrics and fabrication costs. The
key components of the proposed methodology are as follows:
3.1 DESIGN SPACE EXPLORATION USING
RMSPROP
The core of the methodology is the adaptation of the RMSPROP
algorithm, originally designed for deep learning optimization, to
VLSI design space exploration. The design parameters are updated
iteratively to find favorable trade-offs between performance
metrics and fabrication costs. RMSPROP adjusts the effective
learning rate of each parameter using a moving average of its
squared gradients, which damps the effect of gradient variance
and typically speeds up convergence. The RMSPROP update for a
parameter θ at iteration t is given as follows:
\[ v_t = \beta v_{t-1} + (1 - \beta) g_t^2 \]
\[ \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{v_t} + \epsilon} g_t \]
where:
vt is the moving average of squared gradients for parameter θ at
iteration t.
β is the decay rate (usually set to a value close to 1, e.g., 0.9) that
controls how much the algorithm remembers past gradients.
gt is the gradient of the loss function with respect to parameter θ
at iteration t.
η is the learning rate, which determines the step size in the
parameter space.
ε is a small constant (e.g., 1e-8) added to the denominator for
numerical stability.
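As a hedged numeric illustration of a single update, assume the symbols β (decay rate), η (learning rate) and ε (stability constant), with illustrative values β = 0.9, η = 0.01, ε = 1e-8, v0 = 0 and a first gradient g1 = 2. None of these values is taken from the experiments.

```latex
\begin{aligned}
v_1 &= 0.9 \cdot 0 + (1 - 0.9) \cdot 2^2 = 0.4 \\
\theta_1 &= \theta_0 - \frac{0.01 \cdot 2}{\sqrt{0.4} + 10^{-8}} \approx \theta_0 - 0.0316
\end{aligned}
```

Because v0 = 0, the size of this first step is approximately η/√(1 − β) ≈ 3.16η whatever the magnitude of g1. This shows how the moving average normalizes the step size.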
RMSPROP is applied to design space exploration in VLSI as follows:
Step 1. Initialization: Start with initial design parameters θ and
set the moving average of squared gradients v0 to zero.
Step 2. Iterative Update: At each iteration t, evaluate the
performance metrics and cost constraints for the current design
point determined by θ.
Step 3. Compute Gradients: Calculate the gradients gt of the
objective functions (performance metrics and cost constraints)
with respect to each design parameter θ.
Step 4. Update Moving Average: Update the moving average of
squared gradients vt using the decay rate β and the current squared
gradient gt².
Step 5. Compute Step Size: Calculate the step size (Δθt) for each
parameter θ using the learning rate (η), the current gradient
(gt), and the moving average vt.
Step 6. Update Design Parameters: Update the design parameters
θ using the computed step size (Δθt) to explore the design space
efficiently.
Step 7. Convergence Check: Check for convergence criteria. If
the desired convergence level is reached or the maximum number
of iterations is exceeded, terminate the exploration process.
Step 8. Output: Return the final set of design parameters θ that
represents a Pareto-optimal solution, providing a trade-off
between performance metrics and fabrication costs.
By iteratively updating the design parameters using the
RMSPROP algorithm, the proposed methodology intelligently
navigates the design space to efficiently discover high-quality
VLSI system designs that strike an optimal balance between cost
and performance.
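The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the objective `toy_objective` is a hypothetical stand-in for a scalarized cost-performance function, gradients are estimated by central finite differences (an assumption, since the paper does not say how gradients of simulated metrics are obtained), and all hyperparameter values are illustrative.

```python
import numpy as np

def finite_difference_grad(f, theta, h=1e-5):
    """Central-difference estimate of the gradient of f at theta (assumed method)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = h
        grad[i] = (f(theta + step) - f(theta - step)) / (2.0 * h)
    return grad

def rmsprop_explore(f, theta0, lr=0.01, decay=0.9, eps=1e-8,
                    max_iters=5000, tol=1e-6):
    """Minimize f with RMSProp, following Steps 1-8 of the text."""
    theta = np.asarray(theta0, dtype=float).copy()
    v = np.zeros_like(theta)                    # Step 1: v0 = 0
    for _ in range(max_iters):
        g = finite_difference_grad(f, theta)    # Steps 2-3: evaluate and differentiate
        v = decay * v + (1.0 - decay) * g**2    # Step 4: moving average of g^2
        step = lr * g / (np.sqrt(v) + eps)      # Step 5: per-parameter step size
        theta = theta - step                    # Step 6: update design parameters
        if np.linalg.norm(g) < tol:             # Step 7: convergence check
            break
    return theta                                # Step 8: final design point

def toy_objective(theta):
    # Hypothetical scalarized objective with its minimum at (1.0, 2.0).
    return (theta[0] - 1.0) ** 2 + 0.5 * (theta[1] - 2.0) ** 2

best = rmsprop_explore(toy_objective, [0.0, 0.0])
```

With a constant learning rate, RMSProp settles into a small oscillation around the minimum rather than converging exactly, so in practice the iteration cap usually ends the loop and the result is accurate only to roughly the order of the learning rate.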
3.2 COST CONSTRAINTS
In VLSI design, there are several performance metrics to
consider, such as clock frequency, power consumption, chip area,
and propagation delay. Additionally, there are strict cost
constraints associated with fabrication, including mask costs,
wafer costs, and testing expenses. The proposed methodology
incorporates these metrics and constraints as objectives and
bounds to guide the exploration process. The goal is to find
designs that achieve the desired performance metrics while
adhering to cost limitations.
Cost constraints in VLSI design refer to the limitations
imposed on the total cost of fabricating the integrated circuit (IC).
These costs encompass various factors, including mask costs,
wafer costs, packaging expenses, and testing expenditures.
Designers need to ensure that the final IC design satisfies these
cost constraints to ensure the economic viability of the product.
In design space exploration using the RMSPROP algorithm,
cost constraints can be represented as inequalities. Consider
a simplified scenario with two cost
constraints: C1 and C2.
3.2.1 Mask Cost Constraint (C1):
The mask cost (C1) represents the cost associated with creating
the masks required for the fabrication process. It is typically
proportional to the chip area. Let A be the area of the chip, and
Amax be the maximum allowable chip area based on the cost
constraint. Then, the mask cost constraint can be expressed as:
\[ A \le A_{\max} \]
3.2.2 Total Cost Constraint (C2):
The total cost (C2) encompasses all fabrication costs,
including mask costs, wafer costs, packaging, and testing. Let T
be the total cost of the IC design, and Tmax be the maximum
acceptable total cost based on the cost constraint. The total cost
constraint can be formulated as:
\[ T \le T_{\max} \]
The above constraints represent the economic limitations
imposed on the design space exploration process. During the
exploration, the RMSPROP algorithm iteratively updates the
design parameters while ensuring that the constraints A ≤ Amax and
T ≤ Tmax are satisfied at each step. The goal is to find the set of
design parameters that achieve desirable performance metrics
while staying within the prescribed cost bounds.
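Because RMSPROP is an unconstrained optimizer, the bounds on chip area and total cost must be enforced by some additional mechanism. One common option, shown here as a hedged sketch and not necessarily the authors' choice, is a quadratic penalty added to the objective. The models `perf_loss`, `area` and `total_cost`, the penalty weight, and the bound values are all hypothetical.

```python
def penalized_objective(perf_loss, area, total_cost, a_max, t_max, weight=100.0):
    """Build theta -> loss plus quadratic penalties for area > a_max and cost > t_max."""
    def objective(theta):
        area_excess = max(0.0, area(theta) - a_max)        # zero when area <= a_max
        cost_excess = max(0.0, total_cost(theta) - t_max)  # zero when cost <= t_max
        return perf_loss(theta) + weight * (area_excess ** 2 + cost_excess ** 2)
    return objective

# Hypothetical toy models of a single design parameter x (e.g. a sizing factor).
perf_loss = lambda theta: 1.0 / theta[0]            # delay falls as x grows
area = lambda theta: 10.0 * theta[0]                # area grows with x
total_cost = lambda theta: 5.0 + 2.0 * area(theta)  # cost grows with area

obj = penalized_objective(perf_loss, area, total_cost, a_max=50.0, t_max=120.0)
feasible = obj([4.0])    # area 40, cost 85: no penalty applied
infeasible = obj([6.0])  # area 60, cost 125: both bounds violated
```

A penalty only discourages violations, so intermediate iterates may briefly exceed a bound. If the bounds must hold at every step, a projection back onto the feasible region after each update is an alternative.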
There may be multiple cost constraints, and the actual
constraints can be more complex, depending on the specific cost
models and factors considered during VLSI design. The design
space exploration process, guided by these cost constraints,
enables designers to identify Pareto-optimal solutions that offer a
balanced trade-off between various performance metrics and
fabrication costs.
3.2.3 Algorithm-Driven Search:
Unlike traditional manual exploration, where designers
iteratively adjust design parameters and assess the outcomes, the
proposed methodology employs the RMSPROP algorithm to
efficiently traverse the design space. The algorithm-driven search
intelligently updates the design parameters based on gradients,
historical information, and learning rates, facilitating faster
convergence to promising design points.
3.3 PARETO OPTIMIZATION
Since cost and performance metrics often conflict with each
other, the proposed methodology adopts a Pareto optimization
approach. This means that the algorithm identifies a set of Pareto-
optimal solutions, where improving one metric would result in a
degradation of another. The designer can then choose from this
set, depending on their specific requirements and priorities.
Pareto optimization, also known as multi-objective
optimization, is a powerful technique used to find solutions that
represent the trade-offs between multiple conflicting objectives.
In the context of VLSI design space exploration, Pareto
optimization helps identify a set of solutions that cannot be
improved in one objective without sacrificing performance in
another objective. These solutions are known as Pareto-optimal
solutions or non-dominated solutions.
Let us consider a simplified scenario with two performance
metrics: P1 and P2, and the corresponding cost metrics: C1 and C2.
The goal is to optimize both performance metrics while adhering
to cost constraints. We assume that higher values of P1 and P2
represent better performance, and lower values of C1 and C2
indicate lower costs.
The objectives can be mathematically represented as follows:
P1(θ): The performance metric 1 as a function of parameters θ.
P2(θ): The performance metric 2 as a function of parameters θ.
C1(θ): The cost metric 1 as a function of parameters θ.
C2(θ): The cost metric 2 as a function of parameters θ.
The Pareto front is the set of Pareto-optimal solutions that
represents the trade-offs between P1 and P2 while satisfying the
cost constraints C1 and C2.
Mathematically, since higher values of P1 and P2 are better, a
solution θ1 is said to dominate another solution θ2 if and only if:
\[ P_1(\theta_1) \ge P_1(\theta_2) \quad \text{and} \quad P_2(\theta_1) \ge P_2(\theta_2) \]
with at least one of the inequalities being strict (i.e., > instead
of ≥). In other words, θ1 dominates θ2 if it performs at least as well
as θ2 in both objectives and strictly better in at least one objective.
The Pareto front is obtained by finding all the non-dominated
solutions, i.e., solutions that are not dominated by any other
solution in the design space. These solutions represent the optimal
trade-offs between P1 and P2 while satisfying the cost constraints
C1 and C2.
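The dominance test and the resulting Pareto front can be illustrated with a short Python sketch. It assumes, as in the text, that higher objective values are better. The candidate points are made-up (P1, P2) pairs chosen only for illustration, and cost-constraint filtering is assumed to have been applied beforehand.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (P1, P2) values for five feasible candidate designs.
candidates = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.5, 1.5), (2.0, 1.0)]
front = pareto_front(candidates)
```

Here (1.5, 1.5) is dominated by (2.0, 2.0) and (2.0, 1.0) is dominated by (3.0, 1.0), so the front consists of the remaining three points. The pairwise check is quadratic in the number of candidates, which is adequate for small candidate sets.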
Fig.1. Pareto Optimized Solution vs. Non-Pareto Solution
Fig.2. Cost Constraints
In the proposed methodology using RMSPROP for design
space exploration, the algorithm aims to find a set of
Pareto-optimal solutions by intelligently adjusting the design parameters
and evaluating the objectives P1(θ) and P2(θ) while ensuring that
the cost constraints C1(θ) and C2(θ) are satisfied. The output of
the exploration process consists of these Pareto-optimal
solutions, providing designers with a range of options
representing different trade-offs between performance metrics
and fabrication costs.
To evaluate the solutions generated during the exploration
process, the proposed methodology employs fast and accurate
simulation tools. These tools enable rapid assessment of the
performance metrics and cost estimates associated with each
design point, allowing the algorithm to efficiently navigate the
design space. The methodology is designed to seamlessly
integrate into the existing VLSI design flow. It can be used as a
complementary step after the initial design phase or as part of an
iterative refinement process. By integrating with existing design
tools and methodologies, the proposed approach can be readily
adopted by VLSI designers. To validate the effectiveness of the
proposed methodology, comprehensive experiments are
conducted on benchmark VLSI designs. A comparison with
traditional design space exploration techniques is performed to
showcase the advantages of using the RMSPROP algorithm-
assisted approach. The experimental results demonstrate the
superior cost-performance trade-offs achieved through the
proposed methodology.
The proposed methodology presents a novel and efficient
approach to balance cost and performance in VLSI systems using
the RMSPROP algorithm-assisted design space exploration. By
combining principles from deep learning optimization with VLSI
design considerations, this approach opens new avenues for
creating highly efficient and cost-effective integrated circuits to
meet the demands of modern technology.
4. PERFORMANCE EVALUATION
We consider a simplified scenario with two performance metrics
(P1 and P2) and two cost metrics (C1 and C2), using SPEC CPU2017
benchmarks evaluated over different hardware platforms
(Standard Cells, Gate-arrays, FPGAs, CPLD).
SPEC CPU2017 benchmarks are prepared with different
configurations for each hardware platform. For each benchmark,
the performance metrics (P1 and P2) and cost metrics (C1 and
C2) are obtained from simulation or analytical models. The
SPEC CPU2017 benchmarks are run on the final VLSI system designs
obtained from the model for each hardware platform, and the
performance and cost-related metrics for each benchmark,
such as execution time, power consumption, chip area, and
fabrication cost, are collected.
A neural network model is trained using the RMSPROP
algorithm with a dataset containing various VLSI system designs,
their corresponding performance metrics, and cost information.
The trained model is used to explore the design space for each
hardware platform, generating a set of candidate solutions that
offer trade-offs between performance and cost.
The performance metrics are P1, the execution time
(measured in seconds), and P2, the power consumption
(measured in watts). The cost metrics are C1, the chip area
(measured in square millimeters), and C2, the total
fabrication cost (measured in dollars). Unlike the convention
assumed in Section 3.3, lower values of P1 and P2 are better here.
Table.1. SPEC CPU2017 benchmarks over different hardware
platforms (Standard Cells, Gate-arrays, FPGAs, CPLD)
| Hardware Platform | P1 (s) | P2 (W) | C1 (mm²) | C2 ($) |
|---|---|---|---|---|
| Standard Cells | 50 | 10 | 150 | 200 |
| Gate-arrays | 45 | 15 | 120 | 180 |
| FPGAs | 60 | 8 | 100 | 250 |
| CPLD | 70 | 5 | 80 | 300 |
In the proposed evaluation of the SPEC CPU2017 benchmarks
over different hardware platforms (Standard Cells, Gate-arrays,
FPGAs, CPLD) using the RMSPROP algorithm-assisted design
space exploration, we obtained a set of candidate solutions that
represent trade-offs between performance metrics and fabrication
costs.
From the values, we observe variations in the execution time
(P1) and power consumption (P2) across different hardware
platforms. Standard Cells (50 s) and Gate-arrays (45 s) exhibit
lower execution times than FPGAs (60 s) and CPLD (70 s). However,
CPLD has the lowest power consumption (5 W), followed by FPGAs
(8 W), indicating their potential energy efficiency advantage.
The values show differences in chip area (C1) and fabrication
cost (C2) for each hardware platform. Standard Cells have the
largest chip area (150 mm²) and CPLD the smallest (80 mm²).
Fabrication cost follows a different order: Gate-arrays are the
cheapest ($180) and CPLD the most expensive ($300). This reflects
the trade-offs between area efficiency and cost in VLSI design.
The proposed model provides a set of Pareto-optimal
solutions, each representing a unique balance between
performance and cost metrics. For instance, Gate-arrays offer the
shortest execution time and the lowest fabrication cost but the
highest power consumption, whereas CPLD minimizes power and chip
area at the expense of the longest execution time and the highest
fabrication cost.
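As a quick check on the reported values, the following Python sketch tests whether any platform in Table 1 is dominated by another. It assumes that all four metrics (execution time, power, chip area, fabrication cost) are to be minimized. The function name `dominates_min` is introduced here for illustration.

```python
# (P1 seconds, P2 watts, C1 mm^2, C2 dollars) as reported in Table 1.
designs = {
    "Standard Cells": (50, 10, 150, 200),
    "Gate-arrays": (45, 15, 120, 180),
    "FPGAs": (60, 8, 100, 250),
    "CPLD": (70, 5, 80, 300),
}

def dominates_min(a, b):
    """True if a is no worse than b in every metric and strictly better in one (lower is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

# Keep every platform that no other platform dominates.
non_dominated = [
    name for name, metrics in designs.items()
    if not any(dominates_min(other, metrics) for other in designs.values())
]
```

Under this assumption all four platforms are mutually non-dominated: each is best in at least one metric and worse in another, so the choice among them depends on the designer's priorities.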
5. CONCLUSION
This research presents a novel approach for balancing cost and
performance in VLSI systems using the RMSPROP algorithm-
assisted design space exploration. The proposed methodology
leverages the power of the RMSPROP algorithm, originally
designed for deep learning optimization, to efficiently navigate
the complex design space of VLSI systems and identify optimal
trade-offs between performance metrics and fabrication costs.
The evaluation of the proposed model with SPEC CPU2017
benchmarks over various hardware platforms (Standard Cells,
Gate-arrays, FPGAs, CPLD) showcases its effectiveness in
generating a set of Pareto-optimal solutions. These solutions
represent different trade-offs between execution time, power
consumption, chip area, and fabrication costs, allowing VLSI
designers to make informed decisions based on their specific
requirements and constraints. The results demonstrate that the
RMSPROP algorithm-assisted design space exploration
outperforms traditional design methodologies and heuristic-based
approaches in achieving better cost-performance trade-offs. The
ability of the proposed model to efficiently explore the design
space and uncover Pareto-optimal solutions offers significant
advantages in terms of design efficiency and cost-effectiveness.
REFERENCES
[1] H. Zheng and Z. Yu, “Balancing the Cost and Performance
Trade-Offs in SNN Processors”, IEEE Transactions on
Circuits and Systems II: Express Briefs, Vol. 68, No. 9, pp.
3172-3176, 2021.
[2] D. Wu and L. Wang, “SWM: A High-Performance Sparse-
Winograd Matrix Multiplication CNN Accelerator”, IEEE
Transactions on Very Large Scale Integration (VLSI)
Systems, Vol. 29, No. 5, pp. 936-949, 2021.
[3] M.H. Yen, K.H. Lu and C.C. Chan, “A Partial-Givens-
Rotation-Based Symbol Detector for GSM MIMO Systems:
Algorithm and VLSI Implementation”, IEEE Systems
Journal, Vol. 45, No. 2, pp. 1-13, 2023.
[4] M. Sohani and S.C. Jain, “A Predictive Priority-Based
Dynamic Resource Provisioning Scheme with Load
Balancing in Heterogeneous Cloud Computing”, IEEE
Access, Vol. 9, pp. 62653-62664, 2021.
[5] M. Alioto, “Connecting Trends from Society to VLSI
Systems”, IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, Vol. 30, No. 1, pp. 1-4, 2022.
[6] E.N. Mambou, “Efficient Flicker-Free FEC Codes using
Knuth's Balancing Algorithm for VLC”, Proceedings of
IEEE International Conference on Global Communications,
pp. 1-6, 2019.
[7] R. Kuttappa and V. Pano, “Resonant Clock Synchronization
with Active Silicon Interposer for Multi-Die Systems”,
IEEE Transactions on Circuits and Systems I: Regular
Papers, Vol. 68, No. 4, pp. 1636-1645, 2021.
[8] C. Shi, N. Wu and G. Luo, “Low-Cost Real-Time VLSI
System for High-Accuracy Optical Flow Estimation using
Biological Motion Features and Random Forests”, Science
China Information Sciences, Vol. 66, No. 5, pp. 159401-
159412, 2023.
[9] S.K. Patel and S.K. Singhal, “An Area-Delay Efficient
Single-Precision Floating-Point Multiplier for VLSI
Systems”, Microprocessors and Microsystems, Vol. 98, pp.
104798-14804, 2023.
[10] D. Utyamishev and I. Partin-Vaisband, “Multiterminal
Pathfinding in Practical VLSI Systems with Deep Neural
Networks”, ACM Transactions on Design Automation of
Electronic Systems, Vol. 28, No. 4, pp. 1-19, 2023.
[11] R. Sun and P. Liu, “A Flexible and Efficient Real-Time Orb-
Based Full-HD Image Feature Extraction Accelerator”,
IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, Vol. 28, No. 2, pp. 565-575, 2019.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
This study proposes a low-cost, real-time, very large-scale integration (VLSI) architecture for optical flow estimation. The architecture adopts parallel spatiotemporal filters to extract bio-inspired motion features at each pixel location and uses hardware random forests to infer the motion speed. Our system achieves higher estimation accuracy at low computational hardware costs under real-time constraint than previous biological motion estimation systems. A fieldprogrammable gate array (FPGA) prototype of our VLSI system was implemented on a Xilinx Zynq-7045 FPGA chip. It achieved 30 frame/s motion estimation on 320 × 240 image sequences. The mean endpoint error was only 0.5 pixels for the horizontal translation at 8 pixels/frame, 0.7 pixels for in-plane rotation at 3◦/frame, and 0.8 pixels for fast looming at a rate of 6%/frame, respectively.
Article
Full-text available
In cloud computing, resource provisioning is a key challenging task due to dynamic resource provisioning for the applications. As per the workload requirements of the application’s resources should be dynamically allocated for the application. Disparities in resource provisioning produce energy, cost wastages, and additionally, it affects Quality of Service (QoS) and increases Service Level Agreement (SLA) violations. So, applications allocated resources quantity should match with the applications required resources quantity. Load balancing in cloud computing can be addressed through optimal scheduling techniques, whereas this solution belongs to the NP-Complete optimization problem category. However, the cloud providers always face resource management issues for variable cloud workloads in the heterogeneous system environment. This issue has been solved by the proposed Predictive Priority-based Modified Heterogeneous Earliest Finish Time (PMHEFT) algorithm, which can estimate the application’s upcoming resource demands. This research contributes towards developing the prediction-based model for efficient and dynamic resource provisioning in a heterogamous system environment to fulfill the end user’s requirements. Existing algorithms fail to meet the user’s Quality of Service (QoS) requirements such as makespan minimization and budget constraints satisfaction, or to incorporate cloud computing principles, i.e., elasticity and heterogeneity of computing resources. In this paper, we proposed a PMHEFT algorithm to minimize the makespan of a given workflow application by improving the load balancing across all the virtual machines. Experimental results show that our proposed algorithm’s makespan, efficiency, and power consumption are better than other algorithms.
Article
Recently, generalized spatial modulation (GSM) multiple-input multiple-output (MIMO) systems have attracted intensive research interest due to their advantages in balancing spectral efficiency and interchannel interference. Aiming at satisfactory detection performance based on a feasible hardware architecture, in this paper, a partial Givens rotation (PGR)-based symbol detector with a very-large-scale integration (VLSI) hardware architecture is proposed for GSM MIMO systems. The proposed detector contains three main types of modules: PGR blocks, symbol estimation (SE), and minimization. In particular, compared to conventional Givens rotations (GRs), the proposed PGR mechanism can further reduce computational complexity by at least 36 $%$ . In addition, to ensure numerical stability, only adders, shifters, and comparators are used to implement the SE architecture, avoiding the use of dividers. Furthermore, instead of the 2-norm distance measure, the 1-norm distance measure is used in the proposed detector to reduce the number of multipliers, thereby accelerating the detection speed. Finally, computer simulations showthat the proposed algorithm performs achieves near-optimal performance while incurring a lower computational complexity. Additionally, hardware implementation results achieved in TSMC 90-nm CMOS technology at an operating frequency of 704.2 MHz, with a configuration of four transmit antennas, two active transmit antennas, four receive antennas, and 16-ary quadrature amplitude modulation (16-QAM), show that the proposed hardware architecture needs 395.2 k gates and provides a detection throughput of 2347 Mbps and a hardware efficiency of 5.94 Mbps/kGEs for fast fading channels. In comparison to existing works, the proposed detector provides attractive detection performance as well as a feasible hardware architecture.
Article
A multiterminal obstacle-avoiding pathfinding approach is proposed. The approach is inspired by deep image learning. The key idea is based on training a conditional generative adversarial network (cGAN) to interpret a pathfinding task as a graphical bitmap and consequently map a pathfinding task onto a pathfinding solution represented by another bitmap. To enable the proposed cGAN pathfinding, a methodology for generating synthetic dataset is also proposed. The cGAN model is implemented in Python/Keras, trained on synthetically generated data, evaluated on practical VLSI benchmarks, and compared with state-of-the-art. Due to effective parallelization on GPU hardware, the proposed approach yields a state-of-the-art like wirelength and a better runtime and throughput for moderately complex pathfinding tasks. However, the runtime and throughput with the proposed approach remain constant with an increasing task complexity, promising orders of magnitude improvement over state-of-the-art in complex pathfinding tasks. The cGAN pathfinder can be exploited in numerous high throughput applications, such as, navigation, tracking, and routing in complex VLSI systems. The last is of particular interest to this work.
Article
The past year of 2021 has consolidated several societal courses observed in recent years, marking an unprecedented acceleration in a number of trends that have now reached their tipping point. The accelerated digitalization of human activities and outcomes has made the human side of supply chains more distributed on one hand [1] while putting an unprecedented pressure on its logistics side and mandating fundamental rethinking of its resilience–efficiency balance [2] .
Article
The spiking neural network (SNN), known as the third generation of neural networks, is attracting growing research attention because of its high energy efficiency. However, due to its spatiotemporal characteristics, an SNN involves complicated computations such as exponential and logarithmic functions, making it hard to implement in hardware. In this paper, we propose an SNN processor that balances the trade-offs between cost and performance. In terms of cost, through software and hardware co-design, a high-robustness SNN model with 2-bit weights and a novel synapse delay management mechanism are adopted to reduce memory utilization. In terms of performance, a spike encoder and a VFA (Vote-For-All) decoder are used to reduce latency and improve inference accuracy, respectively. We implement the design on the Xilinx ZCU102 FPGA board and apply it to MNIST handwritten digit classification, where it achieves 90.53% classification accuracy. Compared to a previously proposed SNN processor of a similar neuron scale, our design achieves a 156× inference speed-up and consumes 0.32× the hardware resources.
Article
Many convolutional neural network (CNN) accelerators have recently been proposed to exploit network sparsity and thereby gain the benefits of both computation and memory reduction. However, most accelerators cannot exploit the sparsity of both activations and weights. Those that do exploit both cannot achieve a stable load balance through a static scheduling (SS) strategy, which is vulnerable to the sparsity distribution. In this work, a balanced compressed sparse row format and a dynamic scheduling strategy are proposed to improve the load balance. A set-associate structure is also presented to trade off the load balance against hardware resource overhead. We propose SWM to accelerate CNN inference, which supports both sparse convolution and sparse fully connected (FC) layers. SWM provides Winograd adaptability for large convolution kernels and supports both 16-bit and 8-bit quantized CNNs. Due to activation sharing, 8-bit processing can theoretically achieve twice the performance of 16-bit processing at the same sparsity. The architecture is evaluated with VGG16 and ResNet50, achieving at most 7.6 TOP/s for sparse-Winograd convolution and 3 TOP/s for sparse matrix multiplication with 16-bit quantization on the Xilinx VCU1525 platform. SWM can process 310/725 images per second for VGG16/ResNet50 with 16-bit quantization. Compared with state-of-the-art works, our design achieves at least a 1.53× speedup and a 1.8× energy efficiency improvement.
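As background for the load-balance problem mentioned above, here is a minimal Python sketch of the standard compressed sparse row (CSR) format. It is not the balanced CSR variant or the dynamic scheduler proposed in the abstract, whose details are not given here. The function names and the toy matrix are hypothetical. The per-row nonzero counts show why assigning one row per processing element statically can leave some elements idle.

```python
def to_csr(dense):
    """Encode a dense matrix (list of rows) as standard CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in values/col_idx.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector, touching only nonzeros."""
    out = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        out.append(acc)
    return out


def row_nnz(row_ptr):
    """Nonzeros per row: the work each row costs under row-wise scheduling."""
    return [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]


# Toy sparse weight matrix with uneven sparsity across rows.
dense = [
    [0, 2, 0, 0],
    [1, 0, 3, 4],
    [0, 0, 0, 0],
]
values, col_idx, row_ptr = to_csr(dense)
result = csr_matvec(values, col_idx, row_ptr, [1, 1, 1, 1])
work = row_nnz(row_ptr)
print(values, col_idx, row_ptr)
print(result, work)
```

Here the three rows carry 1, 3, and 0 nonzeros, so a static one-row-per-unit assignment would keep one unit busy three times as long as another while a third does nothing. This imbalance is the kind of effect that a balanced format or dynamic scheduling aims to reduce.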
Article
This paper presents the integration of resonant clocking into multi-die architectures to synchronize individual chiplets connected through an active silicon interposer. The proposed inter-chiplet synchronization through the active silicon interposer rotary oscillator array (ASI-ROA) provides a unitary clock domain to the multiple dies (i.e., multiple chiplets) in the package with a very low design overhead. System performance analysis is performed with parasitics-extracted, post-layout simulation models of two different sizes of representative heterogeneous multi-die architectures, each with a varying number of RISC-V cores per die. Each RISC-V core of the multi-die package belongs to the unitary clock domain, designed with ASI-ROA to operate at a frequency of 2 GHz. The proposed architecture is investigated for robustness in frequency and skew across the multi-die system (MDS) with SPICE-based simulations of post-layout models, demonstrating variations of only 80 MHz for a 2 GHz target frequency. The power savings are up to 41% for the overall MDS, compared to an equivalent implementation with a contemporary ADPLL used to synchronize the multiple chiplets over the active interposer. The average clock skew of the completely resonant architecture presented in this work is 8.2 ps.