HMC system architecture for PIM processing.

Source publication
Article
Processing-in-memory (PIM) places computational logic in the memory domain and is the most promising solution to alleviate the memory bandwidth problem in deep neural network (DNN) processing. The hybrid memory cube (HMC), a 3D-stacked memory structure, can efficiently implement the PIM architecture while making maximal use of existing legacy hardware. To...

Contexts in source publication

Context 1
... et al. proposed the Tesseract architecture to maximize the available memory bandwidth in multi-HMC-based PIM structures for graph processing [5]. Tesseract is still widely used as the base architecture for HMC-based PIM, and its system architecture is presented in Fig. 1. In Tesseract, the network between the PEs of an HMC and the I/O links is configured as a fully connected topology. Each PE integrates an in-order core for execution and a prefetch buffer (PFB) that buffers communication traffic. Links between HMCs are connected in a dragonfly topology. The proposed method focuses on the HMC ...
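To make the organization in Fig. 1 concrete, the following is a minimal structural sketch in Python; the class names, vault count, and PFB size are illustrative assumptions rather than Tesseract's actual parameters, and the inter-cube routine merely stands in for a dragonfly network.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class ProcessingElement:
        vault_id: int
        pfb_entries: int = 16                 # PFB capacity is an assumed value
        pfb: List[bytes] = field(default_factory=list)

        def prefetch(self, packet: bytes) -> None:
            # Buffer incoming communication traffic so the in-order core
            # does not stall waiting on remote packets.
            if len(self.pfb) < self.pfb_entries:
                self.pfb.append(packet)

    @dataclass
    class HMC:
        cube_id: int
        num_vaults: int = 32                  # one PE per vault; count is an assumption
        pes: List[ProcessingElement] = field(default_factory=list)

        def __post_init__(self) -> None:
            # PEs and I/O links inside a cube form a fully connected network.
            self.pes = [ProcessingElement(v) for v in range(self.num_vaults)]

    def inter_cube_links(cubes: List[HMC]) -> List[Tuple[int, int]]:
        # Placeholder for the dragonfly network between HMCs: here every cube pair
        # is linked directly, which over-simplifies real dragonfly group routing.
        return [(a.cube_id, b.cube_id)
                for i, a in enumerate(cubes) for b in cubes[i + 1:]]

    system = [HMC(c) for c in range(16)]
    print(len(system[0].pes), len(inter_cube_links(system)))   # 32 PEs, 120 links

The point of the sketch is only the organization: one PE per vault, a PFB in front of each in-order core, a fully connected intra-cube network, and separate inter-cube links.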
Context 2
... intermediate-result transfer schedule between data-dependent tasks is affected by the end time of the upstream task and the start time of the downstream task. Intermediate results can be stored in the PFB or the large-capacity DRAM shown in Fig. 1, with access latency varying accordingly. This latency variance is intensified by the packet-based communication structure of the multi-HMC-based PIM system. The concept of retiming is applied to address context switching and latency hiding accordingly. First, the terms iteration, prologue, and retiming to ...
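As an illustration of why retiming helps here, the toy sketch below (hypothetical producer and consumer tasks, arbitrary latency numbers, not the paper's schedule) shows how producing the next iteration's intermediate result one step ahead, via a prologue, lets the consumer read from the fast PFB instead of waiting on DRAM.

    PFB_LATENCY, DRAM_LATENCY = 1, 8   # arbitrary cycle counts for illustration

    def upstream(i):
        return i * i                    # stand-in for the producer task

    def downstream(i, x):
        return x + i                    # stand-in for the consumer task

    def run_without_retiming(n):
        cycles = 0
        for i in range(n):
            x = upstream(i)
            cycles += DRAM_LATENCY      # result read back immediately: full latency exposed
            downstream(i, x)
        return cycles

    def run_with_retiming(n):
        # Prologue: produce iteration 0's result one step ahead of its consumer.
        cycles = 0
        x = upstream(0)
        for i in range(n):
            nxt = upstream(i + 1) if i + 1 < n else None  # overlap the next producer...
            cycles += PFB_LATENCY       # ...so the consumer finds its input in the PFB
            downstream(i, x)
            x = nxt
        return cycles

    print(run_without_retiming(100), run_with_retiming(100))   # 800 vs. 100 cycles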

Similar publications

Article
The design of a lightweight deep learning model would be an ideal solution for overcoming resource limitations when implementing artificial intelligence at edge sites. In this study, we propose a lightweight deep neural network that uses a Mixer-type architecture based on nonlinear vector autoregression (NVAR), which we refer to as Mixer-type NVAR....
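For readers unfamiliar with NVAR, the sketch below shows generic NVAR feature construction (delayed states plus their quadratic products, with a linear ridge readout); it is a textbook-style illustration and makes no claim about the Mixer-type design in the study.

    import numpy as np

    def nvar_features(series: np.ndarray, delays: int = 3) -> np.ndarray:
        # series: (T, d) multivariate time series.
        T, d = series.shape
        rows = []
        for t in range(delays - 1, T):
            lin = series[t - delays + 1:t + 1].ravel()            # delayed (linear) part
            quad = np.outer(lin, lin)[np.triu_indices(lin.size)]  # unique quadratic terms
            rows.append(np.concatenate(([1.0], lin, quad)))       # constant + linear + nonlinear
        return np.asarray(rows)

    # Usage: fit a linear (ridge) readout to predict the next step.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((200, 2)).cumsum(axis=0)
    X, y = nvar_features(x[:-1]), x[3:]                           # align targets with features
    W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ y)
    print(((X @ W - y) ** 2).mean())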

Citations

... Drawing from these considerations, researchers have developed different mapping models for provisioning resources to multiple task types. The next section of this text provides a survey of deployment-specific details, application-specific benefits, functional restrictions, and potential future applications for these models [2][3][4]. This review reveals that current provisioning models are either incomplete or excessively complex, thus restricting their scalability. ...
Article
Cloud resource provisioning requires examining tasks, dependencies, deadlines, and capacity distribution. Scalability is hindered by incomplete or complex models, and comprehensive models with low-to-moderate QoS are unsuitable for real-time scenarios. This research proposes a Negotiation Aware SLA Model for Resource Provisioning in cloud deployments to address these challenges. In the proposed model, a task-level SLA maximizes resource allocation fairness and incorporates task dependency for correlated task types. New tasks are processed by an efficient hierarchical task clustering process, and a priority is assigned to each task. For efficient provisioning, an Elephant Herding Optimization (EHO) model allocates resources to these clusters based on task deadline and make-span levels. The EHO model uses a fitness function that shortens the make-span and raises deadline awareness. Q-Learning is used in the VM-aware negotiation framework for capacity tuning and task-shifting to post-process allocated tasks for faster task execution with minimal overhead. Because of these operations, the proposed model outperforms state-of-the-art models in heterogeneous cloud configurations and across multiple task types. It improves make-span and the deadline hit ratio, with 9.2% fewer computational cycles, 4.9% lower energy consumption, and 5.4% lower computational complexity, making it suitable for large-scale, real-time task scheduling.
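A minimal sketch of the kind of fitness function described (shorter make-span, higher deadline awareness) follows; the weights, the deadline-ordered execution, and the field names are assumptions for illustration and are not the paper's actual EHO formulation.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Task:
        runtime: float    # estimated execution time on the assigned VM
        deadline: float   # absolute deadline

    def fitness(cluster: List[Task], w_makespan: float = 0.6, w_deadline: float = 0.4) -> float:
        # Make-span of the cluster under sequential execution on one VM (simplified).
        finish, makespan, misses = 0.0, 0.0, 0
        for t in sorted(cluster, key=lambda t: t.deadline):  # earliest-deadline-first ordering
            finish += t.runtime
            makespan = finish
            misses += finish > t.deadline
        deadline_hit_ratio = 1.0 - misses / len(cluster)
        # Lower make-span and a higher hit ratio give a lower (better) fitness value.
        return w_makespan * makespan - w_deadline * deadline_hit_ratio

    print(fitness([Task(2.0, 3.0), Task(1.0, 5.0), Task(4.0, 10.0)]))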
... This model is capable of predicting task patterns, which assists in improving capacity pre-emption for different VM types. Similar models are discussed in [8][9][10], which propose the use of joint task scheduling and containerizing (JTSC), genetic algorithm with mobility aware task scheduling (GAMTS), and deep neural network scheduling (DNNS), for estimation of multiple task types under real-time environments. These models are useful for deploying scheduling techniques for large-scale use cases. ...
... By taking into account additional performance metrics, utilizing standardized benchmark datasets, and contrasting it against a wider range of existing models, future research could concentrate on providing a more thorough evaluation of the proposed model. Limited evaluation on a small-scale testbed; [4][5][6] developed a genetic algorithm for task allocation in cloud environments but assumed independent tasks without considering interdependencies; [7,8] introduced a machine learning-based approach for VM scheduling but focused primarily on VM allocation rather than task dependencies; [9][10][11] proposed a task clustering method for efficient VM scheduling ...
Article
This paper discusses the design of a novel hybrid bioinspired model for task- and VM-dependency- and deadline-aware scheduling via dual service level agreements. The model combines grey wolf optimization with the league championship algorithm to perform efficient scheduling operations. These optimization techniques model a fitness function that incorporates task make-span, task deadline, mutual dependencies with other tasks, the capacity of VMs, and the energy needed for scheduling operations, which improves scheduling performance for multiple use cases. To perform these tasks, the model initially deploys a task-based service level agreement (SLA) method, which enhances task and requesting-user diversity. This is followed by a VM-based SLA model, which reconfigures the VM's internal characteristics to accommodate multiple task types. The model also integrates deadline awareness along with task-level and VM-level dependency awareness, which improves its scheduling performance under real-time task and cloud scenarios. The proposed model is able to improve cloud utilization by 8.5%, increase task diversity by 8.3%, reduce the delay needed for resource provisioning by 16.5%, and reduce energy consumption by 9.1%, making it suitable for a wide variety of real-time cloud deployments.
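As a rough illustration of such a multi-term fitness function, the sketch below combines the five factors named in the abstract; the normalization and weights are assumptions, not the paper's formulation.

    def schedule_fitness(makespan, deadline_slack, dep_violations, vm_load, energy,
                         weights=(0.3, 0.25, 0.2, 0.15, 0.1)):
        """All inputs are assumed pre-normalized to [0, 1]; lower is better."""
        w1, w2, w3, w4, w5 = weights
        return (w1 * makespan
                - w2 * deadline_slack      # more slack before deadlines is rewarded
                + w3 * dep_violations      # scheduling a task before its parents is penalized
                + w4 * vm_load             # overloading a VM beyond capacity is penalized
                + w5 * energy)

    print(schedule_fitness(0.4, 0.7, 0.0, 0.5, 0.3))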
... A large corpus of prior work focuses on accelerating DL inference using different PIM solutions. This includes both proposals from Academia [47,73,98,136,150,151,[239][240][241][242][243][244][245][246] and Industry [166][167][168][169][170], targeting various types of DL models, including convolutional neural networks [47,73,98,136,150,151,166,167,239,[241][242][243][244], recurrent neural networks [136,169,246], and recommendation systems [168,170,240,245]. Our work differs from such works since we focus on classic ML algorithms (i.e., regression, classification, clustering) using a real-world general-purpose PIM architecture (i.e., the commercially-available UPMEM PIM architecture [161]). ...
Preprint
Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate ML training. To do so, we (1) implement several representative classic ML algorithms (namely, linear regression, logistic regression, decision tree, K-Means clustering) on a real-world general-purpose PIM architecture, (2) rigorously evaluate and characterize them in terms of accuracy, performance and scaling, and (3) compare to their counterpart implementations on CPU and GPU. Our evaluation on a real memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound ML workloads, when the necessary operations and datatypes are natively supported by PIM hardware. For example, our PIM implementation of decision tree is $27\times$ faster than a state-of-the-art CPU version on an 8-core Intel Xeon, and $1.34\times$ faster than a state-of-the-art GPU version on an NVIDIA A100. Our K-Means clustering on PIM is $2.8\times$ and $3.2\times$ faster than state-of-the-art CPU and GPU versions, respectively. To our knowledge, our work is the first one to evaluate ML training on a real-world PIM architecture. We conclude with key observations, takeaways, and recommendations that can inspire users of ML workloads, programmers of PIM architectures, and hardware designers & architects of future memory-centric computing systems.
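The sketch below illustrates, conceptually, why such workloads map well to PIM: each core updates centroids only from the data shard resident in its local memory bank, and only small partial sums travel back to the host. The shard logic and names are illustrative and do not use the UPMEM SDK.

    import numpy as np

    def kmeans_step_pim_style(points: np.ndarray, centroids: np.ndarray, n_cores: int):
        shards = np.array_split(points, n_cores)          # data resident near each PIM core
        partial_sums = np.zeros((n_cores, len(centroids), points.shape[1]))
        partial_counts = np.zeros((n_cores, len(centroids)), dtype=np.int64)

        for core, shard in enumerate(shards):             # would run in parallel on PIM cores
            d = np.linalg.norm(shard[:, None, :] - centroids[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            for k in range(len(centroids)):
                mask = nearest == k
                partial_sums[core, k] = shard[mask].sum(axis=0)
                partial_counts[core, k] = mask.sum()

        # Host-side reduction: only (n_cores x k x dim) values move, not the dataset.
        sums, counts = partial_sums.sum(axis=0), partial_counts.sum(axis=0)
        return sums / np.maximum(counts, 1)[:, None]

    pts = np.random.rand(10_000, 2)
    cen = np.random.rand(4, 2)
    print(kmeans_step_pim_style(pts, cen, n_cores=64).shape)  # (4, 2)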
Article
The potential of neuromorphic computing to bring about revolutionary advancements in multiple disciplines, such as artificial intelligence (AI), robotics, neurology, and cognitive science, is well recognised. This paper presents a comprehensive survey of current advancements in the use of machine learning techniques for the logical development of neuromorphic materials for engineering solutions. The amalgamation of neuromorphic technology and material design possesses the potential to fundamentally revolutionise the procedure of material exploration, optimise material architectures at the atomic or molecular level, foster self-adaptive materials, augment energy efficiency, and enhance the efficacy of brain–machine interfaces (BMIs). Consequently, it has the potential to bring about a paradigm shift in various sectors and generate innovative prospects within the fields of material science and engineering. The objective of this study is to advance the field of artificial intelligence (AI) by creating hardware for neural networks that is energy-efficient. Additionally, the research attempts to improve neuron models, learning algorithms, and learning rules. The ultimate goal is to bring about a transformative impact on AI and better the overall efficiency of computer systems.
Chapter
The idea of smart healthcare has progressively gained traction as a result of advancements in information technology. It is now possible to provide more efficient, convenient, and customized health care via modern information technologies such as the Internet of Things (IoT), big data, cloud computing, and artificial intelligence. Cloud computing is rapidly emerging as one of the most transformative technologies in today's society. It is a highly virtualized platform that attracts a diverse user base seeking to satisfy their resource needs, and it has made resources readily available, so building services on these platforms is simpler than ever. To maximize the benefits of cloud computing, healthcare systems are migrating their infrastructure to the cloud, carrying out activities and duties to fulfill the demand for these services. To move health services to the cloud successfully, it is critical to manage job scheduling. The article proposes a cloud-based architecture for health systems based on the ant colony optimization algorithm, which categorizes each user into priority levels and uses these levels to prioritize their work while arranging jobs in the task queue. The job scheduling methods are implemented using the Java-based simulation toolkit CloudSim. Traditional methods are also applied in this article, and the outcomes are recorded for a variety of factors. The suggested method is compared to established algorithms such as First Come, First Serve (FCFS) and Round-Robin (RR). The collected findings demonstrate that the suggested ant colony optimization based method outperforms the conventional algorithms across all measured factors. Keywords: Cloud computing, Health care, Ant colony optimization, Task scheduling
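A compact, self-contained sketch of ant-colony task-to-VM scheduling in the spirit of the chapter is given below; the pheromone and heuristic parameters and the makespan-based cost are generic assumptions, not the chapter's exact design.

    import random

    def aco_schedule(task_lens, vm_speeds, ants=20, iters=50,
                     alpha=1.0, beta=2.0, rho=0.1):
        n_tasks, n_vms = len(task_lens), len(vm_speeds)
        tau = [[1.0] * n_vms for _ in range(n_tasks)]       # pheromone trails
        best, best_ms = None, float("inf")

        for _ in range(iters):
            for _ in range(ants):
                loads = [0.0] * n_vms
                assign = []
                for t in range(n_tasks):
                    # Heuristic favours VMs that would finish this task sooner.
                    eta = [1.0 / (loads[v] + task_lens[t] / vm_speeds[v]) for v in range(n_vms)]
                    w = [tau[t][v] ** alpha * eta[v] ** beta for v in range(n_vms)]
                    v = random.choices(range(n_vms), weights=w)[0]
                    loads[v] += task_lens[t] / vm_speeds[v]
                    assign.append(v)
                makespan = max(loads)
                if makespan < best_ms:
                    best, best_ms = assign, makespan
            # Evaporate, then reinforce the best assignment found so far.
            tau = [[(1 - rho) * p for p in row] for row in tau]
            for t, v in enumerate(best):
                tau[t][v] += 1.0 / best_ms
        return best, best_ms

    print(aco_schedule([4, 8, 2, 6, 3], [1.0, 2.0]))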