HMC system architecture for PIM processing.

Source publication
Article
Processing-in-memory (PIM) places computational logic in the memory domain and is the most promising solution to alleviate the memory bandwidth problem in deep neural network (DNN) processing. The hybrid memory cube (HMC), a 3D-stacked memory structure, can efficiently implement the PIM architecture while making maximal use of existing legacy hardware. To...

Contexts in source publication

Context 1
... et al. proposed the Tesseract architecture to maximize the available memory bandwidth in multi-HMC-based PIM structures for graph processing [5]. Tesseract is still widely used as the base architecture for HMC-based PIM, and its system architecture is presented in Fig. 1. In Tesseract, the network between the PEs of an HMC and the I/O links is configured as a fully connected topology. Each PE integrates an in-order core for execution and a prefetch buffer (PFB) that buffers communication traffic. Links between HMCs are connected in a dragonfly topology. The proposed method focuses on the HMC ...
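To make the organization in Fig. 1 concrete, the following is a minimal structural sketch in Python; the class names, vault count, and PFB size are illustrative assumptions rather than Tesseract's actual parameters, and the inter-cube routine merely stands in for a dragonfly network.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class ProcessingElement:
        vault_id: int
        pfb_entries: int = 16                 # PFB capacity is an assumed value
        pfb: List[bytes] = field(default_factory=list)

        def prefetch(self, packet: bytes) -> None:
            # Buffer incoming communication traffic so the in-order core
            # does not stall waiting on remote packets.
            if len(self.pfb) < self.pfb_entries:
                self.pfb.append(packet)

    @dataclass
    class HMC:
        cube_id: int
        num_vaults: int = 32                  # one PE per vault; count is an assumption
        pes: List[ProcessingElement] = field(default_factory=list)

        def __post_init__(self) -> None:
            # PEs and I/O links inside a cube form a fully connected network.
            self.pes = [ProcessingElement(v) for v in range(self.num_vaults)]

    def inter_cube_links(cubes: List[HMC]) -> List[Tuple[int, int]]:
        # Placeholder for the dragonfly network between HMCs: here every cube pair
        # is linked directly, which over-simplifies real dragonfly group routing.
        return [(a.cube_id, b.cube_id)
                for i, a in enumerate(cubes) for b in cubes[i + 1:]]

    system = [HMC(c) for c in range(16)]
    print(len(system[0].pes), len(inter_cube_links(system)))   # 32 PEs, 120 links

The point of the sketch is only the organization: one PE per vault, a PFB in front of each in-order core, a fully connected intra-cube network, and separate inter-cube links.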
Context 2
... intermediate-result transfer schedule between data-dependent tasks is affected by the end time of the upstream task and the start time of the downstream task. Intermediate results can be stored in the PFB or the large-capacity DRAM shown in Fig. 1, with access latency varying accordingly. This latency variance is intensified by the packet-based communication structure of the multi-HMC-based PIM system. The concept of retiming is applied to address context switching and latency hiding accordingly. First, the terms iteration, prologue, and retiming to ...
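As an illustration of why retiming helps here, the toy sketch below (hypothetical producer and consumer tasks, arbitrary latency numbers, not the paper's schedule) shows how producing the next iteration's intermediate result one step ahead, via a prologue, lets the consumer read from the fast PFB instead of waiting on DRAM.

    PFB_LATENCY, DRAM_LATENCY = 1, 8   # arbitrary cycle counts for illustration

    def upstream(i):
        return i * i                    # stand-in for the producer task

    def downstream(i, x):
        return x + i                    # stand-in for the consumer task

    def run_without_retiming(n):
        cycles = 0
        for i in range(n):
            x = upstream(i)
            cycles += DRAM_LATENCY      # result read back immediately: full latency exposed
            downstream(i, x)
        return cycles

    def run_with_retiming(n):
        # Prologue: produce iteration 0's result one step ahead of its consumer.
        cycles = 0
        x = upstream(0)
        for i in range(n):
            nxt = upstream(i + 1) if i + 1 < n else None  # overlap the next producer...
            cycles += PFB_LATENCY       # ...so the consumer finds its input in the PFB
            downstream(i, x)
            x = nxt
        return cycles

    print(run_without_retiming(100), run_with_retiming(100))   # 800 vs. 100 cycles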

Similar publications

Article
The design of a lightweight deep learning model would be an ideal solution for overcoming resource limitations when implementing artificial intelligence at edge sites. In this study, we propose a lightweight deep neural network that uses a Mixer-type architecture based on nonlinear vector autoregression (NVAR), which we refer to as Mixer-type NVAR....
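For readers unfamiliar with NVAR, the sketch below shows generic NVAR feature construction (delayed states plus their quadratic products, with a linear ridge readout); it is a textbook-style illustration and makes no claim about the Mixer-type design in the study.

    import numpy as np

    def nvar_features(series: np.ndarray, delays: int = 3) -> np.ndarray:
        # series: (T, d) multivariate time series.
        T, d = series.shape
        rows = []
        for t in range(delays - 1, T):
            lin = series[t - delays + 1:t + 1].ravel()            # delayed (linear) part
            quad = np.outer(lin, lin)[np.triu_indices(lin.size)]  # unique quadratic terms
            rows.append(np.concatenate(([1.0], lin, quad)))       # constant + linear + nonlinear
        return np.asarray(rows)

    # Usage: fit a linear (ridge) readout to predict the next step.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((200, 2)).cumsum(axis=0)
    X, y = nvar_features(x[:-1]), x[3:]                           # align targets with features
    W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ y)
    print(((X @ W - y) ** 2).mean())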

Citations

... Drawing from these considerations, researchers have developed different mapping models for provisioning resources to multiple task types. The next section of this text provides a survey of deployment-specific details, application-specific benefits, functional restrictions, and potential future applications for these models [2][3][4]. This review reveals that current provisioning models are either incomplete or excessively complex, thus restricting their scalability. ...
Article
Cloud resource provisioning requires examining tasks, dependencies, deadlines, and capacity distribution. Scalability is hindered by incomplete or complex models, and comprehensive models with low-to-moderate QoS are unsuitable for real-time scenarios. This research proposes a Negotiation Aware SLA Model for Resource Provisioning in cloud deployments to address these challenges. In the proposed model, a task-level SLA maximizes resource allocation fairness and incorporates task dependency for correlated task types. New tasks are processed by an efficient hierarchical task clustering process, and a priority is assigned to each task. For efficient provisioning, an Elephant Herding Optimization (EHO) model allocates resources to these clusters based on task deadline and make-span levels. The EHO model uses a fitness function that shortens the make-span and raises deadline awareness. Q-Learning is used in the VM-aware negotiation framework for capacity tuning and task-shifting to post-process allocated tasks for faster task execution with minimal overhead. Because of these operations, the proposed model outperforms state-of-the-art models in heterogeneous cloud configurations and across multiple task types. It improves make-span and the deadline hit ratio, with 9.2% fewer computational cycles, 4.9% lower energy consumption, and 5.4% lower computational complexity, making it suitable for large-scale, real-time task scheduling.
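A minimal sketch of the kind of fitness function described (shorter make-span, higher deadline awareness) follows; the weights, the deadline-ordered execution, and the field names are assumptions for illustration and are not the paper's actual EHO formulation.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Task:
        runtime: float    # estimated execution time on the assigned VM
        deadline: float   # absolute deadline

    def fitness(cluster: List[Task], w_makespan: float = 0.6, w_deadline: float = 0.4) -> float:
        # Make-span of the cluster under sequential execution on one VM (simplified).
        finish, makespan, misses = 0.0, 0.0, 0
        for t in sorted(cluster, key=lambda t: t.deadline):  # earliest-deadline-first ordering
            finish += t.runtime
            makespan = finish
            misses += finish > t.deadline
        deadline_hit_ratio = 1.0 - misses / len(cluster)
        # Lower make-span and a higher hit ratio give a lower (better) fitness value.
        return w_makespan * makespan - w_deadline * deadline_hit_ratio

    print(fitness([Task(2.0, 3.0), Task(1.0, 5.0), Task(4.0, 10.0)]))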
... This model is capable of predicting task patterns, which assists in improving capacity pre-emption for different VM types. Similar models are discussed in [8][9][10], which propose the use of joint task scheduling and containerizing (JTSC), genetic algorithm with mobility aware task scheduling (GAMTS), and deep neural network scheduling (DNNS), for estimation of multiple task types under real-time environments. These models are useful for deploying scheduling techniques for large-scale use cases. ...
... By taking into account additional performance metrics, utilizing standardized benchmark datasets, and contrasting it against a wider range of existing models, future research could concentrate on providing a more thorough evaluation of the proposed model. Limited evaluation on a small-scale testbed; [4][5][6] developed a genetic algorithm for task allocation in cloud environments but assumed independent tasks without considering interdependencies; [7,8] introduced a machine learning-based approach for VM scheduling but focused primarily on VM allocation rather than task dependencies; [9][10][11] proposed a task clustering method for efficient VM scheduling ...
Article
This paper discusses the design of a novel hybrid bioinspired model for task- and VM-dependency- and deadline-aware scheduling via dual service level agreements. The model combines grey wolf optimization with the league championship algorithm to perform efficient scheduling operations. These optimization techniques model a fitness function that incorporates task make-span, task deadline, mutual dependencies with other tasks, the capacity of VMs, and the energy needed for scheduling operations, which improves scheduling performance for multiple use cases. To perform these tasks, the model initially deploys a task-based service level agreement (SLA) method, which enhances task and requesting-user diversity. This is followed by a VM-based SLA model, which reconfigures the VM's internal characteristics to accommodate multiple task types. The model also integrates deadline awareness along with task-level and VM-level dependency awareness, which improves its scheduling performance under real-time task and cloud scenarios. The proposed model is able to improve cloud utilization by 8.5%, increase task diversity by 8.3%, reduce the delay needed for resource provisioning by 16.5%, and reduce energy consumption by 9.1%, making it suitable for a wide variety of real-time cloud deployments.
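As a rough illustration of such a multi-term fitness function, the sketch below combines the five factors named in the abstract; the normalization and weights are assumptions, not the paper's formulation.

    def schedule_fitness(makespan, deadline_slack, dep_violations, vm_load, energy,
                         weights=(0.3, 0.25, 0.2, 0.15, 0.1)):
        """All inputs are assumed pre-normalized to [0, 1]; lower is better."""
        w1, w2, w3, w4, w5 = weights
        return (w1 * makespan
                - w2 * deadline_slack      # more slack before deadlines is rewarded
                + w3 * dep_violations      # scheduling a task before its parents is penalized
                + w4 * vm_load             # overloading a VM beyond capacity is penalized
                + w5 * energy)

    print(schedule_fitness(0.4, 0.7, 0.0, 0.5, 0.3))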
... A large corpus of prior work focuses on accelerating DL inference using different PIM solutions. This includes both proposals from Academia [47,73,98,136,150,151,[239][240][241][242][243][244][245][246] and Industry [166][167][168][169][170], targeting various types of DL models, including convolutional neural networks [47,73,98,136,150,151,166,167,239,[241][242][243][244], recurrent neural networks [136,169,246], and recommendation systems [168,170,240,245]. Our work differs from such works since we focus on classic ML algorithms (i.e., regression, classification, clustering) using a real-world general-purpose PIM architecture (i.e., the commercially-available UPMEM PIM architecture [161]). ...
Preprint
Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate ML training. To do so, we (1) implement several representative classic ML algorithms (namely, linear regression, logistic regression, decision tree, K-Means clustering) on a real-world general-purpose PIM architecture, (2) rigorously evaluate and characterize them in terms of accuracy, performance and scaling, and (3) compare to their counterpart implementations on CPU and GPU. Our evaluation on a real memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound ML workloads, when the necessary operations and datatypes are natively supported by PIM hardware. For example, our PIM implementation of decision tree is $27\times$ faster than a state-of-the-art CPU version on an 8-core Intel Xeon, and $1.34\times$ faster than a state-of-the-art GPU version on an NVIDIA A100. Our K-Means clustering on PIM is $2.8\times$ and $3.2\times$ faster than state-of-the-art CPU and GPU versions, respectively. To our knowledge, our work is the first one to evaluate ML training on a real-world PIM architecture. We conclude with key observations, takeaways, and recommendations that can inspire users of ML workloads, programmers of PIM architectures, and hardware designers & architects of future memory-centric computing systems.
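The sketch below illustrates, conceptually, why such workloads map well to PIM: each core updates centroids only from the data shard resident in its local memory bank, and only small partial sums travel back to the host. The shard logic and names are illustrative and do not use the UPMEM SDK.

    import numpy as np

    def kmeans_step_pim_style(points: np.ndarray, centroids: np.ndarray, n_cores: int):
        shards = np.array_split(points, n_cores)          # data resident near each PIM core
        partial_sums = np.zeros((n_cores, len(centroids), points.shape[1]))
        partial_counts = np.zeros((n_cores, len(centroids)), dtype=np.int64)

        for core, shard in enumerate(shards):             # would run in parallel on PIM cores
            d = np.linalg.norm(shard[:, None, :] - centroids[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            for k in range(len(centroids)):
                mask = nearest == k
                partial_sums[core, k] = shard[mask].sum(axis=0)
                partial_counts[core, k] = mask.sum()

        # Host-side reduction: only (n_cores x k x dim) values move, not the dataset.
        sums, counts = partial_sums.sum(axis=0), partial_counts.sum(axis=0)
        return sums / np.maximum(counts, 1)[:, None]

    pts = np.random.rand(10_000, 2)
    cen = np.random.rand(4, 2)
    print(kmeans_step_pim_style(pts, cen, n_cores=64).shape)  # (4, 2)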
Article
The potential of neuromorphic computing to bring about revolutionary advancements in multiple disciplines, such as artificial intelligence (AI), robotics, neurology, and cognitive science, is well recognised. This paper presents a comprehensive survey of current advancements in the use of machine learning techniques for the logical development of neuromorphic materials for engineering solutions. The amalgamation of neuromorphic technology and material design possesses the potential to fundamentally revolutionise the procedure of material exploration, optimise material architectures at the atomic or molecular level, foster self-adaptive materials, augment energy efficiency, and enhance the efficacy of brain–machine interfaces (BMIs). Consequently, it has the potential to bring about a paradigm shift in various sectors and generate innovative prospects within the fields of material science and engineering. The objective of this study is to advance the field of artificial intelligence (AI) by creating hardware for neural networks that is energy-efficient. Additionally, the research attempts to improve neuron models, learning algorithms, and learning rules. The ultimate goal is to bring about a transformative impact on AI and better the overall efficiency of computer systems.
Chapter
The idea of smart healthcare has progressively gained traction as a result of advancements in information technology. It is now possible to provide more efficient, convenient, and customized health care via modern information technologies such as the Internet of Things (IoT), big data, cloud computing, and artificial intelligence. Cloud computing is rapidly emerging as one of the most transformative technologies in today's society. It is a highly virtualized platform that attracts a diverse user base seeking to satisfy their resource needs, and it has made resources readily available, so building services on these platforms is simpler than ever. To maximize the benefits of cloud computing, healthcare systems are migrating their infrastructure to the cloud, carrying out activities and duties to fulfill the demand for these services. To move health services to the cloud successfully, it is critical to manage job scheduling. The article proposes a cloud-based architecture for health systems based on the ant colony optimization algorithm, which categorizes each user into priority levels and uses these levels to prioritize their work while arranging jobs in the task queue. The job scheduling methods are implemented using the Java-based simulation toolkit CloudSim. Traditional methods are also applied in this article, and the outcomes are recorded for a variety of factors. The suggested method is compared to established algorithms such as First Come, First Serve (FCFS) and Round-Robin (RR). The collected findings demonstrate that the suggested ant colony optimization based method outperforms the conventional algorithms across all measured factors. Keywords: Cloud computing, Health care, Ant colony optimization, Task scheduling
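A compact, self-contained sketch of ant-colony task-to-VM scheduling in the spirit of the chapter is given below; the pheromone and heuristic parameters and the makespan-based cost are generic assumptions, not the chapter's exact design.

    import random

    def aco_schedule(task_lens, vm_speeds, ants=20, iters=50,
                     alpha=1.0, beta=2.0, rho=0.1):
        n_tasks, n_vms = len(task_lens), len(vm_speeds)
        tau = [[1.0] * n_vms for _ in range(n_tasks)]       # pheromone trails
        best, best_ms = None, float("inf")

        for _ in range(iters):
            for _ in range(ants):
                loads = [0.0] * n_vms
                assign = []
                for t in range(n_tasks):
                    # Heuristic favours VMs that would finish this task sooner.
                    eta = [1.0 / (loads[v] + task_lens[t] / vm_speeds[v]) for v in range(n_vms)]
                    w = [tau[t][v] ** alpha * eta[v] ** beta for v in range(n_vms)]
                    v = random.choices(range(n_vms), weights=w)[0]
                    loads[v] += task_lens[t] / vm_speeds[v]
                    assign.append(v)
                makespan = max(loads)
                if makespan < best_ms:
                    best, best_ms = assign, makespan
            # Evaporate, then reinforce the best assignment found so far.
            tau = [[(1 - rho) * p for p in row] for row in tau]
            for t, v in enumerate(best):
                tau[t][v] += 1.0 / best_ms
        return best, best_ms

    print(aco_schedule([4, 8, 2, 6, 3], [1.0, 2.0]))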