Mechanism of cache memory systems.

Source publication
Article
Full-text available
Packet processing performance in a Network Function Virtualization (NFV)-aware environment depends on the memory access performance of commercial off-the-shelf (COTS) hardware systems. Table lookup is a typical example of packet processing that depends heavily on memory access performance. Thus, the on-chip cache memories of the CPU ar...

Contexts in source publication

Context 1
... the L1 cache is the fastest and has the smallest capacity; the LLC, by contrast, is the slowest and has the largest capacity. Figure 5 shows the mechanism of a cache memory system, including the off-chip memory devices. A cache line is the elementary block of data transferred between the cache and off-chip memory. ...
Context 2
... a cache line is the elementary block of data transferred between the cache and off-chip memory. Usually, the data physically near a particular piece of data, shown as the target data in Figure 5, is likely to be accessed next; this property is known as the spatial locality of data. By exploiting spatial locality, cache lines improve the hit probabilities of cache memories. ...
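To make the spatial-locality point concrete, the sketch below (illustrative only, not from the source publication) contrasts two traversals of the same array in C, assuming a 64-byte cache line and 4-byte ints: a sequential walk amortizes one cache miss over 16 elements, while a stride-16 walk forfeits that reuse.

```c
/* Minimal locality sketch; line size and array size are assumptions. */
#include <stddef.h>

#define N (1 << 20)   /* assumed array size: 1M ints (4 MB)      */
#define LINE_INTS 16  /* assumed 64-byte cache line / 4-byte int */

/* Sequential walk: each cache line is fetched once and all 16
 * resident ints are consumed before the line is evicted. */
long sum_sequential(const int *a)
{
    long s = 0;
    for (size_t i = 0; i < N; i++)
        s += a[i];
    return s;
}

/* Stride-16 walk over the same data: every access lands on a new
 * cache line, so line fills are not amortized across elements. */
long sum_strided(const int *a)
{
    long s = 0;
    for (size_t off = 0; off < LINE_INTS; off++)
        for (size_t i = off; i < N; i += LINE_INTS)
            s += a[i];
    return s;
}
```

Both functions compute the same sum; only the access order, and hence the per-line reuse, differs.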

Citations

... This phenomenon is primarily due to the limited capacity of the SCM on-DIMM buffer [P2]. A prefetching distance of 2^14 tuples means that the prefetched tuples and hash buckets each require a 1 MB memory region (each prefetch stream covers 2^14 cache lines), which consumes 2 MB in total and exceeds the last-level cache (LLC) size per core (the LLC slice size [54]). The prefetched buckets and tuples can therefore no longer be buffered in the LLC, causing excessive repeated memory accesses. ...
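For reference, the distance-based software prefetching pattern the excerpt analyzes typically looks like the hedged C sketch below, using GCC/Clang's __builtin_prefetch; the DIST value, bucket layout, and function names are illustrative assumptions, not the cited paper's code. The excerpt's point is that at a distance of 2^14 the in-flight prefetch footprint outgrows the per-core LLC slice, so prefetched lines are evicted before they are consumed.

```c
/* Sketch of probing a hash table with a fixed prefetch distance.
 * Bucket layout, DIST, and names are assumptions for illustration. */
#include <stddef.h>
#include <stdint.h>

#define DIST 16  /* prefetch distance in tuples; 2^14 would overflow
                    the per-core LLC slice, as the excerpt notes */

struct bucket { uint64_t key; uint64_t payload; };

uint64_t probe_all(const struct bucket *table, uint64_t mask,
                   const uint64_t *keys, size_t n)
{
    uint64_t matches = 0;
    for (size_t i = 0; i < n; i++) {
        /* Start pulling in the bucket for a future probe so its
         * cache line is (ideally) resident when we reach it. */
        if (i + DIST < n)
            __builtin_prefetch(&table[keys[i + DIST] & mask], 0, 1);
        const struct bucket *b = &table[keys[i] & mask];
        if (b->key == keys[i])
            matches++;
    }
    return matches;
}
```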
Article
Full-text available
In this paper, we seek to perform a rigorous experimental study of main-memory hash joins in storage class memory (SCM). In particular, we perform a design space exploration in real SCM for two state-of-the-art join algorithms: partitioned hash join (PHJ) and non-partitioned hash join (NPHJ), and identify the most crucial factors for implementing an SCM-friendly join. Moreover, we present a rigorous evaluation with a broad spectrum of workloads for both joins and provide an in-depth analysis for choosing the most suitable algorithm in a real SCM environment. With the most extensive experimental analysis to date, we maintain that although there is no universal winner in all scenarios, PHJ is generally superior to NPHJ in real SCM.
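For orientation, a non-partitioned hash join (NPHJ) in its simplest form builds one shared hash table over relation R and probes it with relation S; the single-threaded C sketch below uses assumed types, a power-of-two table, and linear probing, and is not the paper's implementation. PHJ differs by first hash-partitioning both relations so that each partition's table fits in a cache-friendly (or SCM-friendly) working set.

```c
/* Minimal NPHJ skeleton: build over R, probe with S.
 * Assumes unique keys in R, table size a power of two, load < 1. */
#include <stddef.h>
#include <stdint.h>

struct tuple { uint64_t key; uint64_t payload; };
struct slot  { uint64_t key; uint64_t payload; int used; };

static uint64_t hash64(uint64_t k) { return k * 0x9E3779B97F4A7C15ULL; }

static void build(struct slot *ht, uint64_t mask,
                  const struct tuple *r, size_t nr)
{
    for (size_t i = 0; i < nr; i++) {
        uint64_t h = hash64(r[i].key) & mask;
        while (ht[h].used)                 /* linear probing on collision */
            h = (h + 1) & mask;
        ht[h] = (struct slot){ r[i].key, r[i].payload, 1 };
    }
}

static size_t probe(const struct slot *ht, uint64_t mask,
                    const struct tuple *s, size_t ns)
{
    size_t matches = 0;
    for (size_t i = 0; i < ns; i++) {
        uint64_t h = hash64(s[i].key) & mask;
        while (ht[h].used) {               /* walk the probe chain */
            if (ht[h].key == s[i].key) { matches++; break; }
            h = (h + 1) & mask;
        }
    }
    return matches;
}
```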
... The authors aimed to reduce data traffic during some of the more memory-intensive portions of graphics processing algorithms. Korikawa et al. [31] used PIM in Network Function Virtualization (NFV) environments to speed up packet processing by leveraging bank interleaving and channel parallelism of 3D-stacked memories. ...
Preprint
Full-text available
Data movement is one of the main challenges of contemporary system architectures. Near-Data Processing (NDP) mitigates this issue by moving computation closer to the memory, avoiding excessive data movement. Our proposal, Vector-In-Memory Architecture (VIMA), executes large vector instructions near 3D-stacked memories using vector functional units and uses a small data cache to enable short-term data reuse. It provides an easy programming interface and guarantees precise exceptions. When executing stream-behaved applications on a single core, VIMA offers a speedup of up to 26x over a single-core CPU baseline with vector operations, while consuming 93% less energy.
... Also, future memory access mechanisms should efficiently exploit the recent and emerging memory architectures for packet processing applications [241], [242]. The specific access characteristics of NUMA [243]- [247] should be carefully accounted for in the design and optimization of NF placement and scheduling on GPC nodes in NFV systems. ...
Article
Full-text available
Scalable and flexible communication networks increasingly conduct the packet processing for Network Functions (NFs) in General Purpose Computing (GPC) platforms. The input/output (I/O)-intensive and latency-sensitive packet processing is challenging for the operating systems and hypervisors running on GPC platforms. This article surveys the existing enabling technologies and research studies on operating system and hypervisor aspects that directly influence the packet processing for NFs on GPC platforms. We organize this survey according to the main categories abstraction approach, memory access, and I/O strategy. We further categorize abstraction approach technologies and research studies into the categories operating systems, hypervisors, and containers. We partition the memory access category into the two sub-categories of memory allocation and memory access, while we partition the I/O strategy category into the sub-categories I/O device virtualization and I/O device access. Our survey gives a comprehensive summary of the capabilities and limitations of the existing enabling technologies and researched approaches for abstraction, memory access, and I/O for NF packet processing. We outline critical future research directions for advancing NF packet processing on GPC platforms.
Preprint
With 3D-stacked DRAM architectures becoming more prevalent, it has become important to find ways to characterize and mitigate the adverse effects that can hinder their inherent access parallelism and throughput. One example of such adversities is the electromigration (EM) effects in the through-silicon vias (TSVs) of the power delivery network (PDN) of 3D-stacked DRAM architectures. Several prior works have addressed the effects of EM in TSVs of 3D integrated circuits. However, no prior work has addressed the effects of EM in the PDN TSVs on the performance and lifetime of 3D-stacked DRAMs. In this paper, we characterize the effects of EM in PDN TSVs on a Hybrid Memory Cube (HMC) architecture employing the conventional PDN design with a clustered layout of power and ground TSVs. We then present a new PDN design with a distributed layout of power and ground TSVs and show that it can mitigate the adverse effects of EM on the HMC architecture performance without requiring additional power and ground pins. Our benchmark-driven simulation-based analysis shows that compared to the clustered PDN layout, our proposed distributed PDN layout improves the EM-affected lifetime of the HMC architecture by up to 10 years. During this useful lifetime, the HMC architecture yields an energy-delay product (EDP) up to 1.51 times lower.
Article
Network function virtualization provides an efficient and flexible way to implement network functions deployed in middleboxes as software running on commodity servers. However, it brings challenges for network management, one of which is how to manage the unavailability of middleboxes. This paper proposes an unavailability-aware backup allocation model with shared protection to minimize the maximum unavailability among functions. Shared protection allows multiple functions to share the backup resources, which leads to a complicated recovery mechanism and makes unavailability estimation difficult. We develop an analytical approach based on queueing theory to compute the middlebox unavailability for a given backup allocation. The heterogeneous failure, repair, recovery, and waiting procedures of functions and backup servers, which lead to several different states for each function and for the whole system, are considered in the queueing approach. We analyze the performance bounds for a given solution and for the optimal objective value. Based on the developed analytical approach and the performance bounds, we introduce two heuristics to solve the backup allocation problem. The results reveal that, compared to a baseline model, the proposed unavailability-aware model reduces the maximum unavailability by 16% on average in our examined scenarios.
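As a rough illustration of the kind of quantity such a queueing analysis yields (a minimal sketch with assumed numbers, not the article's shared-protection model, which must also track backup contention, recovery, and waiting states): for a single repairable function with exponentially distributed failures and repairs, steady-state unavailability is MTTR / (MTTF + MTTR).

```c
/* Steady-state unavailability of one repairable component.
 * MTTF/MTTR values below are illustrative assumptions. */
#include <stdio.h>

static double unavailability(double mttf_hours, double mttr_hours)
{
    return mttr_hours / (mttf_hours + mttr_hours);
}

int main(void)
{
    /* e.g., a middlebox failing every ~10,000 h, repaired in 4 h */
    printf("unavailability: %.6f\n", unavailability(10000.0, 4.0));
    return 0;
}
```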