Mechanism of cache memory systems.

Source publication
Article
Full-text available
Packet processing performance in a Network Function Virtualization (NFV)-aware environment depends on the memory access performance of commercial off-the-shelf (COTS) hardware systems. Table lookup is a typical example of packet processing that depends heavily on memory access performance. Thus, the on-chip cache memories of the CPU ar...

Contexts in source publication

Context 1
... the L1 cache is the fastest and has the smallest capacity; the LLC, by contrast, is the slowest and has the largest capacity. Figure 5 shows the mechanism of a cache memory system, including the off-chip memory devices. A cache line is the elementary block of data transferred between the cache and off-chip memory. ...
Context 2
... a cache line is the elementary block of data transferred between the cache and off-chip memory. Usually, the data physically near a particular piece of data, shown as the target data in Figure 5, is likely to be accessed next; this property is known as the spatial locality of data. By exploiting spatial locality, cache lines improve the hit probabilities of cache memories. ...
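To make the spatial-locality point concrete, the sketch below (illustrative only, not from the source publication) contrasts two traversals of the same array in C, assuming a 64-byte cache line and 4-byte ints: a sequential walk amortizes one cache miss over 16 elements, while a stride-16 walk forfeits that reuse.

```c
/* Minimal locality sketch; line size and array size are assumptions. */
#include <stddef.h>

#define N (1 << 20)   /* assumed array size: 1M ints (4 MB)      */
#define LINE_INTS 16  /* assumed 64-byte cache line / 4-byte int */

/* Sequential walk: each cache line is fetched once and all 16
 * resident ints are consumed before the line is evicted. */
long sum_sequential(const int *a)
{
    long s = 0;
    for (size_t i = 0; i < N; i++)
        s += a[i];
    return s;
}

/* Stride-16 walk over the same data: every access lands on a new
 * cache line, so line fills are not amortized across elements. */
long sum_strided(const int *a)
{
    long s = 0;
    for (size_t off = 0; off < LINE_INTS; off++)
        for (size_t i = off; i < N; i += LINE_INTS)
            s += a[i];
    return s;
}
```

Both functions compute the same sum; only the access order, and hence the per-line reuse, differs.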

Citations

... This phenomenon is primarily due to the limited capacity of the SCM on-DIMM buffer [P2]. A prefetching distance of 2^14 tuples means that the prefetched tuples and hash buckets each require a 1 MB memory region (each prefetch stream covers 2^14 cache lines), which consumes 2 MB in total and exceeds the last-level cache (LLC) size per core (the LLC slice size [54]). The prefetched buckets and tuples can therefore no longer be buffered in the LLC, causing excessive repeated memory accesses. ...
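For reference, the distance-based software prefetching pattern the excerpt analyzes typically looks like the hedged C sketch below, using GCC/Clang's __builtin_prefetch; the DIST value, bucket layout, and function names are illustrative assumptions, not the cited paper's code. The excerpt's point is that at a distance of 2^14 the in-flight prefetch footprint outgrows the per-core LLC slice, so prefetched lines are evicted before they are consumed.

```c
/* Sketch of probing a hash table with a fixed prefetch distance.
 * Bucket layout, DIST, and names are assumptions for illustration. */
#include <stddef.h>
#include <stdint.h>

#define DIST 16  /* prefetch distance in tuples; 2^14 would overflow
                    the per-core LLC slice, as the excerpt notes */

struct bucket { uint64_t key; uint64_t payload; };

uint64_t probe_all(const struct bucket *table, uint64_t mask,
                   const uint64_t *keys, size_t n)
{
    uint64_t matches = 0;
    for (size_t i = 0; i < n; i++) {
        /* Start pulling in the bucket for a future probe so its
         * cache line is (ideally) resident when we reach it. */
        if (i + DIST < n)
            __builtin_prefetch(&table[keys[i + DIST] & mask], 0, 1);
        const struct bucket *b = &table[keys[i] & mask];
        if (b->key == keys[i])
            matches++;
    }
    return matches;
}
```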
Article
Full-text available
In this paper, we seek to perform a rigorous experimental study of main-memory hash joins in storage class memory (SCM). In particular, we perform a design space exploration in real SCM for two state-of-the-art join algorithms: partitioned hash join (PHJ) and non-partitioned hash join (NPHJ), and identify the most crucial factors for implementing an SCM-friendly join. Moreover, we present a rigorous evaluation with a broad spectrum of workloads for both joins and provide an in-depth analysis for choosing the most suitable algorithm in a real SCM environment. With the most extensive experimental analysis to date, we maintain that although there is no universal winner in all scenarios, PHJ is generally superior to NPHJ in real SCM.
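For orientation, a non-partitioned hash join (NPHJ) in its simplest form builds one shared hash table over relation R and probes it with relation S; the single-threaded C sketch below uses assumed types, a power-of-two table, and linear probing, and is not the paper's implementation. PHJ differs by first hash-partitioning both relations so that each partition's table fits in a cache-friendly (or SCM-friendly) working set.

```c
/* Minimal NPHJ skeleton: build over R, probe with S.
 * Assumes unique keys in R, table size a power of two, load < 1. */
#include <stddef.h>
#include <stdint.h>

struct tuple { uint64_t key; uint64_t payload; };
struct slot  { uint64_t key; uint64_t payload; int used; };

static uint64_t hash64(uint64_t k) { return k * 0x9E3779B97F4A7C15ULL; }

static void build(struct slot *ht, uint64_t mask,
                  const struct tuple *r, size_t nr)
{
    for (size_t i = 0; i < nr; i++) {
        uint64_t h = hash64(r[i].key) & mask;
        while (ht[h].used)                 /* linear probing on collision */
            h = (h + 1) & mask;
        ht[h] = (struct slot){ r[i].key, r[i].payload, 1 };
    }
}

static size_t probe(const struct slot *ht, uint64_t mask,
                    const struct tuple *s, size_t ns)
{
    size_t matches = 0;
    for (size_t i = 0; i < ns; i++) {
        uint64_t h = hash64(s[i].key) & mask;
        while (ht[h].used) {               /* walk the probe chain */
            if (ht[h].key == s[i].key) { matches++; break; }
            h = (h + 1) & mask;
        }
    }
    return matches;
}
```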
... The authors aimed to reduce data traffic during some of the more memory-intensive portions of graphics processing algorithms. Korikawa et al. [31] used PIM in Network Function Virtualization (NFV) environments to speed up packet processing by leveraging bank interleaving and channel parallelism of 3D-stacked memories. ...
Preprint
Full-text available
Data movement is one of the main challenges of contemporary system architectures. Near-Data Processing (NDP) mitigates this issue by moving computation closer to the memory, avoiding excessive data movement. Our proposal, Vector-In-Memory Architecture (VIMA), executes large vector instructions near 3D-stacked memories using vector functional units and uses a small data cache to enable short-term data reuse. It provides an easy programming interface and guarantees precise exceptions. When executing stream-behaved applications on a single core, VIMA offers a speedup of up to 26x over a single-core CPU baseline with vector operations, while consuming 93% less energy.
... Also, future memory access mechanisms should efficiently exploit the recent and emerging memory architectures for packet processing applications [241], [242]. The specific access characteristics of NUMA [243]- [247] should be carefully accounted for in the design and optimization of NF placement and scheduling on GPC nodes in NFV systems. ...
Article
Full-text available
Scalable and flexible communication networks increasingly conduct the packet processing for Network Functions (NFs) in General Purpose Computing (GPC) platforms. The input/output (I/O)-intensive and latency-sensitive packet processing is challenging for the operating systems and hypervisors running on GPC platforms. This article surveys the existing enabling technologies and research studies on operating system and hypervisor aspects that directly influence the packet processing for NFs on GPC platforms. We organize this survey according to the main categories abstraction approach, memory access, and I/O strategy. We further categorize abstraction approach technologies and research studies into the categories operating systems, hypervisors, and containers. We partition the memory access category into the two sub-categories of memory allocation and memory access, while we partition the I/O strategy category into the sub-categories I/O device virtualization and I/O device access. Our survey gives a comprehensive summary of the capabilities and limitations of the existing enabling technologies and researched approaches for abstraction, memory access, and I/O for NF packet processing. We outline critical future research directions for advancing NF packet processing on GPC platforms.
Preprint
With 3D-stacked DRAM architectures becoming more prevalent, it has become important to find ways to characterize and mitigate the adverse effects that can hinder their inherent access parallelism and throughput. One example of such adversities is the electromigration (EM) effects in the through-silicon vias (TSVs) of the power delivery network (PDN) of 3D-stacked DRAM architectures. Several prior works have addressed the effects of EM in TSVs of 3D integrated circuits. However, no prior work has addressed the effects of EM in the PDN TSVs on the performance and lifetime of 3D-stacked DRAMs. In this paper, we characterize the effects of EM in PDN TSVs on a Hybrid Memory Cube (HMC) architecture employing the conventional PDN design with a clustered layout of power and ground TSVs. We then present a new PDN design with a distributed layout of power and ground TSVs and show that it can mitigate the adverse effects of EM on the HMC architecture performance without requiring additional power and ground pins. Our benchmark-driven simulation-based analysis shows that compared to the clustered PDN layout, our proposed distributed PDN layout improves the EM-affected lifetime of the HMC architecture by up to 10 years. During this useful lifetime, the HMC architecture yields an energy-delay product (EDP) up to 1.51 times lower.
Article
Network function virtualization provides an efficient and flexible way to implement network functions deployed in middleboxes as software running on commodity servers. However, it brings challenges for network management, one of which is how to manage the unavailability of middleboxes. This paper proposes an unavailability-aware backup allocation model with shared protection to minimize the maximum unavailability among functions. Shared protection allows multiple functions to share the backup resources, which leads to a complicated recovery mechanism and makes unavailability estimation difficult. We develop an analytical approach based on queueing theory to compute the middlebox unavailability for a given backup allocation. The heterogeneous failure, repair, recovery, and waiting procedures of functions and backup servers, which lead to several different states for each function and for the whole system, are considered in the queueing approach. We analyze the performance bounds for a given solution and for the optimal objective value. Based on the developed analytical approach and the performance bounds, we introduce two heuristics to solve the backup allocation problem. The results reveal that, compared to a baseline model, the proposed unavailability-aware model reduces the maximum unavailability by 16% on average in our examined scenarios.
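As a rough illustration of the kind of quantity such a queueing analysis yields (a minimal sketch with assumed numbers, not the article's shared-protection model, which must also track backup contention, recovery, and waiting states): for a single repairable function with exponentially distributed failures and repairs, steady-state unavailability is MTTR / (MTTF + MTTR).

```c
/* Steady-state unavailability of one repairable component.
 * MTTF/MTTR values below are illustrative assumptions. */
#include <stdio.h>

static double unavailability(double mttf_hours, double mttr_hours)
{
    return mttr_hours / (mttf_hours + mttr_hours);
}

int main(void)
{
    /* e.g., a middlebox failing every ~10,000 h, repaired in 4 h */
    printf("unavailability: %.6f\n", unavailability(10000.0, 4.0));
    return 0;
}
```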