Existing lock-free memory reclamation schemes

Efficient Hardware Primitives for Immediate Memory Reclamation in Optimistic Data Structures

Preprint

Full-text available

Feb 2023

Safe memory reclamation (SMR) algorithms are crucial for preventing use-after-free errors in optimistic data structures. SMR algorithms typically delay reclamation for safety and reclaim objects in batches for efficiency. It is difficult to strike a balance between performance and space efficiency. Small batch sizes and frequent reclamation attempts lead to high overhead, while freeing large batches can lead to long program interruptions and high memory footprints. An ideal SMR algorithm would forgo batching, and reclaim memory immediately, without suffering high reclamation overheads. To this end, we propose Conditional Access: a set of hardware instructions that offer immediate reclamation and low overhead in optimistic data structures. Conditional Access harnesses cache coherence to enable threads to efficiently detect potential use-after-free errors without explicit shared memory communication, and without introducing additional coherence traffic. We implement and evaluate Conditional Access in Graphite, a multicore simulator. Our experiments show that Conditional Access can rival the performance of highly optimized and carefully tuned SMR algorithms while simultaneously allowing immediate reclamation. This results in concurrent data structures with similar memory footprints to their sequential counterparts.

Releasing Memory with Optimistic Access: A Hybrid Approach to Memory Reclamation and Allocation in Lock-Free Programs

Preprint

Full-text available

Feb 2023

Lock-free data structures are an important tool for the development of concurrent programs as they provide scalability, low latency and avoid deadlocks, livelocks and priority inversion. However, they require some sort of additional support to guarantee memory reclamation. The Optimistic Access (OA) method has most of the desired properties for memory reclamation, but since it allows memory to be accessed after being reclaimed, it is incompatible with the traditional memory management model. This renders it unable to release memory to the memory allocator/operating system, and, as such, it requires a complex memory recycling mechanism. In this paper, we extend the lock-free general purpose memory allocator LRMalloc to support the OA method. By doing so, we are able to simplify the memory reclamation method implementation and also allow memory to be reused by other parts of the same process. We further exploit the virtual memory system provided by the operating system and hardware in order to make it possible to release reclaimed memory to the operating system.

The ERA Theorem for Safe Memory Reclamation

Preprint

Full-text available

Nov 2022

Safe memory reclamation (SMR) schemes for concurrent data structures offer trade-offs between three desirable properties: ease of integration, robustness, and applicability. In this paper we rigorously define SMR and these three properties, and we present the ERA theorem, asserting that any SMR scheme can only provide at most two of the three properties.

wCQ: A Fast Wait-Free Queue with Bounded Memory Usage

Preprint

Jan 2022

The concurrency literature presents a number of approaches for building non-blocking, FIFO, multiple-producer and multiple-consumer (MPMC) queues. However, only a fraction of them have high performance. In addition, many queue designs, such as LCRQ, trade memory usage for better performance. The recently proposed SCQ design achieves both memory efficiency as well as excellent performance. Unfortunately, both LCRQ and SCQ are only lock-free. On the other hand, existing wait-free queues are either not very performant or suffer from potentially unbounded memory usage. Strictly described, the latter queues, such as Yang & Mellor-Crummey's (YMC) queue, forfeit wait-freedom as they are blocking when memory is exhausted. We present a wait-free queue, called wCQ. wCQ is based on SCQ and uses its own variation of fast-path-slow-path methodology to attain wait-freedom and bound memory usage. Our experimental studies on x86 and PowerPC architectures validate wCQ's great performance and memory efficiency. They also show that wCQ's performance is often on par with the best known concurrent queue designs.

Crystalline: Fast and Memory Efficient Wait-Free Reclamation

Preprint

Aug 2021

Historically, memory management based on lock-free reference counting was very inefficient, especially for read-dominated workloads. Thus, approaches such as epoch-based reclamation (EBR), hazard pointers (HP), or a combination thereof have received significant attention. EBR exhibits excellent performance but is blocking due to potentially unbounded memory usage. In contrast, HP are non-blocking and achieve good memory efficiency but are much slower. Moreover, HP are only lock-free in the general case. Recently, several new memory reclamation approaches such as WFE and Hyaline have been proposed. WFE achieves wait-freedom, but is less memory efficient and suffers from suboptimal performance in oversubscribed scenarios; Hyaline achieves higher performance and memory efficiency, but lacks wait-freedom. We present a new wait-free memory reclamation scheme, Crystalline, that simultaneously addresses the challenges of high performance, high memory efficiency, and wait-freedom. Crystalline guarantees complete wait-freedom even when threads are dynamically recycled, asynchronously reclaims memory in the sense that any thread can reclaim memory retired by any other thread, and ensures (an almost) balanced reclamation workload across all threads. The latter two properties result in Crystalline's high performance and high memory efficiency. Simultaneously ensuring all three properties require overcoming unique challenges which we discuss in the paper. Crystalline's implementation relies on specialized instructions which are widely available on commodity hardware such as x86-64 or ARM64. Our experimental evaluations show that Crystalline exhibits outstanding scalability and memory efficiency, and achieves superior throughput than typical reclamation schemes such as EBR as the number of threads grows.

NBR: Neutralization Based Reclamation

Preprint

Full-text available

Dec 2020

Safe memory reclamation (SMR) algorithms suffer from a trade-off between bounding unreclaimed memory and the speed of reclamation. Hazard pointer (HP) based algorithms bound unreclaimed memory at all times, but tend to be slower than other approaches. Epoch based reclamation (EBR) algorithms are faster, but do not bound memory reclamation. Other algorithms follow hybrid approaches, requiring special compiler or hardware support, changes to record layouts, and/or extensive code changes. Not all SMR algorithms can be used to reclaim memory for all data structures. We propose a new neutralization based reclamation (NBR) algorithm that is faster than the best known EBR algorithms and achieves bounded unreclaimed memory. It is non-blocking when used with a non-blocking operating system (OS) kernel, and only requires atomic read, write and CAS. NBR is straightforward to use with many different data structures, and in most cases, require similar reasoning and programmer effort to two-phased locking. NBR is implemented using OS signals and a lightweight handshaking mechanism between participating threads to determine when it is safe to reclaim a record. Experiments on a lock-based binary search tree and a lazy linked list show that NBR significantly outperforms many state of the art reclamation algorithms. In the tree NBR is faster than next best algorithm, DEBRA by upto 38% and HP by upto 17%. And, in the list NBR is 15% and 243% faster than DEBRA and HP, respectively.

Universal Wait-Free Memory Reclamation

Preprint

Jan 2020

In this paper, we present a universal memory reclamation scheme, Wait-Free Eras (WFE), for deleted memory blocks in wait-free concurrent data structures. WFE's key innovation is that it is completely wait-free. Although some prior techniques provide similar guarantees for certain data structures, they lack support for arbitrary wait-free data structures. Consequently, developers are typically forced to marry their wait-free data structures with lock-free Hazard Pointers or (potentially blocking) epoch-based memory reclamation. Since both these schemes provide weaker progress guarantees, they essentially forfeit the strong progress guarantee of wait-free data structures. Though making the original Hazard Pointers scheme or epoch-based reclamation completely wait-free seems infeasible, we achieved this goal with a more recent, (lock-free) Hazard Eras scheme, which we extend to guarantee wait-freedom. As this extension is non-trivial, we discuss all challenges pertaining to the construction of universal wait-free memory reclamation. WFE is implementable on ubiquitous x86_64 and AArch64 (ARM) architectures. Its API is mostly compatible with Hazard Pointers, which allows easy transitioning of existing data structures into WFE. Our experimental evaluations show that WFE's performance is close to epoch-based reclamation and almost matches the original Hazard Eras scheme, while providing the stronger wait-free progress guarantee.

Pointer life cycle types for lock-free data structures with memory reclamation

Article

Full-text available

Dec 2019

We consider the verification of lock-free data structures that manually manage their memory with the help of a safe memory reclamation (SMR) algorithm. Our first contribution is a type system that checks whether a program properly manages its memory. If the type check succeeds, it is safe to ignore the SMR algorithm and consider the program under garbage collection. Intuitively, our types track the protection of pointers as guaranteed by the SMR algorithm. There are two design decisions. The type system does not track any shape information, which makes it extremely lightweight. Instead, we rely on invariant annotations that postulate a protection by the SMR. To this end, we introduce angels, ghost variables with an angelic semantics. Moreover, the SMR algorithm is not hard-coded but a parameter of the type system definition. To achieve this, we rely on a recent specification language for SMR algorithms. Our second contribution is to automate the type inference and the invariant check. For the type inference, we show a quadratic-time algorithm. For the invariant check, we give a source-to-source translation that links our programs to off-the-shelf verification tools. It compiles away the angelic semantics. This allows us to infer appropriate annotations automatically in a guess-and-check manner. To demonstrate the effectiveness of our type-based verification approach, we check linearizability for various list and set implementations from the literature with both hazard pointers and epoch-based memory reclamation. For many of the examples, this is the first time they are verified automatically. For the ones where there is a competitor, we obtain a speed-up of up to two orders of magnitude.

Pointer Life Cycle Types for Lock-Free Data Structures with Memory Reclamation

Preprint

Oct 2019

We consider the verification of lock-free data structures that manually manage their memory with the help of a safe memory reclamation (SMR) algorithm. Our first contribution is a type system that checks whether a program properly manages its memory. If the type check succeeds, it is safe to ignore the SMR algorithm and consider the program under garbage collection. Intuitively, our types track the protection of pointers as guaranteed by the SMR algorithm. There are two design decisions. The type system does not track any shape information, which makes it extremely lightweight. Instead, we rely on invariant annotations that postulate a protection by the SMR. To this end, we introduce angels, ghost variables with an angelic semantics. Moreover, the SMR algorithm is not hard-coded but a parameter of the type system definition. To achieve this, we rely on a recent specification language for SMR algorithms. Our second contribution is to automate the type inference and the invariant check. For the type inference, we show a quadratic-time algorithm. For the invariant check, we give a source-to-source translation that links our programs to off-the-shelf verification tools. It compiles away the angelic semantics. This allows us to infer appropriate annotations automatically in a guess-and-check manner. To demonstrate the effectiveness of our type-based verification approach, we check linearizability for various list and set implementations from the literature with both hazard pointers and epoch-based memory reclamation. For many of the examples, this is the first time they are verified automatically. For the ones where there is a competitor, we obtain a speed-up of up to two orders of magnitude.

Efficient lock-free durable sets

Article

Full-text available

Oct 2019

Non-volatile memory is expected to co-exist or replace DRAM in upcoming architectures. Durable concurrent data structures for non-volatile memories are essential building blocks for constructing adequate software for use with these architectures. In this paper, we propose a new approach for durable concurrent sets and use this approach to build the most efficient durable hash tables available today. Evaluation shows a performance improvement factor of up to 3.3x over existing technology.

Existing lock-free memory reclamation schemes

Context in source publication

Citations