Existing lock-free memory reclamation schemes

Source publication
Article
Memory-management support for lock-free data structures is well known to be a tough problem. Recent work has successfully reduced the overhead of such schemes. However, applying memory-management support to a data structure remains complex and, in many cases, requires redesigning the data structure. In this paper, we present the first lock-free mem...

Context in source publication

Context 1
... consider three aspects of each of these schemes: performance, non-blocking guarantees, and the simplicity of applying them to a given data structure. A table summarizing these schemes appears in Table 1. ...

Citations

... Current safe memory reclamation (SMR) [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13] algorithms used in many optimistic data structures delay reclamation and free nodes in batches, trading space for high performance and safety. When the batches are too small, the data structure's throughput suffers because frequent reclamation adds overhead. ...
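The batch-size trade-off described in this context can be made concrete with a small, single-threaded Python model. It is a sketch under stated assumptions: `BatchRetirer` and `simulate` are hypothetical names, and every retired node is assumed to be safe to free as soon as its batch fills (a real SMR algorithm would first consult hazards, epochs, or similar state).

```python
class BatchRetirer:
    """Hypothetical toy model of batched reclamation (not a real SMR).

    Retired nodes accumulate in a per-thread list and are all freed when the
    list reaches batch_size. We assume every retired node is already safe to
    free at that point."""

    def __init__(self, batch_size, free_fn=lambda node: None):
        self.batch_size = batch_size
        self.free_fn = free_fn
        self.retired = []
        self.reclaim_passes = 0      # proxy for reclamation overhead
        self.peak_unreclaimed = 0    # proxy for memory footprint

    def retire(self, node):
        self.retired.append(node)
        self.peak_unreclaimed = max(self.peak_unreclaimed, len(self.retired))
        if len(self.retired) >= self.batch_size:
            self._reclaim_batch()

    def _reclaim_batch(self):
        self.reclaim_passes += 1
        for node in self.retired:
            self.free_fn(node)
        self.retired.clear()


def simulate(batch_size, num_retires):
    """Return (reclamation passes, peak unreclaimed nodes) for one run."""
    r = BatchRetirer(batch_size)
    for i in range(num_retires):
        r.retire(i)
    return r.reclaim_passes, r.peak_unreclaimed


for b in (1, 10, 100):
    print(b, simulate(b, 1000))
```

With 1,000 retirements, a batch size of 1 costs 1,000 reclamation passes but never leaves more than one node unreclaimed, while a batch size of 100 needs only 10 passes but lets up to 100 nodes linger.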
... For example, in a linked list that sets a marked bit in a node before deleting it, if a thread reads the next pointer of an unmarked node, and at some later time its marked bit and next pointer have not been changed, then it is still safe to dereference its next pointer. In such a data structure, once a node is unlinked and marked, it can immediately be freed, since doing so will merely cause subsequent creads or cwrites on the node to fail, triggering a restart (an approach common to many popular SMRs [2], [7], [9], [17]). ...
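The following Python model sketches the behaviour this context relies on: once a thread has tagged a location with a conditional read, any later `cread` or `cwrite` by that thread fails if another thread has since written or freed a tagged location, so an immediately freed node can only trigger a restart. It is an illustrative model, not the hardware design: `Memory`, `ConditionalAccessThread`, `untag_all`, and the per-address version counter (standing in for cache-line invalidation) are all hypothetical.

```python
class Memory:
    """Shared memory in which every store or free bumps a per-address version.

    The version bump is a stand-in (assumed for this model) for the cache-line
    invalidation that Conditional Access observes in hardware."""

    def __init__(self):
        self.values = {}
        self.version = {}

    def store(self, addr, value):
        self.values[addr] = value
        self.version[addr] = self.version.get(addr, 0) + 1

    def free(self, addr):
        # Freeing is modelled as a write, so it invalidates every tag on addr.
        self.values.pop(addr, None)
        self.version[addr] = self.version.get(addr, 0) + 1


class ConditionalAccessThread:
    """Hypothetical model of one thread issuing cread/cwrite."""

    def __init__(self, mem):
        self.mem = mem
        self.tags = {}  # addr -> version observed when the addr was tagged

    def _tags_valid(self):
        return all(self.mem.version.get(a, 0) == v for a, v in self.tags.items())

    def cread(self, addr):
        """Tag addr and read it; returns (False, None) if any tag was invalidated."""
        if not self._tags_valid():
            return False, None
        self.tags[addr] = self.mem.version.get(addr, 0)
        return True, self.mem.values.get(addr)

    def cwrite(self, addr, value):
        """Store only if no tagged location has been invalidated."""
        if not self._tags_valid():
            return False
        self.mem.store(addr, value)
        self.tags[addr] = self.mem.version[addr]  # our own write keeps the tag valid
        return True

    def untag_all(self):
        """Drop all tags, e.g. when an operation restarts or completes."""
        self.tags.clear()


mem = Memory()
mem.store("node.next", "succ")
reader = ConditionalAccessThread(mem)
ok, nxt = reader.cread("node.next")   # succeeds and tags node.next
mem.free("node.next")                 # another thread frees the node immediately
ok2, _ = reader.cread("succ.key")     # fails: a tagged location was invalidated
```

After the failure, the reader would call `untag_all` and restart its operation from a safe point, which is exactly why freeing the node immediately is harmless in this model.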
... HBR [2], [26], [27] and RCBR [10], [26], [28], [8] can bound the number of unreclaimed nodes but are generally slow due to high per-read overhead or node instrumentation overhead. HYR techniques have achieved both speed and a bound on the number of unreclaimed nodes with varying success, but they rely on assumptions about specialised hardware, the operating system, or memory allocators [8], [29], [9], [12], [13], [4], [5], [3], [30], [6], [31], [17], [7], [32], [33], [15], [34], [35], [36]. Nevertheless, these techniques still reduce the reclamation algorithm's overhead by delaying reclamation and using batches, which increases the memory footprint. ...
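For readers unfamiliar with hazard-pointer-based reclamation (HBR), the sketch below models its two halves in Python: a reader publishes the pointer it is about to dereference and re-reads the source to validate it, and a reclaimer frees only those retired nodes that no thread has published. This is a single-threaded model under stated assumptions; `HazardDomain`, `protect`, `retire`, and `scan` are hypothetical names, and real implementations need atomic loads and stores with appropriate memory fences.

```python
class Node:
    def __init__(self, key):
        self.key = key


class HazardDomain:
    """Hypothetical single-threaded model of hazard-pointer-based reclamation."""

    def __init__(self, num_threads, scan_threshold=1):
        self.hazards = [None] * num_threads              # one hazard slot per thread
        self.retired = [[] for _ in range(num_threads)]  # per-thread retire lists
        self.scan_threshold = scan_threshold
        self.freed = []                                  # stands in for free()

    def protect(self, tid, load):
        """Publish the pointer returned by load() and re-read to validate it.

        If the shared location still holds the same pointer after the hazard
        is published, the node was still reachable then, so any later scan
        will see the hazard before freeing it."""
        while True:
            ptr = load()
            self.hazards[tid] = ptr
            if load() is ptr:
                return ptr

    def clear(self, tid):
        self.hazards[tid] = None

    def retire(self, tid, node):
        self.retired[tid].append(node)
        if len(self.retired[tid]) >= self.scan_threshold:
            self.scan(tid)

    def scan(self, tid):
        """Free every retired node that no thread currently protects."""
        keep = []
        for node in self.retired[tid]:
            if any(h is node for h in self.hazards):
                keep.append(node)
            else:
                self.freed.append(node)
        self.retired[tid] = keep
```

The per-read cost mentioned in the context is visible here: every protected dereference requires a publish and a re-read, which in a real implementation also means a memory fence.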
Preprint
Safe memory reclamation (SMR) algorithms are crucial for preventing use-after-free errors in optimistic data structures. SMR algorithms typically delay reclamation for safety and reclaim objects in batches for efficiency. It is difficult to strike a balance between performance and space efficiency. Small batch sizes and frequent reclamation attempts lead to high overhead, while freeing large batches can lead to long program interruptions and high memory footprints. An ideal SMR algorithm would forgo batching, and reclaim memory immediately, without suffering high reclamation overheads. To this end, we propose Conditional Access: a set of hardware instructions that offer immediate reclamation and low overhead in optimistic data structures. Conditional Access harnesses cache coherence to enable threads to efficiently detect potential use-after-free errors without explicit shared memory communication, and without introducing additional coherence traffic. We implement and evaluate Conditional Access in Graphite, a multicore simulator. Our experiments show that Conditional Access can rival the performance of highly optimized and carefully tuned SMR algorithms while simultaneously allowing immediate reclamation. This results in concurrent data structures with similar memory footprints to their sequential counterparts.
... One such example is the Automatic Optimistic Access (AOA) method [4], which allows the data structure programmer to forgo the retire call by using garbage-collector-like techniques. A second example is the Free Access (FA) method [3], which requires the programmer to annotate the data structure functions and then, through a combination of garbage collection techniques and compiler steps, applies OA-like memory reclamation to the data structure without requiring it to be written in a normalized form [22]. Another example is the VBR method [21], which extends OA to write operations by using DWCAS (Double-Width Compare-and-Swap) with tagged pointers. ...
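The VBR example above relies on DWCAS over tagged pointers. The Python sketch below models only that ingredient, not VBR itself: a reference stored together with a version tag and updated by a compare-and-swap on the whole (pointer, tag) pair, so that a stale CAS fails after a node has been reclaimed and its memory reused. `TaggedRef` and `dwcas` are hypothetical names, and a lock stands in for a hardware double-width CAS such as `cmpxchg16b` on x86-64.

```python
import threading


class TaggedRef:
    """A (pointer, version) pair updated atomically as a unit.

    Hypothetical model: a lock stands in for a double-width compare-and-swap."""

    def __init__(self, target, version=0):
        self._pair = (target, version)
        self._lock = threading.Lock()

    def load(self):
        return self._pair

    def dwcas(self, expected, new):
        """Replace the pair with new iff both halves match expected."""
        with self._lock:
            cur_target, cur_version = self._pair
            exp_target, exp_version = expected
            if cur_target is exp_target and cur_version == exp_version:
                self._pair = new
                return True
            return False
```

A usage scenario is the classic ABA case: thread X snapshots `(a, 1)`, another thread unlinks `a`, and `a`'s memory is later reused and linked back in with version 3. A plain pointer CAS by X would wrongly succeed because the pointer matches again; with the tag, X's stale CAS is rejected.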
Preprint
Lock-free data structures are an important tool for the development of concurrent programs as they provide scalability, low latency and avoid deadlocks, livelocks and priority inversion. However, they require some sort of additional support to guarantee memory reclamation. The Optimistic Access (OA) method has most of the desired properties for memory reclamation, but since it allows memory to be accessed after being reclaimed, it is incompatible with the traditional memory management model. This renders it unable to release memory to the memory allocator/operating system, and, as such, it requires a complex memory recycling mechanism. In this paper, we extend the lock-free general purpose memory allocator LRMalloc to support the OA method. By doing so, we are able to simplify the memory reclamation method implementation and also allow memory to be reused by other parts of the same process. We further exploit the virtual memory system provided by the operating system and hardware in order to make it possible to release reclaimed memory to the operating system.
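The Optimistic Access idea referred to in this abstract can be sketched as follows: reads may touch memory that has already been recycled, so after each read a thread checks a per-thread warning bit that the reclaimer sets before recycling, and restarts its operation if the bit is set. This Python model is a simplified sketch under that assumption; `OptimisticReclaimer`, `read_checked`, `RestartOperation`, and `run_operation` are hypothetical names, and the protection needed on the write side (which the original scheme handles separately) is omitted.

```python
class RestartOperation(Exception):
    """Raised when an optimistic read may have observed recycled memory."""


class OptimisticReclaimer:
    """Hypothetical model of OA-style warning bits (read side only)."""

    def __init__(self, num_threads):
        self.warning = [False] * num_threads

    def start_recycling_phase(self):
        # Before any retired memory is reused, warn every thread.
        for t in range(len(self.warning)):
            self.warning[t] = True

    def read_checked(self, tid, read):
        """Do an optimistic read, then validate it against the warning bit."""
        value = read()                 # may touch memory that was recycled
        if self.warning[tid]:
            self.warning[tid] = False  # acknowledge the warning
            raise RestartOperation     # discard the possibly stale value
        return value


def run_operation(op):
    """Retry op from the beginning until it completes without a warning."""
    while True:
        try:
            return op()
        except RestartOperation:
            continue
```

Because a stale value is never used after a warning, the memory can be handed back to an allocator that keeps it mapped; this is the property the abstract's LRMalloc extension has to preserve.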
... E.g., the Automatic Optimistic Access (AOA) scheme [10] requires the data structure to be in a normalized form [42] for the integration. Free Access (FA) [9] and Neutralization-Based Reclamation (NBR) [39] require that the code is divided into separate read and write phases, and Version-Based Reclamation (VBR) [37] provides a designated mechanism for adding code checkpoints. Note that wide applicability and easy integration are independent properties. ...
Preprint
Safe memory reclamation (SMR) schemes for concurrent data structures offer trade-offs between three desirable properties: ease of integration, robustness, and applicability. In this paper we rigorously define SMR and these three properties, and we present the ERA theorem, asserting that any SMR scheme can only provide at most two of the three properties.
... Furthermore, to our knowledge, there is no wait-free garbage collector with limited overheads and bounded memory usage. FreeAccess [3] and OrcGC [4] are efficient, but they are only lock-free. ...
Preprint
The concurrency literature presents a number of approaches for building non-blocking, FIFO, multiple-producer and multiple-consumer (MPMC) queues. However, only a fraction of them have high performance. In addition, many queue designs, such as LCRQ, trade memory usage for better performance. The recently proposed SCQ design achieves both memory efficiency as well as excellent performance. Unfortunately, both LCRQ and SCQ are only lock-free. On the other hand, existing wait-free queues are either not very performant or suffer from potentially unbounded memory usage. Strictly described, the latter queues, such as Yang & Mellor-Crummey's (YMC) queue, forfeit wait-freedom as they are blocking when memory is exhausted. We present a wait-free queue, called wCQ. wCQ is based on SCQ and uses its own variation of fast-path-slow-path methodology to attain wait-freedom and bound memory usage. Our experimental studies on x86 and PowerPC architectures validate wCQ's great performance and memory efficiency. They also show that wCQ's performance is often on par with the best known concurrent queue designs.
... Since non-blocking data structures do not use simple mutual exclusion, their memory management becomes highly challenging: a concurrent thread may hold an obsolete pointer to an object which is about to be freed by another thread. Responding to this serious challenge, safe memory reclamation (SMR) schemes for unmanaged C/C++ code have been proposed in the literature (e.g., [7,11,15,25,30,31,34,42]). However, they typically involve major trade-offs, such as trading off memory efficiency for high throughput (or vice versa). ...
... Exacerbating the problem, only a few schemes [7,11,25,30,31,34] are truly non-blocking and with bounded memory usage, i.e., when a suspension of one thread has no adverse impact on progress in other threads. Furthermore, only one scheme [30] implements wait-freedom for a general case. ...
... To bound memory usage and improve usability, several techniques were developed that exploit OS support. As was pointed out in [7], these approaches are not strictly non-blocking, because typical OS primitives such as signals use locks internally (e.g., in Linux). ThreadScan [3] is one such mechanism which uses signals. ...
Preprint
Historically, memory management based on lock-free reference counting was very inefficient, especially for read-dominated workloads. Thus, approaches such as epoch-based reclamation (EBR), hazard pointers (HP), or a combination thereof have received significant attention. EBR exhibits excellent performance but is blocking due to potentially unbounded memory usage. In contrast, HP are non-blocking and achieve good memory efficiency but are much slower. Moreover, HP are only lock-free in the general case. Recently, several new memory reclamation approaches such as WFE and Hyaline have been proposed. WFE achieves wait-freedom, but is less memory efficient and suffers from suboptimal performance in oversubscribed scenarios; Hyaline achieves higher performance and memory efficiency, but lacks wait-freedom. We present a new wait-free memory reclamation scheme, Crystalline, that simultaneously addresses the challenges of high performance, high memory efficiency, and wait-freedom. Crystalline guarantees complete wait-freedom even when threads are dynamically recycled, asynchronously reclaims memory in the sense that any thread can reclaim memory retired by any other thread, and ensures (an almost) balanced reclamation workload across all threads. The latter two properties result in Crystalline's high performance and high memory efficiency. Simultaneously ensuring all three properties requires overcoming unique challenges, which we discuss in the paper. Crystalline's implementation relies on specialized instructions which are widely available on commodity hardware such as x86-64 or ARM64. Our experimental evaluations show that Crystalline exhibits outstanding scalability and memory efficiency, and achieves higher throughput than typical reclamation schemes such as EBR as the number of threads grows.
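To make the EBR behaviour described in this abstract concrete (fast, but a single stalled thread can stop all reclamation), here is a minimal single-threaded Python model of epoch-based reclamation. `EpochDomain` and its methods are hypothetical names, and the rule used here (free a node once the global epoch is at least two epochs past the epoch in which it was retired) is one common variant, assumed for the sketch.

```python
class EpochDomain:
    """Hypothetical single-threaded model of epoch-based reclamation."""

    def __init__(self, num_threads):
        self.global_epoch = 0
        self.active = [False] * num_threads
        self.local_epoch = [0] * num_threads
        self.limbo = []   # (retire_epoch, node) pairs awaiting reclamation
        self.freed = []   # stands in for free()

    def enter(self, tid):
        """Start a critical section: announce the epoch this thread observed."""
        self.active[tid] = True
        self.local_epoch[tid] = self.global_epoch

    def exit(self, tid):
        self.active[tid] = False

    def retire(self, node):
        self.limbo.append((self.global_epoch, node))

    def try_advance(self):
        """Advance the epoch only if every active thread has seen the current one."""
        if all(not a or e == self.global_epoch
               for a, e in zip(self.active, self.local_epoch)):
            self.global_epoch += 1
            self._free_old()
            return True
        return False

    def _free_old(self):
        # Nodes retired at least two epochs ago can no longer be referenced.
        keep = []
        for epoch, node in self.limbo:
            if epoch <= self.global_epoch - 2:
                self.freed.append(node)
            else:
                keep.append((epoch, node))
        self.limbo = keep
```

In this model, a thread that enters a critical section and then stalls pins the global epoch, so every node retired afterwards stays in `limbo`; that is the unbounded-memory (and hence blocking) behaviour the abstract attributes to EBR.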
... Unfortunately, it doesn't appear that AOA has been ported to modern compilers. The need for a normalized form was eliminated in Free Access (FA) [12], which proposed another compiler extension to perform automatic instrumentation surrounding writes, and blocks of consecutive independent reads. FA is a general technique that has been shown to be comparable to HPBR [12] [opposing P1]. ...
... The need for a normalized form was eliminated in Free Access (FA) [12], which proposed another compiler extension to perform automatic instrumentation surrounding writes, and blocks of consecutive independent reads. FA is a general technique that has been shown to be comparable to HPBR [12] [opposing P1]. In contrast, our work targets applications that can benefit from the higher performance handcrafted SMR. ...
Preprint
Safe memory reclamation (SMR) algorithms suffer from a trade-off between bounding unreclaimed memory and the speed of reclamation. Hazard pointer (HP) based algorithms bound unreclaimed memory at all times, but tend to be slower than other approaches. Epoch based reclamation (EBR) algorithms are faster, but do not bound unreclaimed memory. Other algorithms follow hybrid approaches, requiring special compiler or hardware support, changes to record layouts, and/or extensive code changes. Not all SMR algorithms can be used to reclaim memory for all data structures. We propose a new neutralization based reclamation (NBR) algorithm that is faster than the best known EBR algorithms and achieves bounded unreclaimed memory. It is non-blocking when used with a non-blocking operating system (OS) kernel, and only requires atomic read, write and CAS. NBR is straightforward to use with many different data structures, and in most cases requires reasoning and programmer effort similar to two-phased locking. NBR is implemented using OS signals and a lightweight handshaking mechanism between participating threads to determine when it is safe to reclaim a record. Experiments on a lock-based binary search tree and a lazy linked list show that NBR significantly outperforms many state-of-the-art reclamation algorithms. In the tree, NBR is faster than the next best algorithm, DEBRA, by up to 38%, and faster than HP by up to 17%. In the list, NBR is 15% and 243% faster than DEBRA and HP, respectively.
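NBR, as described in this abstract, splits each operation into a read phase, in which a thread may be neutralized and forced to restart, and a write phase, entered only after the thread reserves the records it will use. The Python model below sketches that handshake without real signals: a `neutralized` flag stands in for the OS signal and an exception stands in for the jump back to the start of the operation. `NBRDomain`, `Neutralized`, and all method names are hypothetical, and the model reclaims on every retire for brevity, so it shows the protocol's logic rather than the authors' implementation.

```python
class Neutralized(Exception):
    """Stands in for the signal handler jumping back to the operation start."""


class NBRDomain:
    """Hypothetical single-threaded model of the NBR read/write-phase handshake."""

    def __init__(self, num_threads):
        self.in_read_phase = [False] * num_threads
        self.neutralized = [False] * num_threads   # models a pending signal
        self.reservations = [set() for _ in range(num_threads)]
        self.retired = []
        self.freed = []

    def begin_read_phase(self, tid):
        self.neutralized[tid] = False
        self.reservations[tid] = set()
        self.in_read_phase[tid] = True

    def check(self, tid):
        """Called between reads; models delivery of the neutralizing signal."""
        if self.neutralized[tid]:
            raise Neutralized

    def begin_write_phase(self, tid, records):
        """Reserve the records the write phase will use, then stop being restartable."""
        self.reservations[tid] = set(records)
        self.check(tid)                  # a signal may have arrived first
        self.in_read_phase[tid] = False

    def end_operation(self, tid):
        self.reservations[tid] = set()
        self.in_read_phase[tid] = False

    def retire_and_reclaim(self, record):
        """Retire a record and reclaim at once (a real implementation would batch)."""
        self.retired.append(record)
        for t in range(len(self.neutralized)):
            if self.in_read_phase[t]:
                self.neutralized[t] = True   # force readers to restart
        reserved = set().union(*self.reservations)
        keep = []
        for r in self.retired:
            if r in reserved:
                keep.append(r)
            else:
                self.freed.append(r)
        self.retired = keep
```

The key ordering is that the reclaimer neutralizes readers before it inspects reservations, so a record is freed only if every thread either restarts its read phase or has explicitly reserved what it still needs.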
... Although a number of memory reclamation techniques [4-10, 16, 19-21, 27, 28, 30, 33, 37, 38] have been proposed, only a fraction of them can be used for arbitrary data structures and are truly non-blocking [9,27,28,30,33,37,38]. At present, no universal memory reclamation technique exists that guarantees wait-freedom for arbitrary wait-free data structures. ...
... Automatic Optimistic Access [10] relies on a data-structure-specific garbage collector to make reclamation more automatic, but still requires data structures to be written in a normalized form. FreeAccess [9] forgoes this requirement by extending the LLVM compiler to make the process fully automatic. ...
Preprint
In this paper, we present a universal memory reclamation scheme, Wait-Free Eras (WFE), for deleted memory blocks in wait-free concurrent data structures. WFE's key innovation is that it is completely wait-free. Although some prior techniques provide similar guarantees for certain data structures, they lack support for arbitrary wait-free data structures. Consequently, developers are typically forced to marry their wait-free data structures with lock-free Hazard Pointers or (potentially blocking) epoch-based memory reclamation. Since both these schemes provide weaker progress guarantees, they essentially forfeit the strong progress guarantee of wait-free data structures. Though making the original Hazard Pointers scheme or epoch-based reclamation completely wait-free seems infeasible, we achieved this goal with a more recent, (lock-free) Hazard Eras scheme, which we extend to guarantee wait-freedom. As this extension is non-trivial, we discuss all challenges pertaining to the construction of universal wait-free memory reclamation. WFE is implementable on ubiquitous x86_64 and AArch64 (ARM) architectures. Its API is mostly compatible with Hazard Pointers, which allows easy transitioning of existing data structures into WFE. Our experimental evaluations show that WFE's performance is close to epoch-based reclamation and almost matches the original Hazard Eras scheme, while providing the stronger wait-free progress guarantee.
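The Hazard Eras scheme that WFE extends can be sketched as follows: each node records the era in which it was allocated and the era in which it was retired, readers publish an era instead of a pointer, and a retired node is freed only if no published era falls within its lifetime. This Python model is a simplified, single-threaded sketch; `HazardEras`, its methods, and the `Node` fields are hypothetical names, and WFE's wait-free helping mechanism is omitted entirely.

```python
class Node:
    def __init__(self, key, birth_era):
        self.key = key
        self.birth_era = birth_era
        self.retire_era = None


class HazardEras:
    """Hypothetical single-threaded model of Hazard Eras reclamation."""

    def __init__(self, num_threads):
        self.era = 1                           # global era clock
        self.reserved = [None] * num_threads   # published era per thread
        self.retired = []
        self.freed = []                        # stands in for free()

    def alloc(self, key):
        return Node(key, self.era)

    def protect(self, tid, load):
        """Return load() once it was read while the published era was current."""
        prev = self.reserved[tid]
        while True:
            ptr = load()
            era = self.era
            if era == prev:
                return ptr
            self.reserved[tid] = era           # publish the newer era and retry
            prev = era

    def clear(self, tid):
        self.reserved[tid] = None

    def retire(self, node):
        node.retire_era = self.era
        self.era += 1                          # advance the era clock
        self.retired.append(node)
        self._scan()

    def _scan(self):
        """Free retired nodes whose [birth, retire] interval holds no published era."""
        eras = [e for e in self.reserved if e is not None]
        keep = []
        for n in self.retired:
            if any(n.birth_era <= e <= n.retire_era for e in eras):
                keep.append(n)
            else:
                self.freed.append(n)
        self.retired = keep
```

Because readers publish eras rather than individual pointers, a reader only needs to republish when the era clock has moved, which is what keeps the API close to Hazard Pointers while lowering the per-read cost.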
... Drop the Anchor [Braginsky et al. 2013], Optimistic Access [Cohen and Petrank 2015b], Automatic Optimistic Access [Cohen and Petrank 2015a], QSense [Balmau et al. 2016], Hazard Eras [Ramalhete and Correia 2017], and Interval-Based Reclamation [Wen et al. 2018] combine EBR and HP. Free Access [Cohen 2018] automates the application of Automatic Optimistic Access. While the method promises to be correct by construction, we believe that performance-critical applications choose the SMR technique based on performance rather than ease of use. ...
Article
We consider the verification of lock-free data structures that manually manage their memory with the help of a safe memory reclamation (SMR) algorithm. Our first contribution is a type system that checks whether a program properly manages its memory. If the type check succeeds, it is safe to ignore the SMR algorithm and consider the program under garbage collection. Intuitively, our types track the protection of pointers as guaranteed by the SMR algorithm. There are two design decisions. The type system does not track any shape information, which makes it extremely lightweight. Instead, we rely on invariant annotations that postulate a protection by the SMR. To this end, we introduce angels, ghost variables with an angelic semantics. Moreover, the SMR algorithm is not hard-coded but a parameter of the type system definition. To achieve this, we rely on a recent specification language for SMR algorithms. Our second contribution is to automate the type inference and the invariant check. For the type inference, we show a quadratic-time algorithm. For the invariant check, we give a source-to-source translation that links our programs to off-the-shelf verification tools. It compiles away the angelic semantics. This allows us to infer appropriate annotations automatically in a guess-and-check manner. To demonstrate the effectiveness of our type-based verification approach, we check linearizability for various list and set implementations from the literature with both hazard pointers and epoch-based memory reclamation. For many of the examples, this is the first time they are verified automatically. For the ones where there is a competitor, we obtain a speed-up of up to two orders of magnitude.
... Drop the Anchor [Braginsky et al. 2013], Optimistic Access [Cohen and Petrank 2015b], Automatic Optimistic Access [Cohen and Petrank 2015a], QSense [Balmau et al. 2016], Hazard Eras [Ramalhete and Correia 2017], and Interval-Based Reclamation [Wen et al. 2018] combine EBR and HP. Free Access [Cohen 2018] automates the application of Automatic Optimistic Access. While the method promises to be correct by construction, we believe that performance-critical applications choose the SMR technique based on performance rather than ease of use. ...
Preprint
We consider the verification of lock-free data structures that manually manage their memory with the help of a safe memory reclamation (SMR) algorithm. Our first contribution is a type system that checks whether a program properly manages its memory. If the type check succeeds, it is safe to ignore the SMR algorithm and consider the program under garbage collection. Intuitively, our types track the protection of pointers as guaranteed by the SMR algorithm. There are two design decisions. The type system does not track any shape information, which makes it extremely lightweight. Instead, we rely on invariant annotations that postulate a protection by the SMR. To this end, we introduce angels, ghost variables with an angelic semantics. Moreover, the SMR algorithm is not hard-coded but a parameter of the type system definition. To achieve this, we rely on a recent specification language for SMR algorithms. Our second contribution is to automate the type inference and the invariant check. For the type inference, we show a quadratic-time algorithm. For the invariant check, we give a source-to-source translation that links our programs to off-the-shelf verification tools. It compiles away the angelic semantics. This allows us to infer appropriate annotations automatically in a guess-and-check manner. To demonstrate the effectiveness of our type-based verification approach, we check linearizability for various list and set implementations from the literature with both hazard pointers and epoch-based memory reclamation. For many of the examples, this is the first time they are verified automatically. For the ones where there is a competitor, we obtain a speed-up of up to two orders of magnitude.
... To maintain the lock-freedom of our algorithms, lock-free memory reclamation schemes can be used (e.g., [Alistarh et al. 2017;Balmau et al. 2016;Brown 2015;Cohen 2018;Cohen and Petrank 2015;Dice et al. 2016;Michael 2004]). Some, however, are complicated to incorporate; some require the data structure to be in a normalized form; and others have significant overhead that commonly deteriorates performance. ...
Article
Non-volatile memory is expected to co-exist with or replace DRAM in upcoming architectures. Durable concurrent data structures for non-volatile memories are essential building blocks for constructing adequate software for use with these architectures. In this paper, we propose a new approach for durable concurrent sets and use this approach to build the most efficient durable hash tables available today. Evaluation shows a performance improvement factor of up to 3.3x over existing technology.