Figure 4 - uploaded by Rainer Gemulla
Memory representation of Elias gamma and k-gamma encoding for four example codewords. The codewords consist of the effective bits of only the non-zero values. If all values are 0, only the separator bit has to be stored and the zero-bit sequence of the shared prefix has length zero. Example codewords for k-gamma as well as k-gamma0 encoding can be found in the appendix. Figure 4 illustrates the memory layout of an example with 4 codewords for Elias gamma, 1-gamma, 2-gamma, and 4-gamma encoding. Elias gamma encoding, shown at the top of Figure 4, stores the prefix and value of each codeword together. k-gamma encoding stores the shared prefix and the values separately. While each value has its own prefix in 1-gamma encoding, two or four values share a prefix when 2-gamma or 4-gamma encoding, respectively, is used. Furthermore, each of the k values of a block starts at the same relative bit address in its memory word (e.g., v3 and v4 in words 0x00 and 0x04). In our actual implementation of k-gamma encoding, we split the memory into smaller chunks: the codewords grow forward from the start of each chunk, and the shared prefixes grow backward from the end of the chunk.
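The grouping described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `k_gamma_encode`, the explicit separator bit, and the bit-string output format are our own simplifications for clarity.

```python
def k_gamma_encode(values, k):
    """Sketch of k-gamma encoding: each group of k values shares one
    gamma-style prefix sized for the widest value in the group, and
    every value in the group is stored with that shared bit width."""
    assert len(values) % k == 0 and all(v >= 1 for v in values)
    groups = []  # one (shared_prefix, [value_bits, ...]) tuple per group
    for i in range(0, len(values), k):
        group = values[i:i + k]
        w = max(v.bit_length() for v in group)   # shared effective width
        prefix = "0" * (w - 1) + "1"             # unary width + separator bit
        vals = [format(v, "0{}b".format(w)) for v in group]
        groups.append((prefix, vals))
    return groups
```

For example, `k_gamma_encode([5, 2, 9, 1], 2)` groups 5 and 2 under a shared width of 3 bits and 9 and 1 under a shared width of 4 bits; because all k values of a group have the same width, each starts at a fixed relative bit address, which is what makes the layout SIMD-friendly.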

Source publication
Conference Paper
Full-text available
We study algorithms for efficient compression and decompression of a sequence of integers on modern hardware. Our focus is on universal codes in which the codeword length is a monotonically non-decreasing function of the uncompressed integer value; such codes are widely used for compressing "small integers". In contrast to traditional integer compr...

Contexts in source publication

Context 1
... Figure 4 illustrates the memory layout of an example with 4 codewords for Elias gamma, 1-gamma, 2-gamma, and 4-gamma encoding. Elias gamma encoding, shown at the top of Figure 4, stores the prefix and value of each codeword together. ...
Context 2
... Figure 4 illustrates the memory layout of an example with 4 codewords for Elias gamma, 1-gamma, 2-gamma, and 4-gamma encoding. Elias gamma encoding, shown at the top of Figure 4, stores the prefix and value of each codeword together. k-gamma encoding stores the shared prefix and the values separately. ...

Similar publications

Conference Paper
Full-text available
Memory is one of the most significant detrimental factors in increasing the cost and area of embedded systems, especially as semiconductor technology scales down. Code compression techniques have been employed to reduce the memory requirement of the system without sacrificing its functionality. Bitmask-based code compression has been demonstrat...
Article
Full-text available
In this paper, we present a novel method for fast lossy or lossless compression and decompression of regular height fields. The method is suitable for SIMD parallel implementation and thus inherently suitable for modern GPU architectures. Lossy compression is achieved by approximating the height field with a set of quadratic Bezier surfaces. In add...

Citations

... Many prior compression algorithms leverage repetitions in a data sequence. Null suppression [29] omits the leading zeros in the bit representation of an integer and records the byte length of each value, such as in 4-Wise NS [95], Masked-VByte [91], Google varint [1] and varint-G8IU [98]. Dictionary [32,35,37,82,84,93,111] and entropy-based compression algorithms [63,104] build a bijective map between the original values and the code words. ...
Preprint
Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries. Despite a comprehensive study on dictionary-based encodings to approach Shannon's entropy, few prior works have systematically exploited the serial correlation in a column for compression. In this paper, we propose LeCo (i.e., Learned Compression), a framework that uses machine learning to remove the serial redundancy in a value sequence automatically to achieve an outstanding compression ratio and decompression performance simultaneously. LeCo presents a general approach to this end, making existing (ad-hoc) algorithms such as Frame-of-Reference (FOR), Delta Encoding, and Run-Length Encoding (RLE) special cases under our framework. Our microbenchmark with three synthetic and six real-world data sets shows that a prototype of LeCo achieves a Pareto improvement on both compression ratio and random access speed over the existing solutions. When integrating LeCo into widely-used applications, we observe up to 3.9x speedup in filter-scanning a Parquet file and a 16% increase in RocksDB's throughput.
... From a lightweight integer compression perspective, the state-of-the-art utilization of these SIMD extensions works as follows [8]: While a scalar compression algorithm compresses a block of N consecutive integers, the state-of-the-art SIMD approach scales this block size to k · N with k as the number of integers that can be simultaneously processed with an SIMD register. As shown in various papers [2,6,7,8,10], this scaling approach increases the performance of compression as well as decompression routines. However, this scaling approach can lead to a degradation of the compression ratio compared to the scalar variant. ...
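The compression-ratio degradation mentioned in this excerpt can be illustrated with a small back-of-the-envelope sketch. The helper `packed_bits` and the toy data are ours, not from the cited papers; it models the common case where each block is stored with the bit width of its largest member:

```python
def packed_bits(values, block_size):
    """Total bits needed when each block of `block_size` integers is
    stored with the bit width of its largest member (headers ignored
    to keep the sketch simple)."""
    total = 0
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        w = max(v.bit_length() for v in block)  # block-wide bit width
        total += w * len(block)
    return total

data = [3, 2, 1, 2, 500, 2, 1, 3]   # one outlier in the second half
scalar = packed_bits(data, 4)       # N = 4: 2*4 + 9*4 = 44 bits
simd = packed_bits(data, 8)         # k*N = 8 (e.g., k = 2): 9*8 = 72 bits
```

Scaling the block from N = 4 to k · N = 8 lets the single 9-bit outlier widen all eight values instead of only four, which is exactly the degradation the scalar-granularity approaches avoid.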
... We call this number the bit width of a value. Over the past decades, a large corpus of different algorithms has evolved [2,6,7,8,10]. Generally, lightweight integer compression algorithms employ a subset of the following five fundamental techniques: frame-of-reference (FOR) [13,14], delta coding (DELTA) [7,15], dictionary compression (DICT) [2,14], run-length encoding (RLE) [2,15,16], and null suppression (NS) [2,7,15]. ...
... register [8]. On the one hand, this scaling SIMD approach increases the performance of the compression routines, mainly due to the fact that only a contiguous, also called linear, data access pattern is required for the implementation [2,6,7,8,10]. On the other hand, the scaling SIMD approach also affects the compression result, especially the size. ...
Article
Full-text available
Integer compression plays an important role in columnar database systems to reduce the main memory footprint as well as to speed up query processing. To keep the additional computational effort of (de)compression as low as possible, the powerful Single Instruction Multiple Data (SIMD) extensions of modern CPUs are heavily applied. While a scalar compression algorithm usually compresses a block of N consecutive integers, the state-of-the-art SIMDified implementation scales the block size to k · N with k as the number of elements which could be simultaneously processed in an SIMD register. On the one hand, this scaling SIMD approach improves the performance of (de)compression. But on the other hand, it can lead to a degradation of the memory footprint of the compressed data. Within this article, we analyze this degradation effect for various integer compression algorithms and present a novel SIMD concept to overcome that effect. The core idea of our novel SIMD concept, called BOUNCE, is to concurrently compress k different blocks of size N within SIMD registers, guaranteeing the same compression ratio as the scalar variant. As we are going to show, our proposed SIMD idea works well on various Intel CPUs and may offer a new generalized SIMD concept to optimize further algorithms.
... These small integers can then be efficiently compressed with any of a range of integer compression techniques (Fig 2), a subject that has been heavily developed for Information Retrieval. We compare a number of such methods, including the Simple-8b algorithm from [23] (which we implement and use in our package) in S1 Appendix. ...
Article
Full-text available
Large-scale genotype-phenotype screens provide a wealth of data for identifying molecular alterations associated with a phenotype. Epistatic effects play an important role in such association studies. For example, siRNA perturbation screens can be used to identify combinatorial gene-silencing effects. In bacteria, epistasis has practical consequences in determining antimicrobial resistance as the genetic background of a strain plays an important role in determining resistance. Recently developed tools scale to human exome-wide screens for pairwise interactions, but none to date have included the possibility of three-way interactions. Expanding upon recent state-of-the-art methods, we make a number of improvements to the performance on large-scale data, making consideration of three-way interactions possible. We demonstrate our proposed method, Pint, on both simulated and real data sets, including antibiotic resistance testing and siRNA perturbation screens. Pint outperforms known methods in simulated data, and identifies a number of biologically plausible gene effects in both the antibiotic and siRNA models. For example, we have identified a combination of known tumour suppressor genes that is predicted (using Pint) to cause a significant increase in cell proliferation.
... A second obvious use case in the context of column-stores is integer compression [1,2,10]. To keep the additional computational effort of (de)compression as low as possible, most of the integer (de)compression algorithms are explicitly SIMDified with a linear access pattern [9,19,33,37]. To ensure a linear access pattern, the state-of-the-art SIMD approach can be characterized by [37]: While a scalar compression algorithm would compress a block of N consecutive integers, the state-of-the-art SIMD approach scales this block to k · N consecutive integers with k as the number of integers that can be simultaneously processed with an SIMD register. ...
Article
Full-text available
The Single Instruction Multiple Data (SIMD) paradigm became a core principle for optimizing query processing in columnar database systems. Until now, only the instructions are considered to be efficient enough to achieve the expected speedups, while avoiding is considered almost imperative. However, the instruction offers a very flexible way to populate SIMD registers with data elements coming from non-consecutive memory locations. As we will discuss within this article, the instruction can achieve the same performance as the instruction, if applied properly. To enable the proper usage, we outline a novel access pattern allowing fine-grained, partition-based SIMD implementations. Then, we apply this partition-based SIMD processing to two representative examples from columnar database systems to experimentally demonstrate the applicability and efficiency of our new access pattern.
... From a lightweight integer compression perspective, the state-of-the-art utilization of these SIMD extensions works as follows [8]: While a scalar compression algorithm compresses a block of N consecutive integers, the state-of-the-art SIMD approach scales this block size to k · N with k as the number of integers that can be simultaneously processed with an SIMD register. As shown in various papers [2,6,7,8,10], this scaling approach increases the performance of compression as well as decompression routines. However, this scaling approach can lead to a degradation of the compression ratio compared to the scalar variant. ...
... We call this number the bit width of a value. Over the past decades, a large corpus of different algorithms has evolved [2,6,7,8,10]. Generally, lightweight integer compression algorithms employ a subset of the following five fundamental techniques: frame-of-reference (FOR) [12,13], delta coding (DELTA) [7,14], dictionary compression (DICT) [2,13], run-length encoding (RLE) [2,14,15], and null suppression (NS) [2,7,14]. ...
... Based on that scalar processing foundation, the state-of-the-art SIMD approach scales this block size to k · N with k as the number of integers that can be simultaneously processed with an SIMD register [8]. On the one hand, this scaling SIMD approach increases the performance of the compression routines, mainly due to the fact that only a contiguous, also called linear, data access pattern is required for the implementation [2,6,7,8,10]. On the other hand, the scaling SIMD approach also affects the compression result, especially the size. (Fig. 9: Illustration of the partition-based SIMD processing concept using an SIMD register size of k = 4.) ...
Preprint
Full-text available
Integer compression plays an important role in columnar database systems to reduce the main memory footprint as well as to speed up query processing. To keep the additional computational effort of (de)compression as low as possible, the powerful Single Instruction Multiple Data (SIMD) extensions of modern CPUs are heavily applied. While a scalar compression algorithm usually compresses a block of N consecutive integers, the state-of-the-art SIMDified implementation scales the block size to k · N with k as the number of elements which could be simultaneously processed in an SIMD register. On the one hand, this scaling SIMD approach improves the performance of (de)compression but, on the other hand, can lead to a degradation of the compression ratio compared to the scalar variant. Within this article, we analyze this degradation effect for various integer compression algorithms and present a novel SIMD concept to overcome that effect. The core idea of our novel SIMD concept, called BOUNCE, is to concurrently compress k different blocks of size N within SIMD registers, guaranteeing the same compression ratio as the scalar variant. As we are going to show, our proposed SIMD idea works well on various Intel CPUs and may offer a new generalized SIMD concept to optimize further algorithms.
... These small integers can then be efficiently compressed with any of a range of integer compression techniques (Figure 1), a subject that has been heavily developed for Information Retrieval. We compare a number of such methods, including the Simple-8b algorithm from [51] (which we implement and use in our package) in Appendix A. ...
... Simple-8b. Simple-8b is a non-SIMD compression scheme, with performance comparable to other state-of-the-art methods [51,37,56]. While SIMD-based compression schemes can often offer significantly improved compression and decompression speed [33,51], their implementation is architecture dependent. ...
Preprint
Full-text available
Large-scale genotype-phenotype screens provide a wealth of data for identifying molecular alterations associated with a phenotype. Epistatic effects play an important role in such association studies. For example, siRNA perturbation screens can be used to identify combinatorial gene-silencing effects. In bacteria, epistasis has practical consequences in determining antimicrobial resistance as the genetic background of a strain plays an important role in determining resistance. Recently developed tools scale to human exome-wide screens for pairwise interactions, but none to date have included the possibility of three-way interactions. Expanding upon recent state-of-the-art methods, we make a number of improvements to the performance on large-scale data, making consideration of three-way interactions possible. We demonstrate our proposed method, Pint, on both simulated and real data sets, including antibiotic resistance testing and siRNA perturbation screens. Pint outperforms known methods in simulated data, and identifies a number of biologically plausible gene effects in both the antibiotic and siRNA models. For example, we have identified a combination of known tumor suppressor genes that is predicted (using Pint) to cause a significant increase in cell proliferation.
... For the hardware-conscious implementation, we have to distinguish three different main groups of primitives across all TVL classes [12]: (i) load/store primitives, (ii) elementwise primitives, and (iii) horizontal primitives. The hardwarespecific implementation for them on the VE can be done by using a limited number of intrinsics supporting only 32-bit and 64-bit data widths. ...
... When implementing vectorized column-store operators using primitives or intrinsics, bitmasks (or masks in short) often have to be used and combined, e.g., by shifting them or using boolean logic [3]- [5]. A simple example is a variant of the vectorized intersection of two index lists [12]: ...
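The excerpt above is cut off before the example itself. A scalar sketch of the kind of mask combination it refers to (our own hypothetical function, not the code from [12]) might look like the following: each rotation of one register's contents against the other yields a per-lane comparison mask, and the per-round masks are combined with boolean logic.

```python
def intersect_masks(a_block, b_block):
    """Sketch of bitmask combination in a vectorized list intersection:
    compare a_block against every rotation of b_block, producing one
    bitmask per round, and OR the round masks together, as a SIMD
    kernel would combine register comparison masks."""
    k = len(b_block)
    mask = 0
    for r in range(k):
        rotated = b_block[r:] + b_block[:r]   # rotate b by r lanes
        round_mask = 0
        for lane, (x, y) in enumerate(zip(a_block, rotated)):
            if x == y:                        # element-wise compare
                round_mask |= 1 << lane       # one bit per lane
        mask |= round_mask                    # boolean-logic combination
    return mask  # bit i set -> a_block[i] occurs somewhere in b_block
```

For instance, intersecting the lanes [1, 3, 5, 7] against [2, 3, 7, 9] sets the mask bits for lanes 1 and 3, since 3 and 7 occur in both blocks.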
... These small integers can then be efficiently compressed with any of a range of integer compression techniques (Fig. 3), a subject that has been heavily developed for Information Retrieval. We compare a number of such methods, including the Simple-8b algorithm from [35] (which we implement and use in our package). ...
... Simple-8b. Simple-8b is a non-SIMD compression scheme, with performance comparable to other state-of-the-art methods [35,25,38]. While SIMD-based compression schemes can often offer significantly improved compression and decompression speed [22,35], their implementation is architecture dependent. ...
Preprint
Full-text available
Large-scale genotype-phenotype screens provide a wealth of data for identifying molecular alterations associated with a phenotype. Epistatic effects play an important role in such association studies. For example, siRNA perturbation screens can be used to identify pairwise gene-silencing effects. In bacteria, epistasis has practical consequences in determining antimicrobial resistance as the genetic background of a strain plays an important role in determining resistance. Existing computational tools which account for epistasis do not scale to human exome-wide screens and struggle with genetically diverse bacterial species such as Pseudomonas aeruginosa. Combining earlier work in interaction detection with recent advances in integer compression, we present a method for epistatic interaction detection on sparse (human) exome-scale data, and an R implementation in the package Pint. Our method takes advantage of sparsity in the input data and recent progress in integer compression to perform lasso-penalised linear regression on all pairwise combinations of the input, estimating up to 200 million potential effects, including epistatic interactions. Hence the human exome is within the reach of our method, assuming one parameter per gene and one parameter per epistatic effect for every pair of genes. We demonstrate Pint on both simulated and real data sets, including antibiotic resistance testing and siRNA perturbation screens.
... In order to decode gamma codes faster on modern processors, a simple variant of gamma is proposed by Schlegel et al. [91] and called k-gamma. Groups of k integers are encoded together, with k = 2, 3 or 4, using the same number of bits. ...
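For reference, plain Elias gamma determines the codeword of each value individually, which is what the k-gamma variant relaxes. A minimal sketch following the standard definition (the helper name is ours):

```python
def elias_gamma(v):
    """Elias gamma codeword of v >= 1: (w - 1) zero bits followed by
    the w-bit binary representation of v, where w is the bit width
    of v. The codeword length is therefore 2*w - 1 bits."""
    assert v >= 1
    w = v.bit_length()
    return "0" * (w - 1) + format(v, "b")
```

So `elias_gamma(1)` is "1" and `elias_gamma(9)` is "0001001". In k-gamma, a group of k values instead shares a single prefix sized for the widest member, so all k values are stored with the same number of bits and can be unpacked together.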
Article
Full-text available
The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and stringent performance requirements imposed by the heavy load of queries, the inverted index stores billions of integers that must be searched efficiently. In this scenario, index compression is essential because it leads to a better exploitation of the computer memory hierarchy for faster query processing and, at the same time, allows reducing the number of storage machines. The aim of this article is twofold: first, surveying the encoding algorithms suitable for inverted index compression and, second, characterizing the performance of the inverted index through experimentation.
... Parallel implementation. The encoders in this study are parallelizable [28,38]. Given n workers, we can partition the gradient into n subvectors for RLH and allow each worker to find the frequencies of its corresponding subvector. ...
Conference Paper
Full-text available
Distributed stochastic algorithms, equipped with gradient compression techniques, such as codebook quantization, are becoming increasingly popular and considered state-of-the-art in training large deep neural network (DNN) models. However, communicating the quantized gradients in a network requires efficient encoding techniques. For this, practitioners generally use Elias encoding-based techniques without considering their computational overhead or data-volume. In this paper, based on Huffman coding, we propose several lossless encoding techniques that exploit different characteristics of the quantized gradients during distributed DNN training. Then, we show their effectiveness on 5 different DNN models across three different data-sets, and compare them with classic state-of-the-art Elias-based encoding techniques. Our results show that the proposed Huffman-based encoders (i.e., RLH, SH, and SHS) can reduce the encoded data-volume by up to 5.1×, 4.32×, and 3.8×, respectively, compared to the Elias-based encoders.