Article

Theoretical foundations of transform coding

Authors: Vivek K. Goyal

Abstract

Discusses various aspects of transform coding, including: source coding, constrained source coding, the standard theoretical model for transform coding, entropy codes, Huffman codes, quantizers, uniform quantization, bit allocation, optimal transforms, visualization of transforms, partition cell shapes, autoregressive sources, transform optimization, synthesis transform optimization, orthogonality and independence, and departures from the standard model


... Next, the FC sends the optimization results back to the N receivers through dedicated links. The selected receivers sample their received sensing signals around the estimated time delay and employ the KLT encoding scheme [27] to quantize these samples under the given quantization bit allocation. Finally, the quantized samples are transmitted to the FC for sensing performance improvement. ...
... The main steps of the proposed MCSCA algorithm are summarized in Algorithm 1, and it can be proved that Algorithm 1 is guaranteed to converge to the set of KKT solutions of problem (27). The details will be presented in the following. ...
... Remark 1. It is noteworthy that to make the overall problem tractable, we have ignored the discrete constraints and relaxed {X_nj} as continuous variables in problem (27). After obtaining the optimized quantization bit allocation, we can simply round up the continuous bit numbers to obtain a discrete solution. ...
Preprint
In this paper, we consider a cooperative sensing framework in the context of a future multi-functional network with both communication and sensing ability, where one base station (BS) serves as a sensing transmitter and several nearby BSs serve as sensing receivers. Each receiver receives the sensing signal reflected by the target and communicates with the fusion center (FC) through a wireless multiple access channel (MAC) for cooperative target localization. To improve the localization performance, we present a hybrid information-signal domain cooperative sensing (HISDCS) design, where each sensing receiver transmits both the estimated time delay/effective reflecting coefficient and the received sensing signal sampled around the estimated time delay to the FC. Then, we propose to minimize the number of channel uses by utilizing an efficient Karhunen-Loève transformation (KLT) encoding scheme for signal quantization and proper node selection, under the Cramér-Rao lower bound (CRLB) constraint and the capacity limits of the MAC. A novel matrix-inequality constrained successive convex approximation (MCSCA) algorithm is proposed to optimize the wireless backhaul resource allocation, together with a greedy strategy for node selection. Despite the high non-convexity of the considered problem, we prove that the proposed MCSCA algorithm is able to converge to the set of Karush-Kuhn-Tucker (KKT) solutions of a relaxed problem obtained by relaxing the discrete variables. Besides, a low-complexity quantization bit reallocation algorithm is designed, which does not perform explicit node selection, and is able to harvest most of the performance gain brought by HISDCS. Finally, numerical simulations are presented to show that the proposed HISDCS design is able to significantly outperform the baseline schemes.
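The KLT encoding scheme referenced in the excerpts above follows a standard pattern: estimate a covariance, transform into its eigenbasis, and spend more quantization bits on higher-variance coefficients. Below is a minimal NumPy sketch of that pattern; the toy source, the logarithmic bit-allocation rule, and the per-coefficient uniform quantizers are illustrative assumptions, not the paper's algorithm.

    import numpy as np

    # Toy correlated source: N samples of a d-dimensional vector (assumed data).
    rng = np.random.default_rng(0)
    d, N = 8, 10000
    A = rng.standard_normal((d, d))
    x = rng.standard_normal((N, d)) @ A.T        # correlated samples

    # KLT: the eigenbasis of the sample covariance decorrelates the source.
    C = np.cov(x, rowvar=False)
    eigval, U = np.linalg.eigh(C)                # ascending eigenvalues
    U, var = U[:, ::-1], eigval[::-1]            # sort by decreasing variance
    y = x @ U                                    # transform coefficients

    # Spend more bits on high-variance coefficients (assumed log-variance rule).
    bits = np.maximum(0, np.round(2 + 0.5 * np.log2(var / var.mean()))).astype(int)

    def quantize(col, b):
        if b == 0:
            return np.zeros_like(col)
        step = (col.max() - col.min()) / 2 ** b  # uniform quantizer, 2^b cells
        return np.round(col / step) * step

    y_hat = np.column_stack([quantize(y[:, i], bits[i]) for i in range(d)])
    x_hat = y_hat @ U.T                          # inverse KLT at the decoder
    print("bits:", bits, "MSE:", np.mean((x - x_hat) ** 2))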
... Therefore, in current research on CS for images, the CS measurements are usually quantized using the uniform scalar quantization method [21], and the quantized measurements are encoded by entropy coding to improve the compression performance [22,23]. However, since the computational cost involved in entropy coding is usually high [23,24], using entropy coding will reduce the low-complexity advantage of the CS encoder. ...
... To ensure a low complexity of the encoder, the BCS encoder usually uses uniform quantization and entropy coding to process the BCS measurements. In addition, the uniform quantizer is considered the optimal quantizer for entropy-coded quantization in data compression theory [23,24], which is why BCS encoders tend to use uniform quantization and entropy coding. Currently, the most advanced quantization techniques for BCS of images are believed to be the prediction quantization method [9] and the progressive quantization method [10]. ...
Article
Full-text available
Block compressed sensing (BCS) is a promising method for resource-constrained image/video coding applications. However, the quantization of BCS measurements has posed a challenge, leading to significant quantization errors and encoding redundancy. In this paper, we propose a quantization method for BCS measurements using convolutional neural networks (CNN). The quantization process maps measurements to quantized data that follow a uniform distribution based on the measurements’ distribution, which aims to maximize the amount of information carried by the quantized data. The dequantization process restores the quantized data to data that conform to the measurements’ distribution. The restored data are then modified by the correlation information of the measurements drawn from the quantized data, with the goal of minimizing the quantization errors. The proposed method uses CNNs to construct quantization and dequantization processes, and the networks are trained jointly. The distribution parameters of each block are used as side information, which is quantized with 1 bit by the same method. Extensive experiments on four public datasets showed that, compared with uniform quantization and entropy coding, the proposed method can improve the PSNR by an average of 0.48 dB without using entropy coding when the compression bit rate is 0.1 bpp.
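For contrast with the learned quantizer above, the uniform scalar quantization baseline for CS measurements is a one-liner: divide by a step size, round, and later rescale. A minimal sketch, where the Gaussian measurement model and the step size are assumptions for illustration; the printed MSE illustrates the classic step²/12 distortion of uniform quantization:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(256)                 # one signal block (assumed)
    Phi = rng.standard_normal((64, 256)) / np.sqrt(64)
    y = Phi @ x                                  # BCS measurements

    step = 0.25                                  # quantizer step size (assumed)
    q = np.round(y / step).astype(int)           # indices for the entropy coder
    y_hat = q * step                             # dequantized at the decoder
    print(np.mean((y - y_hat) ** 2), "vs step^2/12 =", step ** 2 / 12)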
... The former concerns the designed transform based methods that carry out a conventional transform of SD in orthogonal bases or redundant dictionaries to obtain sparse representations [1]. The latter includes the learned transform based methods where the SD analysis is learned using Machine Learning (ML) algorithms or deep neural networks. ...
... Transforms are the most common techniques, used by almost all data compression methods [1]. They consist in converting the data to be compressed from one domain or space to another to identify features that could not be as easily detected in the spatial domain. ...
Article
Full-text available
Seismic Data (SD) have been used for several decades as one of the main inspection and exploration tools in various fields, particularly the petroleum industry and geoscience. However, seismic datasets are huge, involving many terabytes whose handling and storage are expensive for industry activities and computer capabilities. Driven by the large volume of SD, several compression methods have been proposed over the last five decades to significantly reduce the SD size, while aiming at the highest possible preservation of rocks' structural and lithological characteristics. Considering the importance of SD compression, this paper is expected to give the first overview of a large number of relevant state-of-the-art papers related to SD compression, where the papers are divided into two main classes with respect to the approach they use to extract the relevant SD features. Our aim is to review recent achievements in SD compression, and to go over the scope, key techniques, and performance of the main representative methods on this topic. This, along with the issues of these methods, can help raise some open challenges and future directions for upcoming SD compression efforts.
... In this sense, the main challenge is the widespread adoption of transform coding [6] for image and video compression. Designing these codecs becomes simpler when using the MSE: by virtue of Parseval's identity, the distortion is the same in the pixel and the transform domain, allowing RDO to be performed in the transform domain. ...
... that is, a weighted version of the optimization problem for nonseparable graphs, which is easier to optimize than Eq. (6). The equation keeping Mc fixed is analogous. ...
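The Parseval argument in the excerpt above is easy to verify numerically: for any orthonormal transform, the squared error between a signal and its reconstruction is identical in the pixel and transform domains, so RDO can operate directly on coefficients. A small check with a random orthonormal basis (the basis and data are assumptions):

    import numpy as np

    rng = np.random.default_rng(2)
    T, _ = np.linalg.qr(rng.standard_normal((64, 64)))  # orthonormal basis
    x = rng.standard_normal(64)                         # "pixels"
    x_hat = x + 0.1 * rng.standard_normal(64)           # some reconstruction

    print(np.mean((x - x_hat) ** 2))                    # pixel-domain MSE
    print(np.mean((T @ x - T @ x_hat) ** 2))            # same, up to roundoff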
... NTC. Nonlinear transform coding (NTC) [5] is a nonlinear extension of transform coding [19], which replaces vector quantization with scalar quantization and decorrelates data sources using a transform and an entropy model. NTC is the most popular coding structure in recent methods for learned image compression. ...
... Prior works such as transform VQ [26] and vector transform coding [29,31,32] (VTC) first proposed to vector-quantize the transform coefficients decades ago. However, due to the high complexity of VQ and the weak decorrelation capability of linear transforms, they drew less attention from researchers compared to transform coding [19]. Recently, VQ has been shown to be effective in addressing the "posterior collapse" issue in many generative models [16,37,42]. ...
Preprint
In theory, vector quantization (VQ) is always better than scalar quantization (SQ) in terms of rate-distortion (R-D) performance. Recent state-of-the-art methods for neural image compression are mainly based on nonlinear transform coding (NTC) with uniform scalar quantization, overlooking the benefits of VQ due to its exponentially increased complexity. In this paper, we first investigate some toy sources, demonstrating that even if modern neural networks considerably enhance the compression performance of SQ with nonlinear transform, there is still an insurmountable chasm between SQ and VQ. Therefore, revolving around VQ, we propose a novel framework for neural image compression named Nonlinear Vector Transform Coding (NVTC). NVTC solves the critical complexity issue of VQ through (1) a multi-stage quantization strategy and (2) nonlinear vector transforms. In addition, we apply entropy-constrained VQ in latent space to adaptively determine the quantization boundaries for joint rate-distortion optimization, which improves the performance both theoretically and experimentally. Compared to previous NTC approaches, NVTC demonstrates superior rate-distortion performance, faster decoding speed, and smaller model size. Our code is available at https://github.com/USTC-IMCL/NVTC
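The SQ-versus-VQ gap on correlated sources that the abstract describes can be reproduced on a toy example: at the same rate, a 2-D codebook adapts to the correlation while per-dimension scalar quantization cannot. A rough sketch (the covariance, the 2 bits/dimension rate, and the small k-means loop are assumptions; this is not the NVTC model):

    import numpy as np

    rng = np.random.default_rng(3)
    cov = np.array([[1.0, 0.9], [0.9, 1.0]])
    x = rng.multivariate_normal([0.0, 0.0], cov, size=20000)

    # Scalar quantization: 4 quartile cells per dimension (2 bits/dim).
    def sq(col):
        edges = np.quantile(col, [0.25, 0.5, 0.75])
        idx = np.searchsorted(edges, col)
        centers = np.array([col[idx == i].mean() for i in range(4)])
        return centers[idx]

    x_sq = np.column_stack([sq(x[:, 0]), sq(x[:, 1])])

    # Vector quantization: 16 codewords in 2-D (also 2 bits/dim), via k-means.
    C = x[rng.choice(len(x), 16, replace=False)]
    for _ in range(50):
        assign = ((x[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([x[assign == k].mean(0) if np.any(assign == k) else C[k]
                      for k in range(16)])
    x_vq = C[assign]
    print("SQ MSE:", np.mean((x - x_sq) ** 2), "VQ MSE:", np.mean((x - x_vq) ** 2))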
... To gain a deeper understanding of the distinctions between different networks, we adopt the "transform coding [38]" view, essentially treating each network as a unique transform and evaluating its effectiveness by measuring the correlation in the latent space. This correlation is measured on ŷ as shown in Fig. 2. To estimate the de-correlation ability of different transforms, we utilize the latent correlation ρ_{k×k} presented in [39] and calculate it as follows: ...
Preprint
Recent advances in learning-based image compression typically come at the cost of high complexity. Designing computationally efficient architectures remains an open challenge. In this paper, we empirically investigate the impact of different network designs in terms of rate-distortion performance and computational complexity. Our experiments involve testing various transforms, including convolutional neural networks and transformers, as well as various context models, including hierarchical, channel-wise, and space-channel context models. Based on the results, we present a series of efficient models, the final model of which has comparable performance to recent best-performing methods but with significantly lower complexity. Extensive experiments provide insights into the design of architectures for learned image compression and potential directions for future research. The code is available at https://gitlab.com/viper-purdue/efficient-compression
... Video compression. Most high-performing modern video compression methods rely on hybrid coders that combine transform coding [41,42] and motion compensation [43,44]. This design persists in most of the recently popularized learning-based solutions [45,46,47,48]. ...
Preprint
Full-text available
We propose a new transformer-based image and video tokenizer with Binary Spherical Quantization (BSQ). BSQ projects the high-dimensional visual embedding to a lower-dimensional hypersphere and then applies binary quantization. BSQ is (1) parameter-efficient without an explicit codebook, (2) scalable to arbitrary token dimensions, and (3) compact: compressing visual data by up to 100× with minimal distortion. Our tokenizer uses a transformer encoder and decoder with simple block-wise causal masking to support variable-length videos as input. The resulting BSQ-ViT achieves state-of-the-art visual reconstruction quality on image and video reconstruction benchmarks with 2.4× throughput compared to the best prior methods. Furthermore, by learning an autoregressive prior for adaptive arithmetic coding, BSQ-ViT achieves comparable results on video compression with state-of-the-art video compression standards. BSQ-ViT also enables masked language models to achieve competitive image synthesis quality to GAN- and diffusion-based methods.
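The core BSQ operation described in the abstract (project to a lower dimension, normalize onto the unit hypersphere, binarize) fits in a few lines. A sketch with a random, untrained projection; the dimensions and projection matrix are assumptions, since the paper's tokenizer learns these inside a transformer:

    import numpy as np

    rng = np.random.default_rng(4)
    d_in, d_code = 512, 16
    P = rng.standard_normal((d_code, d_in)) / np.sqrt(d_in)  # assumed projection

    v = rng.standard_normal(d_in)              # a visual embedding (assumed)
    u = P @ v
    u = u / np.linalg.norm(u)                  # point on the unit hypersphere
    bits = (u > 0).astype(np.uint8)            # binary code, no explicit codebook
    u_hat = (2 * bits.astype(float) - 1) / np.sqrt(d_code)  # implicit codeword
    print(bits, float(u @ u_hat))              # cosine similarity to its code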
... U is a unitary matrix composed of a set of linearly independent orthonormal (column) vectors [78], [79]. Consequently, U · U^T = I_{d×d}, where I_{d×d} denotes the identity matrix. ...
Preprint
Full-text available
We present new results to model and understand the role of encoder-decoder design in machine learning (ML) from an information-theoretic angle. We use two main information concepts, information sufficiency (IS) and mutual information loss (MIL), to represent predictive structures in machine learning. Our first main result provides a functional expression that characterizes the class of probabilistic models consistent with an IS encoder-decoder latent predictive structure. This result formally justifies the encoder-decoder forward stages many modern ML architectures adopt to learn latent (compressed) representations for classification. To illustrate IS as a realistic and relevant model assumption, we revisit some known ML concepts and present some interesting new examples: invariant, robust, sparse, and digital models. Furthermore, our IS characterization allows us to tackle the fundamental question of how much performance (predictive expressiveness) could be lost, using the cross entropy risk, when a given encoder-decoder architecture is adopted in a learning setting. Here, our second main result shows that a mutual information loss quantifies the lack of expressiveness attributed to the choice of a (biased) encoder-decoder ML design. Finally, we address the problem of universal cross-entropy learning with an encoder-decoder design, where necessary and sufficient conditions are established to meet this requirement. In all these results, Shannon's information measures offer new interpretations and explanations for representation learning.
... The efficacy of transform coding lies in separating the task of decorrelating a source from coding it [61]. Transform coding traditionally assumes the source to be Gaussian, as it leads to a closed-form solution. ...
Article
Full-text available
End-to-end learned image compression codecs have notably emerged in recent years. These codecs have demonstrated superiority over conventional methods, showcasing remarkable flexibility and adaptability across diverse data domains while supporting new distortion losses. Despite challenges such as computational complexity, learned image compression methods inherently align with learning-based data processing and analytic pipelines due to their well-suited internal representations. The concept of Video Coding for Machines has garnered significant attention from both academic researchers and industry practitioners. This concept reflects the growing need to integrate data compression with computer vision applications. In light of these developments, we present a comprehensive survey and review of lossy image compression methods. Additionally, we provide a concise overview of two prominent international standards, MPEG Video Coding for Machines and JPEG AI. These standards are designed to bridge the gap between data compression and computer vision, catering to practical industry use cases.
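The remark above that a Gaussian source model leads to closed-form solutions is concrete for classic processes: a Gaussian AR(1) source has covariance K[i, j] = ρ^|i−j|, and the eigenvectors of K give the optimal (Karhunen-Loève) transform. A short sketch, with ρ and the block size assumed:

    import numpy as np

    rho, n = 0.9, 8
    idx = np.arange(n)
    K = rho ** np.abs(np.subtract.outer(idx, idx))   # closed-form AR(1) covariance
    eigval, U = np.linalg.eigh(K)                    # KLT basis

    rng = np.random.default_rng(5)
    x = rng.multivariate_normal(np.zeros(n), K, size=50000)
    y = x @ U
    print(np.round(np.cov(y, rowvar=False), 2))      # ~ diagonal: decorrelated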
... The goal of (lossy) image compression is minimizing bitrates while preserving information critical for human perception. Transform coding is a basic framework of lossy compression, which divides the compression task into decorrelation and quantization [15]. Decorrelation reduces the statistical dependencies of the pixels, allowing for more effective entropy coding, while quantization represents the values as a finite set of integers. ...
Article
Full-text available
The rise of mobile AI accelerators allows latency-sensitive applications to execute lightweight Deep Neural Networks (DNNs) on the client side. However, critical applications require powerful models that edge devices cannot host and must therefore offload requests, where the high-dimensional data will compete for limited bandwidth. Split Computing (SC) alleviates resource inefficiency by partitioning DNN layers across devices, but current methods are overly specific and only marginally reduce bandwidth consumption. This work proposes shifting away from focusing on executing shallow layers of partitioned DNNs. Instead, it advocates concentrating the local resources on variational compression optimized for machine interpretability. We introduce a novel framework for resource-conscious compression models and extensively evaluate our method in an environment reflecting the asymmetric resource distribution between edge devices and servers. Our method achieves 60% lower bitrate than a state-of-the-art SC method without decreasing accuracy and is up to 16x faster than offloading with existing codec standards.
... Lossy image compression is one of the most fundamental issues in information theory and signal processing. Most existing methods for lossy image compression follow the scheme of transform coding (Goyal 2001), where images are transformed to a latent space for de-correlation and energy compaction, followed by quantization and entropy coding. Historically, traditional codecs such as JPEG (Wallace 1992), BPG (Sullivan et al. 2012), and VVC (Bross et al. 2021) have utilized simple linear transforms (e.g., the discrete cosine transform) to accomplish this goal. ...
Article
While convolution and self-attention are extensively used in learned image compression (LIC) for transform coding, this paper proposes an alternative called Contextual Clustering based LIC (CLIC) which primarily relies on clustering operations and local attention for correlation characterization and compact representation of an image. As seen, CLIC expands the receptive field into the entire image for intra-cluster feature aggregation. Afterward, features are reordered to their original spatial positions to pass through the local attention units for inter-cluster embedding. Additionally, we introduce the Guided Post-Quantization Filtering (GuidedPQF) into CLIC, effectively mitigating the propagation and accumulation of quantization errors at the initial decoding stage. Extensive experiments demonstrate the superior performance of CLIC over state-of-the-art works: when optimized using MSE, it outperforms VVC by about 10% BD-Rate in three widely-used benchmark datasets; when optimized using MS-SSIM, it saves more than 50% BD-Rate over VVC. Our CLIC offers a new way to generate compact representations for image compression, which also provides a novel direction along the line of LIC development.
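The transform coding scheme these excerpts refer to reduces to a short skeleton: analysis transform, quantization, entropy coding, synthesis transform. Here the 2-D DCT stands in for the transform and the entropy coder is left as a placeholder, so this is the traditional linear instance, not CLIC's learned one:

    import numpy as np
    from scipy.fft import dctn, idctn

    def compress(block, step=0.5):
        y = dctn(block, norm="ortho")        # de-correlation / energy compaction
        q = np.round(y / step).astype(int)   # scalar quantization
        return q                             # q would go to an entropy coder

    def decompress(q, step=0.5):
        return idctn(q * step, norm="ortho") # synthesis transform

    rng = np.random.default_rng(6)
    block = rng.standard_normal((8, 8))
    print(np.mean((block - decompress(compress(block))) ** 2))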
... Compression algorithms combined with machine learning are mainly used for images, sound, video, and text [15][16][17][18], and machine learning is rarely used in the compression of instrument status data. Common compression algorithms applied to numerical data, such as vector quantization [19] and transform coding [20], fail to ensure our fixed precision goals while maintaining high compression rates. ...
Article
Full-text available
The real-time transmission of ship status data from vessels to shore is crucial for live status monitoring and guidance. Traditional reliance on expensive maritime satellite systems for this purpose is being reconsidered with the emergence of the global short message communication service offered by the BeiDou-3 navigation satellite system. While this system presents a more cost-effective solution, its bandwidth is notably insufficient for handling real-time ship status data. This inadequacy necessitates the compression of such data. Therefore, this paper introduces an algorithm tailored for real-time compression of sequential ship status data. The algorithm is engineered to ensure both accuracy and the preservation of valid data range integrity. Our methodology integrates quantization, predictive coding employing an attention-averaging-based predictor, and arithmetic coding. This combined approach facilitates the transmission of succinct messages through the BeiDou Navigation System, enabling the live monitoring of ocean-going vessels. Experimental trials conducted with authentic data obtained from ship monitoring systems validate the efficiency of our approach. The achieved compression rates closely approximate theoretical minimum values. Consequently, this method exhibits substantial promise for the real-time transmission of parameters across various systems.
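The quantize-predict-encode pattern in this abstract can be sketched with a plain moving-average predictor standing in for the paper's attention-averaging predictor; the precision and window are assumptions. Because the quantized indices are coded losslessly, the reconstruction error is bounded by half the precision:

    import numpy as np

    def encode(series, precision=0.01, window=4):
        q = np.round(np.asarray(series) / precision).astype(int)  # fixed precision
        res = []
        for t in range(len(q)):
            ctx = q[max(0, t - window):t]
            pred = int(round(ctx.mean())) if len(ctx) else 0      # predictor
            res.append(int(q[t]) - pred)     # small residuals: cheap to
        return res                           # arithmetic-code (coder not shown)

    def decode(res, precision=0.01, window=4):
        q = []
        for r in res:
            ctx = np.array(q[-window:]) if q else np.array([])
            pred = int(round(ctx.mean())) if len(ctx) else 0
            q.append(pred + r)
        return np.array(q) * precision

    data = 20.0 + np.cumsum(np.random.default_rng(7).normal(0, 0.05, 100))
    rec = decode(encode(data))
    print(float(np.max(np.abs(rec - data))))  # bounded by precision / 2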
... Another CNN is used to map the latent representation back to the image domain to approximate the input image. This idea is similar to the traditional transform coding approach [9], except that the analysis and synthesis functions are nonlinear, comprise CNNs, and are learned from real-world images, whereas in traditional transform coding, the analysis and synthesis functions are typically linear and derived based on simple models of images [10][11][12]. In most learning-based compression systems, a CNN-based system that learns the joint probability distribution of the quantized latent representation is combined with the auto-encoder architecture to obtain an end-to-end learned image compression system [13,14]. ...
Article
Full-text available
This paper explores learned image compression based on traditional and learned discrete wavelet transform (DWT) architectures and learned entropy models for coding DWT subband coefficients. A learned DWT is obtained through the lifting scheme with learned nonlinear predict and update filters. Several learned entropy models, with varying computational complexities, are explored to exploit inter- and intra-DWT subband coefficient dependencies, akin to traditional EZW, SPIHT, or EBCOT algorithms. Experimental results show that when the explored learned entropy models are combined with traditional wavelet filters, such as the CDF 9/7 filters, compression performance that far exceeds that of JPEG2000 can be achieved. When the learned entropy models are combined with the learned DWT, compression performance increases further. The computations in the learned DWT and all entropy models, except one, can be simply parallelized, and thus, the systems provide practical encoding and decoding times on GPUs, unlike other DWT-based learned compression systems in the literature.
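One lifting step with the fixed linear predict/update filters of the CDF 5/3 wavelet shows the structure that the learned DWT above retains, with small networks replacing the filters. Invertibility holds regardless of the filters, which is what makes the scheme learnable (a sketch; periodic boundary handling is assumed):

    import numpy as np

    def lifting_forward(x):
        even, odd = x[0::2].astype(float), x[1::2].astype(float)
        detail = odd - 0.5 * (even + np.roll(even, -1))       # predict step
        approx = even + 0.25 * (detail + np.roll(detail, 1))  # update step
        return approx, detail

    def lifting_inverse(approx, detail):
        even = approx - 0.25 * (detail + np.roll(detail, 1))  # undo update
        odd = detail + 0.5 * (even + np.roll(even, -1))       # undo predict
        x = np.empty(even.size + odd.size)
        x[0::2], x[1::2] = even, odd
        return x

    x = np.random.default_rng(8).standard_normal(64)
    a, d = lifting_forward(x)
    print(np.allclose(lifting_inverse(a, d), x))              # True: invertible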
... This section provides the relevant background on learnable image compression and introduces the problem of estimating the entropy of the latent space. In most end-to-end learnable image compression schemes, an encoder-decoder pipeline is implemented as a neural network-based autoencoder, following the so-called transform coding approach [10]. The encoder f_a projects the image x into a latent space y = f_a(x, θ_f) ∈ R^{N_c×N_d}, where N_c and N_d represent the number and the dimension of the flattened latent space channels respectively, while θ_f represents the learnable parameters of the encoder. ...
Chapter
Full-text available
In an end-to-end learned image compression framework, an encoder projects the image on a low-dimensional, quantized, latent space while a decoder recovers the original image. The encoder and decoder are jointly trained with standard gradient backpropagation to minimize a rate-distortion (RD) cost function accounting for both the distortion between the original and reconstructed image and the rate of the quantized latent space. State-of-the-art methods rely on an auxiliary neural network to estimate the rate R of the latent space. We propose a non-parametric entropy model that estimates the statistical frequencies of the quantized latent space during training. The proposed model is differentiable, so it can be plugged into the cost function to be minimized as a rate proxy, and it can be adapted to a given context without retraining. Our experiments show comparable performance with a learned rate estimator, and better performance when the model is adapted over a temporal context. Keywords: learned image coding, entropy estimation, differentiable entropy, autoencoder, image compression.
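The frequency-counting idea behind the proposed entropy model can be shown in its plain, non-differentiable form: estimate symbol probabilities of the quantized latents from their empirical frequencies and use the resulting entropy as a rate proxy. The chapter's soft, trainable version is not reproduced here; the Laplacian latents are an assumption:

    import numpy as np

    rng = np.random.default_rng(9)
    y = rng.laplace(0.0, 2.0, size=(16, 32, 32))   # latent tensor (assumed)
    q = np.round(y).astype(int)                    # quantized latent space

    symbols, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()                      # empirical frequencies
    total_bits = -np.sum(counts * np.log2(p))      # ideal total code length
    print(total_bits / q.size, "bits/symbol")      # empirical entropy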
... Since then, most LIC frameworks have followed the paradigm and improved the compression performance rapidly. Based on the recent representative work [11,13,26] and the transform coding theory [40], we formulate a generalized image compression framework as follows: ...
Preprint
Full-text available
Learned image compression methods have shown superior rate-distortion performance and remarkable potential compared to traditional compression methods. Most existing learned approaches use stacked convolution or window-based self-attention for transform coding, which aggregate spatial information in a fixed range. In this paper, we focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding. The proposed adaptive aggregation generates kernel offsets to capture valid information in the content-conditioned range to help transform. With the adaptive aggregation strategy and the sharing weights mechanism, our method can achieve promising transform capability with acceptable model complexity. Besides, according to the recent progress of entropy model, we define a generalized coarse-to-fine entropy model, considering the coarse global context, the channel-wise, and the spatial context. Based on it, we introduce dynamic kernel in hyper-prior to generate more expressive global context. Furthermore, we propose an asymmetric spatial-channel entropy model according to the investigation of the spatial characteristics of the grouped latents. The asymmetric entropy model aims to reduce statistical redundancy while maintaining coding efficiency. Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
... According to the Shannon-Nyquist sampling theorem, which utilizes the band-limitedness of natural signals, the frequency of equally spaced analog-to-digital sampling needs to be higher than twice the highest frequency component in the original signal to ensure reconstruction without aliasing. When acquiring and transmitting signals with high-frequency components (for example, structural vibration responses under broadband ambient excitations), expensive high-speed sampling is required first, and then transform coding techniques [34,35] are used to obtain a low-complexity representation of the signal in a transformed domain, where only a few major components and their positions are retained to compress the fully sampled signal, thus ensuring stable signal transmission and efficient storage. This framework not only wastes sampling and coding resources, but also introduces the overhead of transmitting the positions of the major components. ...
Article
Full-text available
Compressive sampling (CS) is a novel signal processing paradigm whereby data compression is performed simultaneously with sampling, by measuring some linear functionals of the original signal in the analog domain. Once the signal is sufficiently sparse under some basis, it is strictly guaranteed that the original can be stably decompressed/reconstructed from significantly fewer measurements than required by the sampling theorem, bringing considerable practical convenience. In the field of civil engineering, there are massive application scenarios for CS, as many civil engineering problems can be formulated as sparse inverse problems with linear measurements. In recent years, CS has seen extensive theoretical development and many practical applications in civil engineering. Inevitable modelling and measurement uncertainties have motivated the Bayesian probabilistic perspective on the inverse problem of CS reconstruction. Furthermore, the advancement of deep learning techniques for efficient representation has also contributed to the elimination of the strict assumption of sparsity in CS. This paper reviews the advancements and applications of CS in civil engineering, focusing on challenges arising from data acquisition and analysis. The reviewed theories also have applicability to inverse problems in broader scientific fields.
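A minimal end-to-end CS example matching the abstract's description (a few random linear functionals of a sparse signal, then greedy recovery) fits in one cell. Sizes, sparsity, and the Gaussian measurement matrix are assumptions; orthogonal matching pursuit stands in for the many reconstruction algorithms the survey covers:

    import numpy as np

    rng = np.random.default_rng(10)
    n, m, k = 256, 64, 5
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)  # k-sparse signal
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    y = Phi @ x                                    # m << n linear measurements

    support, r = [], y.copy()
    for _ in range(k):                             # orthogonal matching pursuit
        support.append(int(np.argmax(np.abs(Phi.T @ r))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coef             # residual after re-fitting

    x_hat = np.zeros(n)
    x_hat[support] = coef
    print("recovery error:", np.linalg.norm(x - x_hat))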
... This file is then used to reconstruct the original image. Several models have been proposed for neural (or learned) image compression such as convolutional neural networks (CNNs), autoencoders, and generative adversarial networks [22,8,9,42,51,13,57,63,43]. In the most basic version of these models, an image vector x is mapped to a latent representation y using a convolutional autoencoder. ...
Preprint
Full-text available
Recent neural image compression (NIC) advances have produced models which are starting to outperform traditional codecs. While this has led to growing excitement about using NIC in real-world applications, the successful adoption of any machine learning system in the wild requires it to generalize (and be robust) to unseen distribution shifts at deployment. Unfortunately, current research lacks comprehensive datasets and informative tools to evaluate and understand NIC performance in real-world settings. To bridge this crucial gap, first, this paper presents a comprehensive benchmark suite to evaluate the out-of-distribution (OOD) performance of image compression methods. Specifically, we provide CLIC-C and Kodak-C by introducing 15 corruptions to popular CLIC and Kodak benchmarks. Next, we propose spectrally inspired inspection tools to gain deeper insight into errors introduced by image compression methods as well as their OOD performance. We then carry out a detailed performance comparison of a classical codec with several NIC variants, revealing intriguing findings that challenge our current understanding of the strengths and limitations of NIC. Finally, we corroborate our empirical findings with theoretical analysis, providing an in-depth view of the OOD performance of NIC and its dependence on the spectral properties of the data. Our benchmarks, spectral inspection tools, and findings provide a crucial bridge to the real-world adoption of NIC. We hope that our work will propel future efforts in designing robust and generalizable NIC methods. Code and data will be made available at https://github.com/klieberman/ood_nic.
... Our framework is inspired by the framework widely adopted in learning-based lossy compression methods [8][9]. In the framework of a classical learning-based lossy image compression, the operation can be formulated as [13]: ...
Article
Full-text available
Lossless image compression is an important research field in image compression. Recently, learning-based lossless image compression methods have achieved impressive performance compared with traditional lossless methods, such as WebP, JPEG2000, and FLIF. The aim of lossless image compression algorithms is to use a shorter code length to represent images. To encode an image with fewer bytes, eliminating the redundancies among the pixels in the image is highly important. Hence, in this paper, we explore the idea of combining an autoregressive model for the raw images with the proposed end-to-end lossless architecture to enhance performance. Furthermore, inspired by the successful achievements of channel-conditioning models, we propose a Multivariant Mixture distribution Channel-conditioning model (MMCC) in our network architecture to boost performance. The experimental results show that our approach outperforms most classical lossless compression methods and existing learning-based lossless methods.
... Traditional image compression standards, such as JPEG [67], JPEG2000 [56], HEVC [63], and VVC Intra [10], have been extensively used in practice after several decades of development. These standards rely on transform coding [24], which decomposes the lossy image compression task into three parts: transform, quantization, and entropy coding. Each module of these standards is manually designed with multiple modes, and rate-distortion optimization is performed to determine the optimal mode. ...
Preprint
Full-text available
Image compression techniques typically focus on compressing rectangular images for human consumption, resulting in the transmission of redundant content for downstream applications. To overcome this limitation, some previous works propose to semantically structure the bitstream, which can meet specific application requirements by selective transmission and reconstruction. Nevertheless, they divide the input image into multiple rectangular regions according to semantics and do not avoid information interaction among them, wasting bitrate and distorting the reconstruction of region boundaries. In this paper, we propose to decouple an image into multiple groups with irregular shapes based on a customized group mask and compress them independently. Our group mask describes the image at a finer granularity, enabling significant bitrate saving by reducing the transmission of redundant content. Moreover, to ensure the fidelity of selective reconstruction, this paper proposes the concept of a group-independent transform that maintains the independence among distinct groups. We instantiate it with the proposed Group-Independent Swin-Block (GI Swin-Block). Experimental results demonstrate that our framework structures the bitstream with negligible cost, and exhibits superior performance on both visual quality and support for intelligent tasks.
Article
Compressed-domain visual task schemes, where visual processing or computer vision are directly performed on the compressed-domain representations, were shown to achieve a higher computational efficiency during training and deployment by avoiding the need to decode the compressed visual information while resulting in a competitive or even better performance as compared to corresponding spatial-domain visual tasks. This work is concerned with learning-based compressed-domain image classification, where the image classification is performed directly on compressed-domain representations, also known as latent representations, that are obtained using a learning-based visual encoder. In this paper, a compressed-domain Vision Transformer (cViT) is proposed to perform image classification in the learning-based compressed-domain. For this purpose, the Vision Transformer (ViT) architecture is adopted and modified to perform classification directly in the compressed-domain. As part of this work, a novel feature patch embedding is introduced leveraging the within- and cross-channel information in the compressed-domain. Also, an adaptation training strategy is designed to adopt the weights from the pre-trained spatial-domain ViT and adapt these to the compressed-domain classification task. Furthermore, the pre-trained ViT weights are utilized through interpolation for position embedding initialization to further improve the performance of cViT. The experimental results show that the proposed cViT outperforms the existing compressed-domain classification networks in terms of Top-1 and Top-5 classification accuracies. Moreover, the proposed cViT can yield competitive classification accuracies with a significantly higher computational efficiency as compared to pixel-domain approaches.
Article
We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner–Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner–Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme, based on variational vector quantization, recovers some principles of the optimum theoretical solution of the Wyner–Ziv setup, such as binning in the source space as well as optimal combination of the quantization index and side information, for exemplary sources. These behaviors emerge although no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning.
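The binning behavior the authors observe emerging from training can be written down by hand for intuition: the encoder sends only a coset (bin) index of its quantized sample, and the decoder resolves the ambiguity using its side information. All parameters below are assumptions for the toy setup:

    import numpy as np

    rng = np.random.default_rng(11)
    step, n_bins = 0.5, 4
    x = rng.standard_normal(10000)                 # source
    side = x + 0.1 * rng.standard_normal(10000)    # decoder's side information

    q = np.round(x / step).astype(int)
    bin_idx = np.mod(q, n_bins)                    # only log2(4) = 2 bits sent

    # Decoder: nearest index to side/step among those congruent to bin_idx.
    t = side / step
    base = np.floor(t).astype(int)
    c1 = base + np.mod(bin_idx - base, n_bins)
    c2 = c1 - n_bins
    q_hat = np.where(np.abs(c1 - t) <= np.abs(c2 - t), c1, c2)
    print("bins decoded correctly:", np.mean(q_hat == q),
          "MSE:", np.mean((x - q_hat * step) ** 2))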
Article
Learned video compression methods have gained considerable interest in the video coding community. Most existing algorithms focus on exploring short-range temporal information and developing strong motion compensation. Still, ignoring the utilization of long-range temporal information constrains the potential of compression. In this paper, we are dedicated to exploiting both long- and short-range temporal information to enhance video compression performance. Specifically, for long-range temporal information exploration, we propose a temporal prior that can be continuously supplemented and updated during compression within the group of pictures (GOP). With the updating scheme, the temporal prior can provide richer mutual information between the overall prior and the current frame for the entropy model, thus facilitating Gaussian parameter prediction. As for the short-range temporal information, we propose a progressive guided motion compensation to achieve robust and accurate compensation. In particular, we design a hierarchical structure to build multi-scale compensation, and by employing optical flow guidance, we generate pixel offsets as motion information at each scale. Additionally, the compensation results at each scale guide the next scale's compensation, forming a flow-to-kernel and scale-by-scale stable guiding strategy. Extensive experimental results demonstrate that our method obtains advanced rate-distortion performance compared to the state-of-the-art learned video compression approaches and the latest standard reference software in terms of PSNR and MS-SSIM. The codes are publicly available at: https://github.com/Huairui/LSTVC .
Article
Lossless and near-lossless image compression is of paramount importance to professional users in many technical fields, such as medicine, remote sensing, precision engineering and scientific research. But despite rapidly growing research interests in learning-based image compression, no published method offers both lossless and near-lossless modes. In this paper, we propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression. In the lossless mode, the DLPR coding system first performs lossy compression and then lossless coding of residuals. We solve the joint lossy and residual compression problem in the approach of VAEs, and add autoregressive context modeling of the residuals to enhance lossless compression performance. In the near-lossless mode, we quantize the original residuals to satisfy a given ℓ∞ error bound, and propose a scalable near-lossless compression scheme that works for variable ℓ∞ bounds instead of training multiple networks. To expedite the DLPR coding, we increase the degree of algorithm parallelization by a novel design of coding context, and accelerate the entropy coding with adaptive residual interval. Experimental results demonstrate that the DLPR coding system achieves both the state-of-the-art lossless and near-lossless image compression performance with competitive coding speed.
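The near-lossless mode described here rests on a standard construction: quantizing integer residuals with step 2τ+1 guarantees a reconstruction error of at most τ per pixel. A sketch with a crude stand-in for the lossy layer (the noise model below is an assumption, not the DLPR network):

    import numpy as np

    rng = np.random.default_rng(12)
    x = rng.integers(0, 256, size=(64, 64)).astype(int)   # original image
    lossy = x + rng.integers(-10, 11, size=x.shape)       # stand-in lossy recon

    tau = 2                                               # target max error
    r = x - lossy
    q = np.sign(r) * ((np.abs(r) + tau) // (2 * tau + 1)) # quantized residuals
    x_hat = lossy + q * (2 * tau + 1)
    print("max abs error:", int(np.max(np.abs(x - x_hat))), "<= tau =", tau)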
Article
Deep learning (DL) techniques have shown promising results in image compression compared to conventional methods, with competitive bitrate and image reconstruction quality from compressed latents. However, whereas learned image compression has progressed towards a higher peak signal-to-noise ratio (PSNR) and fewer bits per pixel (bpp), its robustness to adversarial images has never received deliberation. In this work, we investigate the robustness of image compression systems where imperceptibly manipulated inputs can stealthily precipitate a significant increase in the compressed bitrate without compromising reconstruction quality. Such attacks can potentially exhaust the storage or network bandwidth of computing systems and lead to service denial. We term this a DoS attack on image compressors. To characterize the robustness of state-of-the-art learned image compression, we mount white-box and black-box attacks. Our white-box attack employs a gradient ascent approach on the entropy estimation of the bitstream as its bitrate approximation. We propose DCT-Net, simulating JPEG compression with architectural simplicity and lightweight training, as the substitute in the black-box attack, enabling fast adversarial transferability. Our results on six image compression architectures, each with six different bitrate qualities (thirty-six models in total), show that they are surprisingly fragile, where the white-box attack achieves up to a 55× and the black-box a 2× bpp increase, respectively, revealing the devastating fragility of DL-based compression models. To improve robustness, we propose a novel compression architecture, factorAtn, incorporating attention modules and a basic factorized entropy model, which presents a promising trade-off between rate-distortion performance and robustness to adversarial attacks and surpasses existing learned image compressors.
Article
Effective Adaptive Bitrate (ABR) algorithm or policy is of paramount importance for Real-Time Video Communication (RTVC) amid this pandemic to pursue uncompromised quality of experience (QoE). Existing ABR methods mainly separate the network bandwidth estimation and video encoder control, and fine-tune video bitrate towards estimated bandwidth, assuming the maximization of bandwidth utilization yields the optimal QoE. However, the QoE of an RTVC system is jointly determined by the quality of the compressed video, fluency of video playback, and interaction delay. Solely maximizing the bandwidth utilization without comprehensively considering compound impacts incurred by both transport and video application layers, does not assure a satisfactory QoE. The decoupling of the transport and application layer further exacerbates the user experience due to codec-transport incoordination. This work, therefore, proposes the Palette, a reinforcement learning-based ABR scheme that unifies the processing of transport and video application layers to directly maximize the QoE formulated as the weighted function of video quality, stalling rate, and delay. To this aim, a cross-layer optimization is proposed to derive the fine-grained compression factor of the upcoming frame(s) using cross-layer observations like network conditions, video encoding parameters, and video content complexity. As a result, Palette manages to resolve the codec-transport incoordination and to best catch up with the network fluctuation. Compared with state-of-the-art schemes in real-world tests, Palette not only reduces 3.1%-46.3% of the stalling rate, 20.2%-50.8% of the delay but also improves 0.2%-7.2% of the video quality with comparable bandwidth consumption, under a variety of application scenarios.
Article
Studying the solar system and especially the Sun relies on the data gathered daily from space missions. These missions are data-intensive and compressing this data to make them efficiently transferable to the ground station is a twofold decision to make. Stronger compression methods, by distorting the data, can increase data throughput at the cost of accuracy which could affect scientific analysis of the data. On the other hand, preserving subtle details in the compressed data requires a high amount of data to be transferred, reducing the desired gains from compression. In this work, we propose a neural network-based lossy compression method to be used in NASA's data-intensive imagery missions. We chose NASA's Solar Dynamics Observatory (SDO) mission which transmits 1.4 terabytes of data each day as a proof of concept for the proposed algorithm. In this work, we propose an adversarially trained neural network, equipped with local and non-local attention modules to capture both the local and global structure of the image resulting in a better trade-off in rate-distortion (RD) compared to conventional hand-engineered codecs. The RD variational autoencoder used in this work is jointly trained with a channel-dependent entropy model as a shared prior between the analysis and synthesis transforms to make the entropy coding of the latent code more effective. We also studied how optimizing perceptual losses could help our neural compressor to preserve high-frequency details of the data in the reconstructed compressed image. Our neural image compression algorithm outperforms currently-in-use and state-of-the-art codecs such as JPEG and JPEG-2000 in terms of the RD performance when compressing extreme-ultraviolet (EUV) data. As a proof of concept for use of this algorithm in SDO data analysis, we have performed coronal hole (CH) detection using our compressed images, and generated consistent segmentations, even at a compression rate of ~0.1 bits per pixel (compared to 8 bits per pixel on the original data) using EUV data from SDO.
Article
Adaptive transform coding is gaining more and more attention for better mining of image content over fixed transforms such as the discrete cosine transform (DCT). As a special case, graph transform learning establishes a novel paradigm for graph-based transforms. However, designing image codecs based on graph transform learning for natural image compression remains a challenge, and graph representations, designed for graph-structured data, cannot describe regular image samples well. Therefore, in this paper, we propose a cross-channel graph-based transform (CCGBT) for natural color image compression. We observe that neighboring pixels having similar intensities should have similar values in the chroma channels, which means that the prominent structure of the luminance channel is related to the contours of the chrominance channels. A collaborative design of the learned graphs and their corresponding distinctive transforms lies in the assumption that a sufficiently small block can be considered smooth, while guaranteeing the compression of the luma and chroma signals at the cost of a small overhead for coding the description of the designed luma graph. In addition, a color image compression framework based on the CCGBT is designed for comparison with the DCT on the classic JPEG codec. The proposed method benefits from its flexible transform block design on arbitrary sizes to exploit image content better than fixed transforms. The experimental results show that the unified graph-based transform outperforms the conventional DCT, while being close to the discrete wavelet transform on JPEG2000 at high bit-rates.
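The graph-based transform family that CCGBT belongs to uses the eigenvectors of a graph Laplacian as the transform basis; for an unweighted path graph this basis coincides (up to sign) with the DCT-II, which is why the DCT is the natural fixed-transform baseline. A small sketch:

    import numpy as np

    n = 8
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0       # path graph over 8 pixels
    L = np.diag(W.sum(1)) - W                 # combinatorial Laplacian
    eigval, U = np.linalg.eigh(L)             # GFT basis, low to high frequency

    x = np.cumsum(np.random.default_rng(13).standard_normal(n))  # smooth signal
    print(np.round(U.T @ x, 2))               # energy compacted in low frequencies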
Article
The traditional strategy of acquiring satellite images involves transmitting compressed satellite data to ground stations solely via the downlink, without utilizing the uplink. In this article, we propose an enhanced remote-sensing (RS) image compression approach that utilizes uplink assistance to improve compression efficiency. By leveraging the uplink, historical images from ground stations can serve as reference images for on-orbit compression, effectively eliminating spatiotemporal redundancy in RS images. However, due to radiation variations among RS images captured on different dates, pixel-wise referencing as employed in the prior codec paradigm is insufficient. To address this, we propose a novel dual-end referencing downsampling-based coding (RefDBC) framework. At the encoder, relevance embedding (RE) evaluates reconstructability and records information to restore texture details from the reference before downsampling. At the decoder, relevance-based super-resolution (SR) uses the identical reference and recorded relevance information to reconstruct the decoded low-resolution (LR) image. By incorporating relevance referencing, RefDBC effectively mitigates fake texture generation caused by downsampling and compression, achieving significant bitrate savings ranging from 35% to 70% compared to standard, learning-based, and DBC compression baselines in experiments on Spot-5 and Luojia3 images. Code, data, and pretrained models are available online at https://github.com/WHW1233/RefDBC
Article
Recent learned image compression models surpass manually designed methods in rate-distortion performance by introducing nonlinear transforms and end-to-end optimization. However, there is still a lack of quantitative measurements that efficiently evaluate the latent representations inferred by learned image compression models. To address this problem, we develop novel measurements of the robustness and importance of latent representations. We first propose an admissible range that can be efficiently estimated via gradient ascent and descent for establishing the empirical distribution of latent representations. Consequently, the in-distribution region within the admissible range is derived to measure the robustness and channel importance of latent representations of natural images. Visualization demonstrates that the statistics of latent representations differ significantly in robustness and linearity within and outside the in-distribution region. To the best of our knowledge, this paper proposes the first statistically meaningful measurements for learned image compression and successfully applies the measurements to corruption alleviation during successive image compression and to post-training pruning in a training-free fashion. Compared with existing methods, the shrunk in-distribution constraint derived from the in-distribution region achieves superior robustness and rate-distortion performance in successive compression. The channel importance allows post-training pruning to achieve comparable rate-distortion performance with a reduction of up to 60% in entropy coding time.
Article
This paper proposes a novel data-driven approach to designing orthonormal transform matrix codebooks for adaptive transform coding of any non-stationary vector processes which can be considered locally stationary. Our algorithm, which belongs to the class of block-coordinate descent algorithms, relies on simple probability models such as Gaussian or Laplacian for transform coefficients to directly minimize with respect to the orthonormal transform matrix the mean square error (MSE) of scalar quantization and entropy coding of transform coefficients. A difficulty commonly encountered in such minimization problems is imposing the orthonormality constraint on the matrix solution. We get around this difficulty by mapping the constrained problem in Euclidean space to an unconstrained problem on the Stiefel manifold and leveraging known algorithms for unconstrained optimization on manifolds. While the basic design algorithm directly applies to non-separable transforms, an extension to separable transforms is also proposed. We present experimental results for adaptive transform coding of still images and video inter-frame prediction residuals, comparing the transforms designed using the proposed method and a number of other content-adaptive transforms recently reported in the literature.
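The manifold trick in this paper can be illustrated generically: take a Euclidean gradient step on the transform matrix, then retract back onto the set of orthonormal matrices with a QR decomposition. This sketch shows only the constraint handling, not the paper's block-coordinate descent or its quantization-aware objective:

    import numpy as np

    def retract(M):
        Q, R = np.linalg.qr(M)
        return Q * np.sign(np.diag(R))        # sign fix: unique orthonormal factor

    rng = np.random.default_rng(14)
    U = retract(rng.standard_normal((8, 8)))  # orthonormal starting transform
    grad = rng.standard_normal((8, 8))        # stand-in objective gradient

    U_next = retract(U - 0.1 * grad)          # gradient step + retraction
    print(np.allclose(U_next.T @ U_next, np.eye(8)))   # True: still orthonormal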
Article
Increasingly, analytics such as classification and detection suffer from the significant amount of generated visual data. Nonetheless, recent approaches have not given substantial thought to CAD systems with limited capacities at the expense of performance. For that purpose, we propose an autoencoder-based classification approach for pneumonia recognition, extending the use of the features extracted by autoencoders for compression to enhance efficiency. Thus, we substitute the classification of images with that of compressed sequences and encoded tensors, representing a more convenient format for managing and storing data, which significantly reduces computing costs and enhances transmission bandwidth. We designed a CNN model that introduces the attention mechanism into the latent space with a minimum of parameters, optimizing the classification complexity. We assess our method's effectiveness on two medical imaging datasets. In addition, we compare latent space classification to multi-resolution image classification performance. Our approach improves on state-of-the-art performance while boosting efficiency; the number of parameters is negligible, reduced by 69%.
Preprint
Full-text available
Effective Adaptive BitRate (ABR) algorithm or policy is of paramount importance for Real-Time Video Communication (RTVC) amid this pandemic to pursue uncompromised quality of experience (QoE). Existing ABR methods mainly separate the network bandwidth estimation and video encoder control, and fine-tune video bitrate towards estimated bandwidth, assuming the maximization of bandwidth utilization yields the optimal QoE. However, the QoE of an RTVC system is jointly determined by the quality of the compressed video, fluency of video playback, and interaction delay. Solely maximizing the bandwidth utilization without comprehensively considering compound impacts incurred by both network and video application layers does not assure a satisfactory QoE. The decoupling of the network and video layers further exacerbates the user experience due to network-codec incoordination. This work therefore proposes the Palette, a reinforcement learning-based ABR scheme that unifies the processing of network and video application layers to directly maximize the QoE formulated as the weighted function of video quality, stalling rate, and delay. To this aim, a cross-layer optimization is proposed to derive the fine-grained compression factor of upcoming frame(s) using cross-layer observations like network conditions, video encoding parameters, and video content complexity. As a result, Palette manages to resolve the network-codec incoordination and to best catch up with the network fluctuation. Compared with state-of-the-art schemes in real-world tests, Palette not only reduces 3.1%-46.3% of the stalling rate, 20.2%-50.8% of the delay, but also improves 0.2%-7.2% of the video quality with comparable bandwidth consumption, under a variety of application scenarios.
Article
Full-text available
In this article we provide an overview of rate-distortion (R-D) based optimization techniques and their practical application to image and video coding. We begin with a short discussion of classical rate-distortion theory, and then we show how, in many practical coding scenarios such as standards-compliant coding environments, resource allocation can be put in an R-D framework. We then introduce two popular techniques for resource allocation, namely Lagrangian optimization and dynamic programming. After a discussion of these techniques as well as some of their extensions, we conclude with a quick review of the literature in these areas, citing a number of applications related to image and video compression and transmission.
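The Lagrangian technique the overview describes reduces, per coding unit, to picking the operating point that minimizes J = D + λR; sweeping λ traces out the achievable rate-distortion frontier. A toy allocation over two blocks with made-up (rate, distortion) points:

    import numpy as np

    rd_points = [                                   # (rate in bits, distortion)
        [(1, 9.0), (2, 4.0), (4, 1.0), (6, 0.3)],   # block 0
        [(1, 25.0), (2, 12.0), (4, 3.0), (6, 0.8)], # block 1, harder to code
    ]

    def allocate(lam):
        return [pts[int(np.argmin([d + lam * r for r, d in pts]))]
                for pts in rd_points]

    for lam in (0.5, 2.0, 8.0):                     # larger lambda favors low rate
        sel = allocate(lam)
        print(f"lambda={lam}: {sel}, total rate={sum(r for r, _ in sel)}")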
Article
Full-text available
This paper studies the Coxeter–Todd lattice Λ, its automorphism group (which is Mitchell's reflection group 6·PSU(4, 3)·2), and the associated 12-dimensional real lattice K12. We give several constructions for Λ, which is a Z[ω]-lattice, where ω = e^{2πi/3}; enumerate the congruence classes of Λ modulo θΛ, where θ = ω − ω̄; prove the lattice is unique; determine its covering radius and deep holes; and study its connections with the lattice E6 and the Leech lattice. A number of new dense lattices in dimensions up to about 107 are constructed. We also give an explicit basis for the invariants of the Mitchell group. The paper concludes with an extensive bibliography.
Book
Full-text available
2nd ed. Includes bibliography (pp. 573–656).
Conference Paper
Full-text available
Generalized multiple description coding (GMDC) is source coding for multiple channels such that a decoder which receives an arbitrary subset of the channels may produce a useful reconstruction. This paper reports on applications of two recently proposed methods for GMDC to image coding. The first produces statistically correlated streams such that lost streams can be estimated from the received data. The second uses quantized frame expansions and hence is conceptually similar to block channel coding, except it is done prior to quantization
Article
Full-text available
This article focuses on the compressed representations of pictures. The representation does not affect how many bits get from the Web server to the laptop, but it determines the usefulness of the bits that arrive. Many different representations are possible, and there is more involved in their choice than merely selecting a compression ratio. The techniques presented represent a single information source with several chunks of data (“descriptions”) so that the source can be approximated from any subset of the chunks. By allowing image reconstruction to continue even after a packet is lost, this type of representation can prevent a Web browser from becoming dormant
Article
Full-text available
One of the aims of the standardization committee has been the development of Part I, which could be used on a royalty- and fee-free basis. This is important for the standard to become widely accepted. The standardization process, which is coordinated by the JTC1/SC29/WG1 of the ISO/IEC, has already produced the international standard (IS) for Part I. In this article the structure of Part I of the JPEG 2000 standard is presented and performance comparisons with established standards are reported. This article is intended to serve as a tutorial for the JPEG 2000 standard. The main application areas and their requirements are given. The architecture of the standard follows, with the description of the tiling, multicomponent transformations, wavelet transforms, quantization and entropy coding. Some of the most significant features of the standard are presented, such as region-of-interest coding, scalability, visual weighting, error resilience and file format aspects. Finally, some comparative results are reported and the future parts of the standard are discussed.
Chapter
There is no better way to quantize a single vector than to use VQ with a codebook that is optimal for the probability distribution describing the random vector. However, direct use of VQ suffers from a serious complexity barrier that greatly limits its practical use as a complete and self-contained coding technique
Conference Paper
Karhunen-Loeve transforms (KLT's) are the optimal orthogonal transforms for transform coding of Gaussian sources. This well-known fact is usually established with approximations from high-resolution quantization theory. How high does the rate have to be for these approximations to be accurate? The minimum rate allocated to any component should be at least about one bit. (The average rate per component may be much higher.) Does the rate actually have to be high for the KLT to be optimal? No, the KLT is optimal more generally. Two new, simple proofs of this fact are described. They rely on a scale invariance property, but not on high-resolution approximations or properties of optimal fixed-rate quantization
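A numerical sketch of the KLT itself, under the standard definition as the eigenbasis of the source covariance (here an AR(1)-style Gaussian source); the transform coefficients come out uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 4, 0.9
C = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # AR(1) covariance
X = rng.multivariate_normal(np.zeros(n), C, size=10_000)
_, T = np.linalg.eigh(np.cov(X, rowvar=False))   # columns of T: KLT basis
Y = X @ T                                        # transform coefficients
print(np.round(np.cov(Y, rowvar=False), 2))      # ~diagonal: decorrelated
```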
Article
A program of active research on the vocoder was initiated in July 1934. The first public demonstration of the vocoder was at the Harvard Tercentenary in September 1936. Since then, there have been studies of the underlying nature of speech and the recognition of speech‐sound elements in transforming from the spoken to the written word or setting up other automatic voice operation. There has also been much research directed toward straightforward bandwidth compression and such compression as an aid to the attainment of other goals. The entire field of research in speech analysis and/or synthesis is reviewed, including a discussion of limitations and requirements. A demonstration tape is played from an improved 2400‐bit/sec digitized vocoder.
Article
A bit allocation algorithm that is capable of efficiently allocating a given quota of bits to an arbitrary set of different quantizers is proposed. This algorithm is useful in any coding scheme which uses bit allocation or, more generally, codebook allocation. It produces an optimal or very nearly optimal allocation, while allowing the set of admissible bit allocation values to be constrained to nonnegative integers. It is particularly useful in cases where the quantizer performance versus rate is irregular and changing in time, a situation that cannot be handled by conventional allocation algorithms
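The abstract's algorithm handles irregular (even nonconvex) quantizer performance curves; as a simpler point of comparison, here is the classical greedy "steepest marginal return" allocator, which matches the optimal integer allocation when each distortion-versus-bits table is convex. The tables below are made up.

```python
import heapq

def greedy_allocate(dist, budget):
    """dist[i][b] = distortion of quantizer i at b bits; returns integer bits."""
    bits = [0] * len(dist)
    heap = [(-(t[0] - t[1]), i) for i, t in enumerate(dist)]  # max-heap on gain
    heapq.heapify(heap)
    for _ in range(budget):
        _, i = heapq.heappop(heap)           # quantizer with the largest drop
        bits[i] += 1
        b = bits[i]
        if b + 1 < len(dist[i]):
            heapq.heappush(heap, (-(dist[i][b] - dist[i][b + 1]), i))
    return bits

tables = [[8.0, 4.0, 2.0, 1.0], [3.0, 2.5, 2.2, 2.0]]  # hypothetical D(b)
print(greedy_allocate(tables, 4))                      # -> [3, 1]
```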
Chapter
Rate-distortion theory is the branch of information theory that treats compressing the data produced by an information source down to a specified encoding rate that is strictly less than the source's entropy. This necessarily entails some lossiness, or distortion, between the original source data and the best approximation thereto that can be produced on the basis of the encoder's output bits. Rate-distortion theory was introduced in the seminal works written in 1948 and 1959 by C. E. Shannon, the founder of information theory. We describe Shannon's contribution and then trace its subsequent development worldwide. Heavier than usual emphasis is placed on the concept of “matching” a channel to a source in the rate-distortion sense, and also on the analogous matching of a source to a channel. Experimental evidence has been mounting in support of the hypothesis that living organisms often simultaneously achieve both of these matchings when processing their sensory inputs, thereby eliminating the need for the complex encoding and decoding operations that are needed in order to produce an information-theoretically optimum system in the absence of such double matching. Keywords: rate-distortion; lossy source coding; distortion measure; Shannon; joint source-channel coding; bioinformation theory
Article
An optimum method of coding an ensemble of messages consisting of a finite number of members is developed. A minimum-redundancy code is one constructed in such a way that the average number of coding digits per message is minimized.
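A minimal construction of such a minimum-redundancy (Huffman) code, repeatedly merging the two least probable entries until one code tree remains:

```python
import heapq
from itertools import count

def huffman(probs):
    tie = count()  # tie-breaker so the heap never compares code dictionaries
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

print(huffman({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# average length = entropy = 1.75 bits for this dyadic distribution
```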
Chapter
Information theory answers two fundamental questions in communication theory: what is the ultimate data compression (answer: the entropy H), and what is the ultimate transmission rate of communication (answer: the channel capacity C). For this reason some consider information theory to be a subset of communication theory. We will argue that it is much more. Indeed, it has fundamental contributions to make in statistical physics (thermodynamics), computer science (Kolmogorov complexity or algorithmic complexity), statistical inference (Occam's Razor: “The simplest explanation is best”) and to probability and statistics (error rates for optimal hypothesis testing and estimation). The relationship of information theory to other fields is discussed. Information theory intersects physics (statistical mechanics), mathematics (probability theory), electrical engineering (communication theory) and computer science (algorithmic complexity). We describe these areas of intersection in detail.
Article
The paper analyzes a procedure for quantizing blocks of N correlated Gaussian random variables. A linear transformation (P) first converts the N dependent random variables into N independent random variables. These are then quantized, one at a time, in optimal fashion. The output of each quantizer is transmitted by a binary code. The total number of binary digits available for the block of N symbols is fixed. Finally, a second N × N linear transformation (R) constructs from the quantized values the best estimate (in a mean-square sense) of the original variables. It is shown that the best choice of R is R = P^{-1}, regardless of other considerations. If R = P^{-1}, the best choice for P is the transpose of the orthogonal matrix which diagonalizes the moment matrix of the original (correlated) random variables. An approximate expression is obtained for the manner in which the available binary digits should be assigned to the N quantized variables, i.e., the manner in which the number of levels for each quantizer should be chosen. The final selection of the optimal set of quantizers then becomes a matter of a few simple trials. A number of examples are worked out, and substantial improvements over single-sample quantizing are attained with blocks of relatively short length.
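The approximate bit assignment that emerges from this style of analysis is the classical high-rate rule: each component receives the average rate plus half the log-ratio of its variance to the geometric mean of all variances. A sketch:

```python
import numpy as np

def bit_assignment(variances, total_bits):
    v = np.asarray(variances, dtype=float)
    geo_mean = np.exp(np.mean(np.log(v)))
    return total_bits / len(v) + 0.5 * np.log2(v / geo_mean)

print(bit_assignment([4.0, 1.0, 0.25], total_bits=9))
# -> [4. 3. 2.]; high-variance components earn more bits (values can go
#    negative at low rates, where the high-rate approximation breaks down)
```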
Article
A new family of unitary transforms is introduced. It is shown that the well-known discrete Fourier, cosine, sine, and the Karhunen-Loeve (KL) (for first-order stationary Markov processes) transforms are members of this family. All the member transforms of this family are sinusoidal sequences that are asymptotically equivalent. For finite-length data, these transforms provide different approximations to the KL transform of the said data. From the theory of these transforms some well-known facts about orthogonal transforms are easily explained and some widely misunderstood concepts are brought to light. For example, the near-optimal behavior of the even discrete cosine transform to the KL transform of first-order Markov processes is explained and, at the same time, it is shown that this transform is not always such a good (or near-optimal) approximation to the above-mentioned KL transform. It is also shown that each member of the sinusoidal family is the KL transform of a unique, first-order, non-stationary (in general), Markov process. Asymptotic equivalence and other interesting properties of these transforms can be studied by analyzing the underlying Markov processes.
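A quick numerical check of the near-optimality claim for first-order Markov (AR(1)) processes, comparing a normalized DCT-II basis with the exact KLT eigenvectors of the AR(1) covariance; the matched basis vectors align almost perfectly at high correlation:

```python
import numpy as np

N, rho = 8, 0.95
C = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
_, klt = np.linalg.eigh(C)                       # KLT basis, ascending variance

k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
dct = np.cos(np.pi * (2 * n + 1) * k / (2 * N))  # DCT-II rows
dct /= np.linalg.norm(dct, axis=1, keepdims=True)

align = np.abs(dct @ klt[:, ::-1])               # match DC-first ordering
print(np.round(np.diag(align), 3))               # entries near 1.0
```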
Article
Ph.D. thesis, University of Michigan, 1992. Includes bibliographical references.
Article
The article begins with a brief introduction to the theory describing optimal data compression systems and their performance. A brief outline is then given of a representative algorithm that employs these lessons for optimal data compression system design. The implications of rate-distortion theory for practical data compression system design is then described, followed by a description of the tensions between theoretical optimality and system practicality and a discussion of common tools used in current algorithms to resolve these tensions. Next, the generalization of rate-distortion principles to the design of optimal collections of models is presented. The discussion focuses initially on data compression systems, but later widens to describe how rate-distortion theory principles generalize to model design for a wide variety of modeling applications. The article ends with a discussion of the performance benefits to be achieved using the multiple-model design algorithms.
Article
A coding scheme is described for the transmission of n continuous correlated signals over m channels, m being equal to or less than n. Each of the m signals is a linear combination of the n original signals. The coefficients of this linear transformation, which constitute an m × n matrix, are constants of the coding scheme. For the purpose of decoding, the m signals are once more combined linearly into n output signals which approximate the input signals. The coefficients of the coding matrix which minimize the sum of the mean-square differences between the original signals and the reconstructed ones are shown to be the components of the eigenvectors of the matrix of the correlation coefficients of the original signals. The decoding matrix is the transpose of the coding matrix. As an example, the coding scheme is applied to a channel vocoder in which speech is transmitted by means of a set of signals proportional to the speech energy in the various frequency bands. These signals are strongly correlated, and the coding results in a substantial reduction in the number of signals necessary to transmit highly articulate speech. The coding theory can be extended to include the minimization of the expectation of any positive definite quadratic function of the differences between the original and reconstructed signals. In addition, if the signals are Gaussian, the sum of the channel capacities necessary to transmit the transformed signals is shown to be equal to or less than that necessary to transmit the original signals.
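A sketch of the scheme in modern terms: keep the top m eigenvectors of the correlation matrix as the m × n coding matrix and decode with its transpose; the mean-square error then equals the average of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, rho = 8, 3, 0.9
C = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
X = rng.multivariate_normal(np.zeros(n), C, size=5_000)

w, V = np.linalg.eigh(C)
P = V[:, -m:].T                    # m x n coding matrix: top-m eigenvectors
X_hat = (X @ P.T) @ P              # decode with the transpose
print(np.mean((X - X_hat) ** 2),   # empirical MSE ...
      w[:-m].sum() / n)            # ... ~ mean of the discarded eigenvalues
```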
Article
The paper presents several algorithms for designing fixed- and variable-depth decision trees for searching vector quantizer (VQ) codebooks. Two applications of such are explored. First, given a source vector, a tree can be used to find the closest codevector in the VQ codebook with many fewer arithmetic operations than the usual “full search.” This decrease in complexity comes at the expense of an increase in auxiliary table storage. Second, the tree can be used as the first stage of fine-coarse vector quantization, which yields further savings in complexity at the cost of somewhat more storage and a small increase in distortion. The design methods involve incrementally growing trees with a variety of node splitting criteria and, subsequently, optimally pruning trees on the basis of performance functionals such as distortion, storage, and computational complexity. The pruning is done with the BFOS algorithm, which optimally trades one performance functional with another, and with an extension of the BFOS algorithm wherein one performance measure is traded with a combination of two others. The results of applying these methods to i.i.d. Gaussian, Gauss-Markov, and sampled speech sources at encoding rates of one and two bits per source sample demonstrate the tradeoffs achievable amongst time (complexity), memory (storage), and distortion
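A toy fixed-depth tree search of the kind described, shown on scalars for brevity (vectors work identically): each internal node holds two test values and the encoder descends toward the nearer one, so encoding costs one comparison per level instead of a full codebook search. The tree below is hand-built, not designed by the paper's growing/pruning procedures.

```python
import numpy as np

def tree_vq_encode(x, node):
    """node: ('leaf', codevector) or ('split', test0, test1, child0, child1)."""
    while node[0] == 'split':
        _, t0, t1, c0, c1 = node
        node = c0 if np.linalg.norm(x - t0) <= np.linalg.norm(x - t1) else c1
    return node[1]

leaves = [('leaf', v) for v in (-3.0, -1.0, 1.0, 3.0)]
tree = ('split', -2.0, 2.0,
        ('split', -3.0, -1.0, leaves[0], leaves[1]),
        ('split',  1.0,  3.0, leaves[2], leaves[3]))
print(tree_vq_encode(0.4, tree))   # -> 1.0, found in 2 comparisons, not 4
```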
Article
One of the purposes of this article is to give a general audience sufficient background into the details and techniques of wavelet coding to better understand the JPEG 2000 standard. The focus is on the fundamental principles of wavelet coding and not the actual standard itself. Some of the confusing design choices made in wavelet coders are explained. There are two types of filter choices: orthogonal and biorthogonal. Orthogonal filters have the property that they are energy (norm) preserving. Nevertheless, modern wavelet coders use biorthogonal filters, which do not preserve energy. Reasons for these specific design choices are explained. Another purpose of this article is to compare and contrast "early" wavelet coding with "modern" wavelet coding. This article compares the techniques of the modern wavelet coders to the subband coding techniques so that the reader can appreciate how different modern wavelet coding is from early wavelet coding. It discusses basic properties of the wavelet transform which are pertinent to image compression. It builds on the background material on generic transform coding given earlier, shows that boundary effects motivate the use of biorthogonal wavelets, and introduces the symmetric wavelet transform. Subband coding, or "early" wavelet coding, is discussed, followed by an explanation of the EZW coding algorithm. Other modern wavelet coders that extend the ideas found in the EZW algorithm are also described.
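A two-line numerical illustration of the orthogonal-versus-biorthogonal point: the Haar analysis filters are orthogonal, so the subband energies sum exactly to the signal energy, which need not hold for the biorthogonal filters modern coders prefer.

```python
import numpy as np

x = np.array([4.0, 2.0, 5.0, 5.0])
s = 0.5 ** 0.5                                      # 1/sqrt(2)
lo = s * (x[0::2] + x[1::2])                        # Haar lowpass subband
hi = s * (x[0::2] - x[1::2])                        # Haar highpass subband
print(np.sum(x**2), np.sum(lo**2) + np.sum(hi**2))  # equal: energy preserved
```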
Article
Over the last decade or so, wavelets have had a growing impact on signal processing theory and practice, both because of their unifying role and their successes in applications. Filter banks, which lie at the heart of wavelet-based algorithms, have become standard signal processing operators, used routinely in applications ranging from compression to modems. The contributions of wavelets have often been in the subtle interplay between discrete-time and continuous-time signal processing. The purpose of this article is to look at wavelet advances from a signal processing perspective. In particular, approximation results are reviewed, and their implication for compression algorithms is discussed. New constructions and open problems are also addressed.
Article
The performance of optimum quantizers subject to an entropy constraint is studied for a wide class of memoryless sources. For a general distortion criterion, necessary conditions are developed for optimality and a recursive algorithm is described for obtaining the optimum quantizer. Under a mean-square error criterion, the performance of entropy encoded uniform quantization of memoryless Gaussian sources is well-known to be within 0.255 bits/sample of the rate-distortion bound at relatively high rates. Despite claims to the contrary, it is demonstrated that similar performance can be expected for a wide range of memoryless sources. Indeed, for the cases considered, the worst-case performance is observed to be less than 0.3 bits/sample from the rate-distortion bound, and in most cases this disparity is less at low rates.
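A quick empirical reproduction of the ~0.255 bit/sample figure for the Gaussian case: quantize uniformly, measure the entropy of the output, and compare against the rate-distortion bound R(D) = ½ log₂(σ²/D).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2_000_000)              # unit-variance Gaussian
for step in (1.0, 0.5, 0.25):
    q = np.round(x / step) * step                # uniform quantizer
    D = np.mean((x - q) ** 2)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    H = -np.sum(p * np.log2(p))                  # entropy of quantizer output
    print(f"step={step}: H={H:.3f}, R(D)={0.5*np.log2(1/D):.3f}, "
          f"gap={H - 0.5*np.log2(1/D):.3f} bits")
```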
Article
In 1948 W. R. Bennett used a companding model for nonuniform quantization and proposed the formula D = (1/(12N²)) ∫ p(x) [E′(x)]⁻² dx for the mean-square quantizing error, where N is the number of levels, p(x) is the probability density of the input, and E′(x) is the slope of the compressor curve. The formula, an approximation based on the assumption that the number of levels is large and overload distortion is negligible, is a useful tool for analytical studies of quantization. This paper gives a heuristic argument generalizing Bennett's formula to block quantization where a vector of random variables is quantized. The approach is again based on the asymptotic situation where N, the number of quantized output vectors, is very large. Using the resulting heuristic formula, an optimization is performed leading to an expression for the minimum quantizing noise attainable for any block quantizer of a given block size k. The results are consistent with Zador's results and specialize to known results for the one- and two-dimensional cases and for the case of infinite block length (k → ∞). The same heuristic approach also gives an alternate derivation of a bound of Elias for multidimensional quantization. Our approach leads to a rigorous method for obtaining upper bounds on the minimum distortion for block quantizers. In particular, for k = 3 we give a tight upper bound that may in fact be exact. The idea of representing a block quantizer by a block "compressor" mapping followed with an optimal quantizer for uniformly distributed random vectors is also explored. It is not always possible to represent an optimal quantizer with this block companding model.
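A numerical sanity check of Bennett's approximation in its simplest case, a uniform quantizer (constant-slope compressor), where the integral reduces to the familiar Δ²/12:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.clip(rng.normal(0.0, 1.0, 1_000_000), -4, 4)  # overload made negligible
N = 256                                              # number of levels
delta = 8.0 / N                                      # step over [-4, 4]
q = np.round(x / delta) * delta
print(np.mean((x - q) ** 2), delta ** 2 / 12)        # nearly equal at large N
```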
Article
This paper considers the problem of efficient transmission of vector sources over a digital noiseless channel. It treats the problem of optimal allocation of the total number of available bits to the components of a memoryless stationary vector source with independent components. This allocation is applied to various encoding schemes, such as minimum mean-square error, sample-by-sample quantization, or entropy quantization. We also give the optimally decorrelating scheme for a source whose components are dependent and treat the problems of selecting the optimum characteristic of the encoding scheme such that the overall mean-squared error is minimized. Several examples of encoding schemes, including the ideal encoder that achieves the rate-distortion bound, and of sources related to a practical problem are discussed.
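For the ideal-encoder case, the optimal allocation is reverse water-filling: components whose variance exceeds a water level θ receive rate ½ log₂(variance/θ), and weaker components receive none. A bisection sketch:

```python
import numpy as np

def reverse_waterfill(variances, total_rate, tol=1e-9):
    lo, hi = 0.0, max(variances)
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        rate = sum(max(0.0, 0.5 * np.log2(v / theta)) for v in variances)
        lo, hi = (theta, hi) if rate > total_rate else (lo, theta)
    return [max(0.0, 0.5 * np.log2(v / theta)) for v in variances], theta

rates, theta = reverse_waterfill([4.0, 1.0, 0.1], total_rate=3.0)
print(np.round(rates, 3), round(theta, 4))   # -> [2. 1. 0.] at theta = 0.25
```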
Article
It is shown how the Karhunen-Loève (K-L) series representation for a finite sample of a discrete random sequence, stationary to the second order, may be further decomposed into a pair of series by utilizing certain symmetry properties of the covariance matrix of the sequence. The theory is applied to the particular example of a first-order Markov sequence, the series representation of which has not so far been reported in the literature. The generalization to the case of continuous random functions on a finite interval is similar and is therefore only briefly described.
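The symmetry being exploited is easy to verify numerically: the covariance matrix of a stationary sequence is symmetric Toeplitz, hence persymmetric, so each eigenvector is either symmetric or antisymmetric about its midpoint and the K-L expansion splits into two series.

```python
import numpy as np

N, rho = 6, 0.8
C = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
_, V = np.linalg.eigh(C)
for v in V.T:                      # every eigenvector has a definite parity
    sym = np.allclose(v, v[::-1])
    anti = np.allclose(v, -v[::-1])
    print("symmetric" if sym else "antisymmetric" if anti else "neither")
```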