
Redundancy of Universal Coding, Kolmogorov Complexity, and Hausdorff Dimension

Authors:
  • Hayato Takahashi (Random Data Lab. Inc.)

Abstract

We study asymptotic code lengths of universal codes for parametric models. We exhibit a universal code whose code length is asymptotically less than or equal to that of the minimum description length (MDL) code. In particular, when some of the parameters of a source are not random reals, the coefficient of the logarithmic term for our universal code is less than that of the MDL code. We describe the redundancy in terms of Kolmogorov complexity and Hausdorff dimension, and show that our universal code is asymptotically optimal in the sense that the coefficient of the logarithmic term in its code length is minimal. Our universal code can be considered a natural extension of the Shannon code and the MDL code.
Redundancy of universal coding, Kolmogorov complexity, and Hausdorff dimension (Abstract)
Hayato Takahashi
The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan; tel: 81-3-3446-1501 (ext. 9701); fax: 81-3-5421-8750; e-mail: takahasi@ism.ac.jp
January 28, 2003
In [1, 4, 5], under a suitable condition, it is shown that the asymptotic code-length of sequences generated by a parametric model $P_\theta$ is given as follows:

$$-\log P_{\hat\theta} + \frac{k}{2}\log n + o(\log n), \quad P_\theta\text{-a.e.}, \tag{1}$$

where $\hat\theta$ is the maximum-likelihood estimator, $k$ is the dimension of the parameter space, $n$ is the sample size, and the base of $\log$ is 2.
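For concreteness (this worked example is ours, not the paper's): in the Bernoulli model, $k = 1$, and formula (1) can be evaluated directly on a sample. The following Python sketch computes the two-part MDL code length; the function name is illustrative.

    import math

    def mdl_code_length(x):
        """Two-part MDL code length (1) for a 0/1 sample x, with k = 1:
        -log2 P_thetahat(x) + (1/2) * log2 n."""
        n = len(x)
        p = sum(x) / n                       # maximum-likelihood estimate
        if p in (0.0, 1.0):
            nll = 0.0                        # degenerate case: -log2 P = 0
        else:
            nll = -sum(math.log2(p if b else 1.0 - p) for b in x)
        return nll + 0.5 * math.log2(n)      # (k/2) log2 n with k = 1

    # A sample of length 1024 with 700 ones: about n*H(700/1024) + 5 bits.
    print(mdl_code_length([1] * 700 + [0] * 324))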
In view of the proof of Rissanen [4], the second term of (1) is the description of the maximum-likelihood estimator $\hat\theta$ to $(\log n)/2$-bit accuracy; it is therefore natural to study a universal coding obtained by compressing the description of the maximum-likelihood estimator. In fact, Vovk [6] studied a universal coding for the Bernoulli model with code-length

$$\inf_{\theta} \bigl( -\log P_\theta + K(\theta \mid n) \bigr), \tag{2}$$

where $\theta$ ranges over the computable reals and $K$ is the prefix Kolmogorov complexity [2, 3].
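To see what (2) buys, consider a source whose parameter is a simple computable real. The sketch below is our illustration, not the paper's construction; since prefix complexity is uncomputable, the constant standing in for $K(\theta \mid n)$ is an assumption we plug in by hand.

    import math

    def code2_length(x, theta, k_theta_bits):
        """Code length (2) for one fixed computable theta:
        -log2 P_theta(x) plus an assumed description cost k_theta_bits.
        (K(theta | n) itself is uncomputable; this is a stand-in.)"""
        nll = -sum(math.log2(theta if b else 1.0 - theta) for b in x)
        return nll + k_theta_bits

    # For a Bernoulli(1/3) source, theta = 1/3 is computable, so
    # K(theta | n) = O(1); code (2) then saves the (1/2) log2 n bits
    # that (1) spends on describing thetahat.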
In order to study the code (2), we study the asymptotic expansion of the Bayes mixture $\int P_\theta \, dm(\theta)$ with two kinds of priors: one is a prior that is singular with respect to Lebesgue measure, and the other is an a priori probability on Euclidean space.
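For the Bernoulli model with the uniform (Lebesgue) prior, the Bayes mixture has a closed form, which makes its asymptotic expansion easy to check numerically; this worked example is ours.

    import math

    def bayes_mixture_length(x):
        """Mixture code length -log2 of the integral of p^a (1-p)^(n-a) dp
        over [0,1] for a 0/1 sample x with the uniform prior; the integral
        equals 1 / ((n+1) * C(n, a)), where a is the number of ones."""
        n, a = len(x), sum(x)
        return math.log2((n + 1) * math.comb(n, a))

    # For n = 1024, a = 700 this agrees with the MDL length (1),
    # -log2 P_thetahat + (1/2) log2 n, up to an O(1) term.
    print(bayes_mixture_length([1] * 700 + [0] * 324))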
By taking the prior of the Bayes mixture to be an a priori probability on Euclidean space, we extend the universal coding (2) to a multidimensional parameter space and show a universal coding whose code-length is

$$-\log P_{\hat\theta} + \sum_{j=1}^{k} K\bigl(\text{description of } \theta_j \text{ up to } (\log n)/2 \text{ bits} \mid n\bigr) + O(\log\log n), \quad P_\theta\text{-a.e.}, \tag{3}$$

where $\theta = (\theta_1, \cdots, \theta_k)$. On the other hand, Rissanen [5] showed that the code-length (1) is optimal up to the $O(\log n)$ term except for parameters in a set of
Lebesgue measure 0. Therefore we characterize, in terms of Hausdorff dimension, the parameter set for which

$$\frac{\sum_{j=1}^{k} K\bigl(\text{description of } \theta_j \text{ up to } (\log n)/2 \text{ bits} \mid n\bigr)}{\frac{k}{2}\log n} < 1 \qquad (n \to \infty).$$

Consequently we show a universal coding with the following property: for all real numbers $h_j$, $0 \le h_j \le 1$, $1 \le j \le k$, there are subsets $H_1 \times \cdots \times H_k$ of the parameter space, with $\dim H_j = h_j$, such that if $\theta_j \in H_j$, the code-length is

$$-\log P_{\hat\theta} + \frac{\sum_{j=1}^{k} \dim H_j}{2}\log n + o(\log n), \quad P_\theta\text{-a.e.}, \tag{4}$$

where $\dim H$ is the Hausdorff dimension of $H$. We also show that this code-length is optimal up to the $O(\log n)$ term when the parameter space is the unit interval.
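As an illustration of (4) (our example, not the paper's): if each $\theta_j$ lies in the middle-thirds Cantor set, whose Hausdorff dimension is $\log 2 / \log 3$, the coefficient of $\log n$ drops below the MDL value $k/2$.

    import math

    # Hausdorff dimension of the middle-thirds Cantor set.
    dim_cantor = math.log(2) / math.log(3)      # ~ 0.6309

    # With k parameters, each confined to the Cantor set, the coefficient
    # of log n in (4) is k * dim_cantor / 2 instead of the MDL value k / 2.
    k = 3
    print(k / 2, k * dim_cantor / 2)            # 1.5 vs ~ 0.946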
Since the code-lengths of the universal codings (2) and (3) involve Kolmogorov complexity, we cannot construct such codes effectively. To avoid this difficulty, we approximate the Kolmogorov complexity in (2) and (3) by considering a Bayes mixture whose prior is singular with respect to Lebesgue measure. Then we show a universal coding, which is constructive, such that the code-length is

$$-\log P_{\hat\theta} + \frac{h}{2}\log n + o(\log n), \quad P_\theta\text{-a.e.}, \tag{5}$$

where $h = -p\log p - (1-p)\log(1-p)$ and $p$ is the relative frequency of 1 in the dyadic expansion of $\hat\theta$. Note that $h < 1$ if and only if $p \neq 1/2$; that is, if the relative frequency of 1 in the dyadic expansion of $\hat\theta$ is biased, then the code-length (5) is asymptotically less than that of the MDL coding. We also show that the code (5) is optimal up to the $O(\log n)$ term for almost every $\theta$ with respect to the prior.
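A small numerical check of the coefficient $h$ in (5) (our example): $\hat\theta = 1/7$ has the dyadic expansion $0.001001001\ldots$, so $p = 1/3$ and $h \approx 0.918 < 1$, in which case (5) is asymptotically shorter than the MDL length.

    import math

    def dyadic_entropy(bits):
        """h = -p log2 p - (1-p) log2 (1-p), with p the relative frequency
        of 1 in (a finite prefix of) the dyadic expansion of thetahat."""
        p = sum(bits) / len(bits)
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

    # thetahat = 1/7 = 0.001001001... in binary, so p = 1/3.
    print(dyadic_entropy([0, 0, 1] * 20))       # ~ 0.918 < 1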
Finally, we remark that the code-lengths shown in this paper give non-trivial upper and lower bounds on Kolmogorov complexity when the source is not a computable measure.
References
[1] A. R. Barron. Logically smooth density estimation. Ph.D. dissertation, Dept. Elec. Eng.,
Stanford Univ., Stanford, CA, Sept. 1985.
[2] G. J. Chaitin. A theory of program size formally identical to information theory. J. ACM,
22:329–340, 1975.
[3] L. A. Levin. Laws of information conservation (nongrowth) and aspects of the foundation
of probability theory. Prob. Inf. Transm., 10:206–210, 1974.
[4] J. Rissanen. Universal coding, information, prediction, and estimation. IEEE Trans.
Inform. Theory, IT-30(4):629–636, 1984.
[5] J. Rissanen. Stochastic complexity and modeling. Ann. Statist., 14(3):1080–1100, 1986.
[6] V. G. Vovk. Learning about the parameter of the Bernoulli model. J. Comput. System
Sci., 55:96–104, 1997.