Redundancy of universal coding, Kolmogorov
complexity, and Hausdorff dimension (Abstract)
Hayato Takahashi
The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan
Tel: 81-3-3446-1501 (ext. 9701), Fax: 81-3-5421-8750, E-mail: takahasi@ism.ac.jp
January 28, 2003
In [1, 4, 5] it is shown that, under suitable conditions, the asymptotic code-length of sequences generated by a parametric model $P_\theta$ is given as follows:
$$-\log P_{\hat{\theta}} + \frac{k}{2}\log n + o(\log n), \quad P_\theta\text{-a.e.}, \tag{1}$$
where $\hat{\theta}$ is the maximum-likelihood estimator, $k$ is the dimension of the parameter space, $n$ is the sample size, and the base of $\log$ is 2.
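As a concrete instance (a standard computation, included for illustration): in the Bernoulli model with $k = 1$, write $n_1$ for the number of ones in the sample $x^n$ and $\hat{p} = n_1/n$ for the maximum-likelihood estimate; then
$$-\log P_{\hat{\theta}}(x^n) = -n_1\log\hat{p} - (n - n_1)\log(1-\hat{p}) = n\,h(\hat{p}),$$
where $h$ is the binary entropy function, so (1) reads $n\,h(\hat{p}) + \frac{1}{2}\log n + o(\log n)$.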
In view of the proof of Rissanen [4], the second term of (1) is the description length of the maximum-likelihood estimator $\hat{\theta}$ with $(\log n)/2$-bit accuracy; it is therefore natural to study a universal coding obtained by compressing the description of the maximum-likelihood estimator. In fact, Vovk [6] studied a universal coding for the Bernoulli model with code-length
$$\inf_{\theta} \left( -\log P_\theta + K(\theta \mid n) \right), \tag{2}$$
where $\theta$ ranges over the computable reals and $K$ is the prefix Kolmogorov complexity [2, 3].
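To see why (2) can improve on (1), note the following standard bound (an illustration, not an argument taken from [6]): for a fixed computable real $\theta_0$, a program printing $\theta_0$ has constant length, so $K(\theta_0 \mid n) = O(1)$ and
$$\inf_{\theta} \left( -\log P_\theta + K(\theta \mid n) \right) \le -\log P_{\theta_0} + O(1);$$
when the true parameter is a simple computable real, the $\frac{1}{2}\log n$ penalty of (1) disappears entirely.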
In order to study the code (2), we study the asymptotic expansion of the Bayes mixture $\int P_\theta \, dm(\theta)$ with two kinds of priors. One is a prior that is singular with respect to Lebesgue measure, and the other is an a priori probability on Euclidean space.
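Presumably the mechanism is the standard one (a gloss supplied for orientation, not spelled out in this abstract): by the coding theorem of Levin [3], the a priori probability $\mathbf{m}$ satisfies
$$-\log \mathbf{m}(s \mid n) = K(s \mid n) + O(1)$$
for finite objects $s$, so a Bayes mixture whose prior places weight $\mathbf{m}$ on discretized parameter values has code-lengths governed by the prefix complexity of those values, which is how complexities such as the one in (2) enter.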
By taking the prior of the Bayes mixture to be an a priori probability on Euclidean space, we extend the universal coding (2) to a multidimensional parameter space, and show a universal coding whose code-length is
$$-\log P_{\hat{\theta}} + \sum_{j=1}^{k} K\bigl(\text{description of } \theta_j \text{ up to } (\log n)/2 \text{ bits} \mid n\bigr) + O(\log\log n), \quad P_\theta\text{-a.e.}, \tag{3}$$
where $\theta = (\theta_1, \dots, \theta_k)$. On the other hand, Rissanen [5] showed that the code-length (1) is optimal up to an $O(\log n)$ term except for parameters in a set of
Lebesgue measure 0. Therefore we characterize, in terms of Hausdorff dimension, the parameter set on which
$$\frac{\sum_{j=1}^{k} K\bigl(\text{description of } \theta_j \text{ up to } (\log n)/2 \text{ bits} \mid n\bigr)}{\frac{k}{2}\log n} < 1 \quad (n \to \infty).$$
Consequently we show a universal coding having the following property: for all real numbers $h_j$, $0 \le h_j \le 1$, $1 \le j \le k$, there are subsets $H_1 \times \cdots \times H_k$ of the parameter space, with $\dim H_j = h_j$, such that if $\theta_j \in H_j$, the code-length is
$$-\log P_{\hat{\theta}} + \frac{\sum_{j=1}^{k} \dim H_j}{2}\log n + o(\log n), \quad P_\theta\text{-a.e.}, \tag{4}$$
where $\dim H$ is the Hausdorff dimension of $H$. Also, we show that the code-length is optimal up to an $O(\log n)$ term when the parameter space is the unit interval.
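A classical source of such sets, supplied here for illustration, is Eggleston's theorem: for $p \in [0,1]$, the set
$$H_p = \{\theta \in [0,1] : \text{the relative frequency of 1 in the dyadic expansion of } \theta \text{ is } p\}$$
has Hausdorff dimension $\dim H_p = -p\log p - (1-p)\log(1-p)$ (with $\log$ base 2). For parameters in such a set, (4) gives a second term of $\frac{\dim H_p}{2}\log n$, anticipating the constructive code (5) below.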
Since the code-lengths of the universal codings (2) and (3) involve Kolmogorov complexity, we cannot construct such codes effectively. To avoid this difficulty, we approximate the Kolmogorov complexity in (2) and (3) by considering a Bayes mixture whose prior is singular with respect to Lebesgue measure. We then show a universal coding, which is constructive, such that the code-length is
$$-\log P_{\hat{\theta}} + \frac{h}{2}\log n + o(\log n), \quad P_\theta\text{-a.e.}, \tag{5}$$
where $h = -p\log p - (1-p)\log(1-p)$ and $p$ is the relative frequency of 1 in the dyadic expansion of $\hat{\theta}$. Note that $h < 1 \Leftrightarrow p \ne 1/2$; i.e., if the relative frequency of 1 in the dyadic expansion of $\hat{\theta}$ is biased, then the code-length (5) is asymptotically less than that of MDL coding. Also we show that the code (5) is optimal up to an $O(\log n)$ term for almost every $\theta$ with respect to the prior.
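As a numerical illustration (a simple arithmetic check, not an example from the text): if the dyadic expansion of $\hat{\theta}$ has ones with relative frequency $p = 1/4$, then
$$h = -\tfrac{1}{4}\log\tfrac{1}{4} - \tfrac{3}{4}\log\tfrac{3}{4} = 2 - \tfrac{3}{4}\log 3 \approx 0.811,$$
so the second term of (5) is about $0.406\log n$, against the MDL penalty of $0.5\log n$ in (1) for $k = 1$.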
Finally we remark that the code-lengths shown in this paper give non-trivial upper and lower bounds on Kolmogorov complexity when the source is not a computable measure.
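The upper-bound direction can be sketched as follows (a standard argument, under the usual assumption that the constructive code is prefix-free): the decoder of the code (5) is a fixed program, so for every sequence $x^n$,
$$K(x^n \mid n) \le \ell(x^n) + O(1),$$
where $\ell(x^n)$ denotes the code-length of $x^n$ under (5). Hence (5) bounds the complexity of $P_\theta$-typical sequences from above even when $P_\theta$ itself is not computable; presumably the optimality results supply the corresponding lower bounds.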
References
[1] A. R. Barron. Logically smooth density estimation. Ph.D. dissertation, Dept. Elec. Eng.,
Stanford Univ., Stanford, CA, Sept. 1985.
[2] G. J. Chaitin. A theory of program size formally identical to information theory. J. ACM,
22:329–340, 1975.
[3] L. A. Levin. Laws of information conservation (nongrowth) and aspects of the foundation
of probability theory. Prob. Inf. Transm., 10:206–210, 1974.
[4] J. Rissanen. Universal coding, information, prediction, and estimation. IEEE Trans.
Inform. Theory, IT-30(4):629–636, 1984.
[5] J. Rissanen. Stochastic complexity and modeling. Ann. Statist., 14(3):1080–1100, 1986.
[6] V. G. Vovk. Learning about the parameter of the Bernoulli model. J. Comput. System
Sci., 55:96–104, 1997.