Algorithmic statistics has two different (and almost orthogonal) motivations. From the philosophical point of view, it tries to formalize how statistics works and why some statistical models are better than others. Once this notion of a “good model” is introduced, a natural question arises: is it possible that for some piece of data there is no good model? If so, how often do such bad (non-stochastic) data appear “in real life”?
Another, more technical motivation comes from algorithmic information theory. This theory introduces a notion of complexity of a finite object (= the amount of information in this object); it assigns to every object some number, called its algorithmic complexity (or Kolmogorov complexity). Algorithmic statistics provides a more fine-grained classification: for each finite object some curve is defined that characterizes its behavior. It turns out that several different definitions give (approximately) the same curve.
Road-map: Sect. 2 considers the notion of \((\alpha ,\beta )\)-stochasticity; Sect. 3 considers two-part descriptions and the so-called “minimal description length principle”; Sect. 4 gives one more approach: we consider the list of objects of bounded complexity and measure how far some object is from the end of the list, getting some natural class of “standard descriptions” as a by-product; finally, Sect. 5 establishes a connection between these notions and resource-bounded complexity. The rest of the paper deals with attempts to bring the theory closer to practice by considering restricted classes of descriptions (Sect. 6) and strong models (Sect. 7).
In this survey we try to provide an exposition of the main results in the field (including full proofs for the most important ones), as well as some historical comments. We assume that the reader is familiar with the main notions of algorithmic information (Kolmogorov complexity) theory. An exposition can be found in [42, Chaps. 1, 3, 4] or [22, Chaps. 2, 3], see also the survey [36].
A short survey of the main results of algorithmic statistics was given in [41] (without proofs); see also the last chapter of the book [42].
The work was in part funded by RFBR according to the research project grant 16-01-00362-a (N.V.) and by RaCAF ANR-15-CE40-0016-01 grant (A.S.).
We do not go into details here, but let us mention one common misunderstanding: the set of programs should be prefix-free for each c, but these sets may differ for different c and the union is not required to be prefix-free.
Initially Kolmogorov suggested considering \(n -\mathrm {C}(x)\) as “randomness deficiency” in this case, where \(\mathrm {C}\) stands for the plain (not prefix) complexity. One may also consider \(n-\mathrm {C}(x| n)\). However, all three deficiency functions mentioned are close to each other for strings x of length n; one can show that the difference between them is bounded by \(O(\log d)\) where d is any of these three functions. The proof works by comparing the expectation and probability-bounded characterizations as explained in [9].
This notation may look strange; however, we speak so often about finite sets of complexity at most i and cardinality at most \(2^j\) that we decided to introduce some short name and notation for them.
Technically speaking, this holds only for \(\alpha \leqslant \mathrm {K}(x)\). For \(\alpha >\mathrm {K}(x)\) both sets contain all pairs with first component \(\alpha \).
This number depends on the choice of the prefix decompressor, so it is not a specific number but a class of numbers. The elements of this class can be equivalently characterized as random lower semicomputable reals in [0, 1], see [42, Sect. 5.7].
In general, if two sets X and Y in \(\mathbb {N}^2\) are close to each other (each is contained in a small neighborhood of the other one), this does not imply that their boundaries are close. It may happen that one set has a small “hole” and the other does not, so the boundary of the first set has points that are far from the boundary of the second one. However, in our case both sets are closed by construction in two different directions, and this implies that the boundaries are also close.
This observation motivates Levin’s version of complexity (Kt, see [21, Sect. 1.3, p. 21]) where the program size and the logarithm of the computation time are added: a linear overhead in computation time matches a constant overhead in the program size. However, this is a different approach and we do not use Levin’s notion of time-bounded complexity in this survey.
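For orientation, the standard form of Levin’s definition can be written as follows; the symbol U (a fixed universal machine) and the exact phrasing are our notational assumptions here, not taken from the text above:

\[
\mathrm {Kt}(x) = \min \{\, |p| + \log t \;:\; U(p) \text{ outputs } x \text{ in at most } t \text{ steps} \,\}.
\]

If the computation time grows from t to ct for a constant c, the second term changes by \(\log (ct) - \log t = \log c = O(1)\), which has the same order as a constant change in the program length \(|p|\).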
One can also consider some class of probability distributions, but we restrict our attention to sets (uniform distributions).
Note that for the values of s close to N the right-hand side can be less than 1; the inequality then claims just the existence of non-deleted elements. The induction step is still possible: a non-deleted element is contained in one of the covering sets.
Now we see why N was chosen to be \(\sqrt{n/\log n}\): the bigger N is, the more points on the curve we have, but then the number of versions of the good sets and their complexity increase, so we have some trade-off. The chosen value of N balances these two sources of errors.
The same problem appears if we observe a sequence of independent coin tosses with probability of success p, select some trials (before they are actually performed, based on the information obtained so far), and ask for the probability of the event “the first t selected trials were all unsuccessful”. This probability does not exceed \((1-p)^t\); it can be smaller if the total number of selected trials is less than t with positive probability. This scheme was considered by von Mises when he defined random sequences using selection rules, so it should be familiar to algorithmic randomness people.
It is worth mentioning, on the other hand, that for every string x there is an almost minimal program for x that can be obtained from x by a simple total algorithm [40, Theorem 17].
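The bound for selected trials can be checked exactly on a small finite example. The sketch below is only an illustration under our own assumptions: a finite horizon of n trials, and hypothetical names (`prob_first_t_selected_fail`, `select_after_success`) that do not come from the survey. It enumerates all outcome sequences and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def prob_first_t_selected_fail(n, p, t, select):
    """Exact probability that at least t trials are selected among n
    independent trials and the first t selected trials all fail.

    `select(history)` sees only the outcomes of earlier trials and
    decides whether the next trial is selected.
    """
    total = Fraction(0)
    for outcome in product((0, 1), repeat=n):  # 1 = success, 0 = failure
        weight = Fraction(1)
        for bit in outcome:
            weight *= p if bit else 1 - p
        # Outcomes of the selected trials, in order of selection.
        selected = [outcome[i] for i in range(n) if select(outcome[:i])]
        if len(selected) >= t and not any(selected[:t]):
            total += weight
    return total


def select_after_success(history):
    # Hypothetical rule: select a trial only right after a success.
    return bool(history) and history[-1] == 1


p = Fraction(1, 3)
t = 3
bound = (1 - p) ** t

# Selecting every trial attains the bound exactly.
always = prob_first_t_selected_fail(6, p, t, lambda history: True)
# A rule that may select fewer than t trials stays strictly below it.
picky = prob_first_t_selected_fail(6, p, t, select_after_success)

print(always == bound, picky < bound)
```

With the second rule no trial is selected at all when every trial fails, so fewer than t trials are selected with positive probability, which is the case in which the probability drops below \((1-p)^t\).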
In this section we omit some proofs; see the original papers and the arXiv version of this paper.
We are grateful to several people who contributed to and/or carefully read preliminary versions of this survey, in particular, to B. Bauwens, P. Gács, A. Milovanov, G. Novikov, A. Romashchenko, P. Vitányi, and to all participants of the Kolmogorov seminar at Moscow State University and the ESCAPE group at LIRMM. We are also grateful to an anonymous referee for correcting several mistakes.
