Feeds:
Posts

## On the two notions of “Information”

Recently I was going through Shannon’s original 1948 Information Theory paper and a paper by Kolmogorov from 1983 that places the differences between “Shannon Information” and “Algorithmic Information” in sharp relief. After much information diffusion over the past decades the difference is quite obvious and particularly interesting to contrast nonetheless. Nevertheless I found these two paragraphs from these two papers interesting, if for nothing then for historical reasons.

“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.” (C. E. Shannon, A Mathematical Theory of Communication, 1948).
“Our definition of the quantity of information has the advantage that it refers to individual objects and not to objects treated as members of a set of objects with a probability distribution given on it. The probabilistic definition can be convincingly applied to the information contained, for example, in a stream of congratulatory telegrams. But it would not be clear how to apply it, for example, to an estimate of the quantity of information contained in a novel or in the translation of a novel into another language relative to the original. I think that the new definition is capable of introducing in similar applications of the theory at least clarity of principle.” (A. N. Kolmogorov, Combinatorial Foundations of Information Theory and the Calculus of Probabilities, 1983).
Note the reference that Shannon makes is to a message selected from a set of possible messages (and hence the use of probability theory), his problem was mainly concerned with the engineering application of communication. While Kolmogorov talks of individual objects and is not directly concerned with communication and hence the usage of computability (the two notions are related ofcourse, for example the expected Kolmogorov Complexity is equal to Shannon entropy).
2. Let’s assume that the source words $x$ are distributed as a random variable $X$ with the probability $P(X = x) = p(x)$. Note that the Kolmogorov Complexity $K(x)$ is fixed for each word and is independent of the random variable $X$. Then we look at the following quantity $\sum_x p(x) K(x)$, this is the expected Kolmogorov Complexity. One might wonder how this would relate to the minimal average codeword length, i.e. the entropy $H(x) = \sum_x p(x) log(\frac{1}{p(x)})$. The answer is that they are indeed equal under some mild assumptions under the distribution in question. In the following sense:
$0 \leq \sum_x p(x) K(x) - H(x) \leq K(p) + O(1)$.