# «On essentially conditional information inequalities Tarik Kaced∗ Andrei Romashchenko† arXiv:1103.2545v3 [cs.IT] 31 May 2011 June 1, 2011 Abstract ...»

On essentially conditional information inequalities

Tarik Kaced∗ Andrei Romashchenko†

arXiv:1103.2545v3 [cs.IT] 31 May 2011

June 1, 2011

Abstract

In 1997, Z. Zhang and R.W. Yeung found the ﬁrst example of a conditional information inequality in four variables that is not “Shannontype”. This linear inequality for entropies is called conditional (or constraint) since it holds only under condition that some linear equations are

satisﬁed for the involved entropies. Later, the same authors and other researchers discovered several unconditional information inequalities that do not follow from Shannon’s inequalities for entropy.

In this paper we show that some non Shannon-type conditional inequalities are “essentially” conditional, i.e., they cannot be extended to any unconditional inequality. We prove one new essentially conditional information inequality for Shannon’s entropy and discuss conditional information inequalities for Kolmogorov complexity.

1 Introduction Let (X1,..., Xn ) be jointly distributed random variables on a ﬁnite domain.

For this collection of random variables there are 2n − 1 non-empty subsets and for each subset we have a value of Shannon’s entropy. We call this family of entropies the entropy proﬁle of the distribution (X1,..., Xn ). Thus, to every n-tuple of jointly distributed random variables there corresponds its entropy n n proﬁle which is a vector of values in R2 −1. We say that a point in R2 −1 is constructible if it is a vector of entropies for some distribution.

All constructible points satisfy diﬀerent information inequal

published, many (in fact, inﬁnitely many) non Shannon-type linear information inequalities were proven, see, e.g., [7, 8, 9, 12, 13]. Another very curious piecewise linear information inequality was proven in [15]. These new inequalities were applied in problems of network coding [14], secret sharing [16], etc.

However, these inequalities and their ‘physical meaning’ are still not very well understood.

In this paper we discuss conditional (constraint) information inequalities.

That is, we are interested in linear information inequalities that are true only given some linear constraint for entropies. Trivial examples of conditional inequalities can be easily derived from (unconditional) basic inequalities, e.g., if H(X1 ) = 0 then H(X1, X2 ) ≤ H(X2 ). However, some conditional inequalities cannot be obtained as a corollary of Shannon-type inequalities. The ﬁrst example of a nontrivial conditional inequality was proven in [4] (even before the ﬁrst

**example of an unconditional non Shannon-type inequality):**

(for some constant κ 0). In this paper we prove that this conjecture is wrong: for any coeﬃcient κ, inequality (3) is not true for some distributions.

So, inequality (1) is “essentially conditional”; it cannot be extended to an unconditional information inequality. A similar statement can be proven for (2).

In this paper we also prove one new conditional linear inequality that cannot be extended to any unconditional inequality. So, now we have three examples of essentially conditional information inequality.

It is known that the class of unconditional linear information inequalities are the same for Shannon’s entropy and for Kolmogorov complexity. The situation with conditional inequalities is more complicated: the known technique used to prove constraint information inequalities for Shannon’s entropy cannot be directly adapted for Kolmogorov complexity. In fact, it is not even clear how to formulate Kolmogorov’s version of constraint inequalities. However, we prove for Kolmogorov complexities some counterpart of inequality (1); this inequality holds only for some special tuples of words.

The paper is organized as follows. In Section 2 we use the technique from [4] and prove one new conditional information inequality. In Section 3 we prove that this new inequality as well as (1) and (2) cannot be extended to any unconditional inequalities. In Section 4 we prove some version of conditional inequality for Kolmogorov complexities.

2 Nontrivial conditional information inequalities The very ﬁrst example of an inequality that does not follow from basic (Shannon

**type) inequalities was the following result of Z. Zhang and R. W. Yeung:**

**Theorem 1 (Zhang–Yeung, [4]). For all random variables A, B, C, D, if I(A :**

B|C) = I(A : B) = 0 then

With the same technique F. Mat´ˇ proved another conditional inequality (2), us

**see [6]. Using a similar method, we prove one new conditional inequality1 :**

Theorem 2. For all random variables A, B, C, D if

3 Conditional inequalities that cannot be extended to any unconditional inequalities In [7] it was conjectured that the conditional inequality from Theorem 1 is a

**corollary of some unconditional information inequality (which was not discovered yet):**

Conjecture 1 ([7]). For some constant κ 0 inequality (3) is true for all random variables A, B, C, D.

Obviously, if such an inequality could be proven, it would imply the statement of Theorem 1. Similar conjectures could be formulated for (2) and the conditional inequality from Theorem 2. We prove that these conjectures are false, i.e., these three conditional inequalities cannot be converted into unconditional

**inequalities:**

** Theorem 3. (a) For any κ the inequality (3) is not true for some distributions (A, B, C, D).**

(b) For any κ the inequality

is not true for some distributions (A, B, C, D). Thus, (2) cannot be extended to an unconditional inequality.

Proof. (a) For all ε ∈ [0, 1] we us consider the following joint distribution of

**binary variables (A, B, C, D):**

For each value of A and for each values of B, the value of at least one of variables C, D is uniquely determined: if A = 0 then C = 0; if A = 1 then D = 1; if

**B = 0 then D = 1; and if B = 1 then C = 0. Hence, I(C : D|A) = I(C :**

D|B) = 0. Also it is easy to see that I(A : B|C) = 0. Thus, if (3) is true, then I(C : D) ≤ κI(A : B).

**Denote the right-hand and left-hand sides of this inequality by L(ε) = I(C :**

D) and R(ε) = κI(A : B). Both functions L(ε) and R(ε) are continuous, and L(0) = R(0) = 0 (for ε = 0 both sides of the inequality are equal to 0). However the asymptotics of L(ε) and R(ε) as ε → 0 are diﬀerent: it is not hard to check that L(ε) = Θ(ε), but R(ε) = O(ε2 ). From (3) we have Θ(ε) ≤ O(ε2 ), which is a contradiction.

(b) For every value of ε ∈ [0, 1] we consider the following joint distribution

**of binary variables (A, B, C, D):**

We substitute this distribution in (5) and obtain I0 + O(ε) ≤ I0 + 3ε log ε + O(ε) + O(κε), where I0 is the mutual information between C and D for ε = 0 (which is equal to the mutual information between A and B for ε = 0). We get a contradiction as ε → 0.

The theorem above implies that the set of all linear information inequalities for 4-tuples must have rather complicated structure. Let us remind that a point a ∈ R15 is called constructible is there exists a joint distribution (A, B, C, D) such that a = (H(A), H(B),..., H(A, B, C, D)) (a consists of entropies of all non-empty tuples of random variables A, B, C, D).

Further, a point a is called asymptotically constructible if for every ε 0 in ε-neighborhood of a there exists an constructible point a′. In a similar way the set of (asymptotically) constructible points is deﬁned for any number of n random variables (in R2 −1 for n-tuples of random variables). It is known (see, e.g., [5, 18]) that for every n the set of asymptotically constructible points representable by n-tuples of random variables make a closed convex cone in n R2 −1. The dual representation of this cone is the set of all linear information inequalities. We will show that for n ≥ 4 the structure of this cone is not trivial.

From Theorem 3 we get a new proof of the result by F. Mat´ˇ [9]: the set of us linear information inequalities for 4 random variables is not ﬁnitely generated.

Theorem 4 ([9]). The cone of asymptotically constructible points for 4 random variables is not polyhedral (equivalently, the set of linear information inequalities for 4-tuples of random variables is not ﬁnitely generated).

Proof. For the sake of contradiction we assume that the cone in R15 that consist of all asymptotically constructible points for 4 random variables (A, B, C, D) is polyhedral. The constraints I(A : B) = I(A : B|C) = 0 specify some face (of co-dimension 2) on the boundary of this polyhedron. The corresponding conditional inequality (from Theorem 1) speciﬁes a non-degenerate linear functionals, which is non-negative on the corresponding faces. Technically, this functional is deﬁned by the linear form g = I(C : D|A) + I(C : D|B) − I(C : D), which is non-negative on this face of the cone. With the standard linear programming technique it can be proven that this functional g can be extended to a linear functional g ′ such that (a) g ′ is non-negative on the entire cone of asymptotically constructible points, and (b) g ′ coincides with g on the subspace of co-dimension 2 deﬁned by the condition I(A : B) = I(A : B|C) = 0 (see Proposition 17 in [1]). In coordinates such a functional g ′ must have form

(with some reals d1 and d2 ). It follows that g ′ ≥ 0 for all constructible points, so we get (3) (where κ is equal to maximum of d1 and d2 ). This contradicts Theorem 3, and we are done. For the sake of being self-contained, we present a more detailed argument in Appendix.

4 Constraint inequality for Kolmogorov complexity Kolmogorov complexity of a ﬁnite binary string X is deﬁned as the length of the shortest program that generates X; similarly, Kolmogorov complexity of a string X given another string Y is deﬁned as the length of the shortest program that generates X given Y as an input. More formally, for any programming language L, Kolmogorov complexity KL (X|Y ) is deﬁned as

** KL (X|Y ) = min{|p| : program p prints X on input Y },**

and unconditional complexity KL (X) is deﬁned as complexity of X given the empty Y. The basic fact of Kolmogorov complexity theory is the invariance theorem: there exists a universal programming language U such that for any other language L we have KU (X|Y ) ≤ KL (X|Y ) + O(1) (the O(1) depends on L but not on X and Y ). We ﬁx such a universal language U ; in what follows we omit the subscript U and denote Kolmogorov complexity by K(X), K(X|Y ). We refer the reader to an excellent book [10] for a survey of properties of Kolmogorov complexity.

Kolmogorov complexity was introduced in [2] as an algorithmic version of measure of information in an individual object. In some sense, properties of Kolmogorov complexity are quite similar to properties Shannon’s entropy. For example, for the property of Shannon’s entropy H(A, B) = H(A) + H(B|A) there is a Kolmogorov’s counterpart

(the Kolmogorov–Levin theorem, [3]). This result justiﬁes the deﬁnition of the mutual information, which is an algorithmic version of the standard Shannon’s deﬁnition: the mutual information is deﬁned as I(A : B) := K(A) + K(B) − K(A, B), and the conditional mutual information is deﬁned as

** I(A : B|C) := K(A, C) + K(B, C) − K(A, B, C) − K(C).**

From the Kolmogorov–Levin theorem it follows that I(A : B) is equal to K(A) − K(A|B), and the conditional mutual information I(A : B|C) is equal to K(A|C) − K(A|B, C) (all these equations hold only up to logarithmic terms).

In fact, we have a much more deep and general parallel between Shannon’s and Kolmogorov’s information theories; for every linear inequality for Shannon’s

**entropy there exists a Kolmogorov’s counterpart:**

Theorem 5 ([11]). For each family of coeﬃcients {λW } the inequality

is true for all tuples of strings {ai }, N = K(a1, a2,...) (C does not depend on ai ).

Thus, the class of unconditional inequalities valid for Shannon’s entropy coincides with the class of (unconditional) inequalities valid for Kolmogorov complexity. What about conditional inequalities?

In the framework of Kolmogorov complexity we cannot say that some information quantity exactly equals zero. Indeed, even the deﬁnition of Kolmogorov complexity makes sense only up to an additive term that depends on the choice of the universal programming language. Moreover, such a natural basic statement as the Kolmogorov–Levin theorem (6) holds only up to a logarithmic term. So, if we want to prove a sensible conditional inequality for Kolmogorov complexity, the linear constraints must be formulated with some reasonable precision.

**A natural version of Theorem 1 is the following conjecture:**

Conjecture 2. There exist functions f (n) and g(n) such that f (n) = o(n) and

**g(n) = o(n), and for all strings A, B, C, D satisfying I(A : B|C) ≤ f (N ), I(A :**

B) ≤ f (N ) it holds I(C : D) ≤ I(C : D|A) + I(C : D|B) + g(N ) (where N = K(A, B, C, D)).

There is no hope to prove Conjecture 2 with f (n) and g(n) of order Θ(log n).

Indeed, using a counterexample from the proof of Theorem 3(a), we can construct binary strings A, B, C, D such that the quantities I(A : B|C), I(A : B), I(C : D|A), and I(C : D|B) are bounded by O(log N ), but I(C : D) = √ Ω( N log N ). However, even if Conjecture 2 is false in general, similar conditional inequalities (even with logarithmic precision) can be true for some special tuples A, B, C, D. In what follows we show how to prove such an inequality for one natural example of strings A, B, C (and any D).