Answers – Information Theory

1. The definition of mutual information is I(A;B) = log2( P(A|B) / P(A) ), which yields

P(A|B) = P(A) · 2^I(A;B)

(a) When I(A;B) = 0 this gives P(A|B) = P(A).

(b) I(A;B) = +1 bit ⇒ P(A|B) = 2 P(A).

(c) I(A;B) = −1 bit ⇒ P(A|B) = P(A)/2.

2. Ω = {ω1, ω2, ω3, ω4, ω5, ω6} is the sample space of a die, with P(ωi) = 1/6. We define three random variables:

ω    X(ω)   Y(ω)    Z(ω)
ω1   1      Odd     B(elow)
ω2   2      Even    B
ω3   3      Odd     B
ω4   4      Even    A(bove)
ω5   5      Odd     A
ω6   6      Even    A

(a)
Event A     Elementary events   P(A)   I(A;A)
X = 2       ω2                  1/6    log2 6 ≈ 2.58
X = 3       ω3                  1/6    log2 6 ≈ 2.58
Y = Even    ω2, ω4, ω6          1/2    1
Z = B       ω1, ω2, ω3          1/2    1

(b) The mutual information is given by I(A;B) = log2( P(A|B) / P(A) ) = log2( P(A,B) / (P(A)P(B)) ).

i. I(X=2; Z=B) = log2( P(X=2, Z=B) / (P(X=2)P(Z=B)) ) = log2( (1/6) / (1/6 · 1/2) ) = log2 2 = 1

ii. I(X=2; Y=Even) = log2( P(X=2, Y=Even) / (P(X=2)P(Y=Even)) ) = log2( (1/6) / (1/6 · 1/2) ) = log2 2 = 1

iii. I(X=3; Y=Even) = log2( P(X=3, Y=Even) / (P(X=3)P(Y=Even)) ) = log2( 0 / (1/6 · 1/2) ) = −∞

iv. I(X=2 ∪ X=3; Y=Even) = log2( (1/6) / (1/3 · 1/2) ) = log2 1 = 0

v. I(X=3; Y=Odd ∩ Z=B) = log2( (1/6) / (1/6 · 1/3) ) = log2 3 ≈ 1.58
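As a numerical check (an addition, not part of the original solutions), the single-event informations in i–v can be evaluated by enumerating the six outcomes. The sketch below assumes a particular encoding of Y and Z as the strings "Odd"/"Even" and "A"/"B":

```python
from math import log2

# Outcomes of the die with the three random variables X, Y, Z
omega = {1: (1, "Odd", "B"), 2: (2, "Even", "B"), 3: (3, "Odd", "B"),
         4: (4, "Even", "A"), 5: (5, "Odd", "A"), 6: (6, "Even", "A")}

def prob(event):
    """P(event), where event is a predicate on (X, Y, Z)."""
    return sum(1 for v in omega.values() if event(*v)) / 6

def info(a, b):
    """I(A;B) = log2 P(A,B) / (P(A) P(B)); -inf if P(A,B) = 0."""
    pab = prob(lambda x, y, z: a(x, y, z) and b(x, y, z))
    return log2(pab / (prob(a) * prob(b))) if pab > 0 else float("-inf")

print(info(lambda x, y, z: x == 2, lambda x, y, z: z == "B"))          # 1.0
print(info(lambda x, y, z: x == 2, lambda x, y, z: y == "Even"))       # 1.0
print(info(lambda x, y, z: x == 3, lambda x, y, z: y == "Even"))       # -inf
print(info(lambda x, y, z: x in (2, 3), lambda x, y, z: y == "Even"))  # 0.0
print(info(lambda x, y, z: x == 3, lambda x, y, z: y == "Odd" and z == "B"))  # ~1.58
```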

(c) We can use the following to calculate the entropies:

H(X) = E( −log2 p(X) ) = −Σ_{i=1}^{L} pi log2 pi

H(XY) = H(X) + H(Y|X) = H(X) + Σ_i H(Y|X = xi) P(X = xi)

i. H(X) = −6 · (1/6) log2(1/6) = log2 6 ≈ 2.58.

ii. H(Y) = −2 · (1/2) log2(1/2) = log2 2 = 1.

iii. H(Z) = −2 · (1/2) log2(1/2) = log2 2 = 1.

iv. H(XY) = H(1/6, 0, 0, 1/6, 1/6, 0, 0, 1/6, 1/6, 0, 0, 1/6) = −6 · (1/6) log2(1/6) = log2 6 ≈ 2.58.
Alternatively we can use H(XY) = H(X) + H(Y|X); since knowing X we also know Y, we have H(Y|X) = 0, and thus H(XY) = H(X) = log2 6.

v. H(XZ): same argument as above gives H(XZ) = log2 6.

vi. H(YZ) = H(1/6, 1/3, 1/3, 1/6) = (1/3) log2 6 + (2/3) log2 3 = 1/3 + log2 3 ≈ 1.92.
Alternatively, H(YZ) = H(Y) + H(Z|Y) = ... = 1/3 + log2 3 ≈ 1.92.

vii. H(XYZ) = H(X) + H(YZ|X) = H(X) + 0 = log2 6.


(d) The mutual information can be calculated as I(X;Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(XY).

i. I(X;Y) = H(Y) − H(Y|X) = 1 − 0 = 1

ii. I(X;Z) = H(X) + H(Z) − H(XZ) = log2 6 + log2 2 − log2 6 = 1

iii. I(Y;Z) = 1 + 1 − (1/3 + log2 3) = 5/3 − log2 3 ≈ 0.082

iv. I(X;YZ) = H(YZ) − H(YZ|X) = (1/3 + log2 3) − 0 = 1/3 + log2 3 ≈ 1.92.
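The entropies and mutual informations in (c) and (d) can be verified with a similar sketch (again an addition, not from the original answers): enumerate the six equally likely outcomes and compute joint entropies over any subset of the coordinates (X, Y, Z).

```python
from math import log2
from collections import Counter

# The six equally likely outcomes (X, Y, Z) of problem 2
outcomes = [(1, "Odd", "B"), (2, "Even", "B"), (3, "Odd", "B"),
            (4, "Even", "A"), (5, "Odd", "A"), (6, "Even", "A")]

def H(*indices):
    """Joint entropy (in bits) of the selected coordinates."""
    counts = Counter(tuple(o[i] for i in indices) for o in outcomes)
    probs = [c / len(outcomes) for c in counts.values()]
    return -sum(p * log2(p) for p in probs)

print(H(0), H(1), H(2))              # H(X) ~ 2.58, H(Y) = 1, H(Z) = 1
print(H(0, 1), H(1, 2), H(0, 1, 2))  # H(XY) ~ 2.58, H(YZ) ~ 1.92, H(XYZ) ~ 2.58
print(H(0) + H(1) - H(0, 1))         # I(X;Y) = 1
print(H(1) + H(2) - H(1, 2))         # I(Y;Z) ~ 0.082
```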

3. H(1/4, 1/4, 1/4, 1/4) = 4 · ( −(1/4) log2(1/4) ) = log2 4 = 2

H(1/2, 1/4, 1/8, 1/8) = (1/2) log2 2 + (1/4) log2 4 + (1/4) log2 8 = 1/2 + 1/2 + 3/4 = 7/4

4. The vector variable (X1 X2 X3) can take the values (000), (011), (101), (110) with equal probability 0.25. The marginal distributions are fX1(0) = fX1(1) = fX2(0) = fX2(1) = fX3(0) = fX3(1) = 1/2.

(a) H(X1) = h(1/2) = 1

(b) H(X1X2) = H(1/4, 1/4, 1/4, 1/4) = log2 4 = 2

(c) H(X2|X1) = H(X1X2) − H(X1) = 2 − 1 = 1

(d) H(X1X2X3) = log2 4 = 2

(e) H(X3|X1X2) = H(X1X2X3) − H(X1X2) = 2 − 2 = 0

(f) H(X3) = h(1/2) = 1

(g) I(X1;X3) = H(X1) + H(X3) − H(X1X3) = 1 + 1 − 2 = 0

(h) I(X1X2;X3) = H(X1X2) + H(X3) − H(X1X2X3) = 2 + 1 − 2 = 1
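A quick check of (a)–(h) (added here; it assumes the value set (000), (011), (101), (110), i.e. X3 = X1 ⊕ X2, which is what the stated marginals and answers require):

```python
from math import log2
from collections import Counter

# Four equiprobable triples (X1, X2, X3); note X3 = X1 XOR X2
triples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def H(idx):
    counts = Counter(tuple(t[i] for i in idx) for t in triples)
    return -sum(c / 4 * log2(c / 4) for c in counts.values())

print(H([0, 1, 2]))                       # H(X1X2X3) = 2
print(H([0]) + H([2]) - H([0, 2]))        # I(X1;X3) = 0
print(H([0, 1]) + H([2]) - H([0, 1, 2]))  # I(X1X2;X3) = 1
```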

5. The probabilities of the joint random variable (X, Y) are given by

p(xi, yj)   y1     y2
x1          1/6    1/8
x2          1/8    1/4
x3          1/4    1/12

The marginal distributions are fX(xi) = (7/24, 3/8, 1/3) and fY(yj) = (13/24, 11/24).

• H(X) = H(7/24, 3/8, 1/3) = (7/24) log2(24/7) + (3/8) log2(8/3) + (1/3) log2 3 ≈ 1.577

• H(Y) = h(13/24) ≈ 0.995

• H(X,Y) = (1/6) log2 6 + (1/4) log2 8 + (1/2) log2 4 + (1/12) log2 12 = 25/12 + (1/4) log2 3 ≈ 2.48

• H(X|Y) = H(XY) − H(Y) ≈ 1.485

• H(Y|X) = H(XY) − H(X) ≈ 0.902

• I(X;Y) = H(X) + H(Y) − H(XY) ≈ 0.0928
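The numbers above can be reproduced directly from the joint table; the following sketch (an addition, using exact fractions to avoid rounding) computes the marginals, the entropies and I(X;Y).

```python
from math import log2
from fractions import Fraction as F

# Joint distribution p(x_i, y_j) from problem 5
P = [[F(1, 6), F(1, 8)],
     [F(1, 8), F(1, 4)],
     [F(1, 4), F(1, 12)]]

def H(probs):
    return -sum(float(p) * log2(float(p)) for p in probs if p > 0)

px = [sum(row) for row in P]        # (7/24, 3/8, 1/3)
py = [sum(col) for col in zip(*P)]  # (13/24, 11/24)
Hx, Hy = H(px), H(py)
Hxy = H([p for row in P for p in row])

print(Hx, Hy, Hxy)         # ~1.577, ~0.995, ~2.480
print(Hxy - Hy, Hxy - Hx)  # H(X|Y) ~1.485, H(Y|X) ~0.902
print(Hx + Hy - Hxy)       # I(X;Y) ~0.093
```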

6. X takes four values xi, each with probability fX(xi) = 1/4, and Y = X². The distribution of Y is fY(1) = 1/2, fY(4) = 1/2.

a) The covariance between X and Y is Cov(X,Y) = E(XY) − E(X)E(Y) = E(XY) (since E(X) = 0) = 1/4 − 1/4 + 8/4 − 8/4 = 0

b) I(X;Y) = H(Y) − H(Y|X) = 1 − 0 = 1. If we know X we also know Y, thus H(Y|X) = 0.

c) Independent variables are always uncorrelated, but the converse does not hold in general. However, for jointly Gaussian variables, uncorrelated implies independent.


7. Let X be the number of heads, and let Y denote whether the coin is false or not. Then we have the following joint distribution

P(Xi, Yj)   Real   False
X = 0       1/8    0
X = 1       1/4    0
X = 2       1/8    1/2

The mutual information is I(X;Y) = H(X) + H(Y) − H(XY).

H(X) = H(1/8, 1/4, 5/8) = 3/8 + 1/2 + (5/8) log2(8/5) = 22/8 − (5/8) log2 5 ≈ 1.299

H(Y) = 1,  H(XY) = 3/8 + 2/4 + 3/8 + 1/2 = 1.75

I(X;Y) = H(X) + H(Y) − H(XY) ≈ 0.549

8. Examples...

(a) H(X|Y) < H(X): any variables where X and Y are dependent, e.g. (X,Y) ∈ {(1,1), (1,0), (0,0)} with equal probability.

(b) H(X|Y) = H(X): any joint distribution where X and Y are independent, e.g. (X,Y) ∈ {(1,1), (1,0), (0,0), (0,1)} with equal probability.

(c) H(X|Y) > H(X) is impossible! Conditioning can never increase uncertainty!

9. Examples...

(a) I(X;Y|Z) < I(X;Y): e.g., Z = Y.

(b) I(X;Y|Z) = I(X;Y): if Z is independent of X and Y.

(c) I(X;Y|Z) > I(X;Y): e.g., if X is the sum of two independent variables Y and Z.

10. The probability that it takes k throws is

P(k) = (1/2)^(k−1) · (1/2) = (1/2)^k

The entropy becomes

H(X) = − lim_{N→∞} Σ_{k=1}^{N} (1/2)^k log2( (1/2)^k )
     = lim_{N→∞} Σ_{k=1}^{N} k (1/2)^k
     = lim_{N→∞} ( 2 − (N + 2)(1/2)^N )
     = 2
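A short numerical sanity check of the limit (added): the partial sums −Σ_{k=1}^{N} (1/2)^k log2((1/2)^k) approach 2 bits.

```python
from math import log2

# Entropy of the geometric distribution P(k) = (1/2)^k, k = 1, 2, ...
def partial_entropy(N):
    return -sum((0.5 ** k) * log2(0.5 ** k) for k in range(1, N + 1))

for N in (5, 10, 20, 50):
    print(N, partial_entropy(N))  # -> approaches 2.0 as N grows
```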

Answers – Source coding

1. Consider the random variable U and the binary prefix free code for U

u    fU(u)   c
u1   0.2     0
u2   0.2     100
u3   0.2     101
u4   0.2     110
u5   0.1     1110
u6   0.1     1111

(a) Draw the code tree for this code.

(b) The entropy is H(U) = 0.8 log2 5 + 0.2 log2 10 ≈ 2.52. The average code length is l = 0.2 · 4 + 0.6 · 3 + 0.2 · 1 = 2.8.


(c) The code satisfies Kraft's inequality with equality, Σ 2^(−li) = 1, so the code tree is full. It is nevertheless not optimal: a Huffman code for these probabilities has codeword lengths (2, 2, 3, 3, 3, 3) and average length 2.6 < 2.8.
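The Huffman comparison can be checked with a small sketch (added, not part of the original answers); `huffman_lengths` is a hypothetical helper that builds the code greedily with a heap and returns only the codeword lengths.

```python
import heapq

def huffman_lengths(probs):
    """Return optimal (Huffman) codeword lengths for the given probabilities."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:  # every symbol under the merged node gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = [0.2, 0.2, 0.2, 0.2, 0.1, 0.1]
L = huffman_lengths(probs)
print(L, sum(p * l for p, l in zip(probs, L)))  # average length 2.6
```

The same helper can be reused to verify the block-coding averages in the later source-coding problems.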

2. A random variable U has the following distribution

fU(ui) = {0.05, 0.1, 0.15, 0.2, 0.23, 0.27}.

(a) Construct a binary prefix-free code for U such that the codeword for ui has length li = ⌈log2(1/fU(ui))⌉:

fU(ui)   li   ci
0.27     2    11
0.23     3    101
0.20     3    100
0.15     3    001
0.10     4    0001
0.05     5    00001

(b) H (U) ≈ 2.42. l = 2.93.

(c) The code is not optimal: the two least probable symbols do not have codewords of the same length. Replacing the codeword 00001 by 0000 is a trivial improvement of the code.

3. We can use Kraft's inequality to test whether a uniquely decodable code with the given lengths exists:

Σ_{i=1}^{N} 2^(−li) ≤ 1

Lengths 1, 2, 3, 4, 4, 4  ⇒  Σ 2^(−li) = 1.0625 > 1

Lengths 2, 2, 2, 3, 3, 4  ⇒  Σ 2^(−li) = 1.0625 > 1

Lengths 1, 3, 3, 3, 4, 4  ⇒  Σ 2^(−li) = 1

Only the last set of lengths can correspond to a prefix-free code. It is not possible to improve it without knowing more about the symbol probabilities.
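A one-line check of the three Kraft sums (added):

```python
def kraft_sum(lengths):
    """Kraft sum; <= 1 is necessary and sufficient for a prefix-free code to exist."""
    return sum(2.0 ** -l for l in lengths)

for ls in ([1, 2, 3, 4, 4, 4], [2, 2, 2, 3, 3, 4], [1, 3, 3, 3, 4, 4]):
    print(ls, kraft_sum(ls))  # 1.0625, 1.0625, 1.0
```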

4. The probabilities of the binary source X are (1/4, 3/4). The entropy is H(X) = h(1/4) ≈ 0.811. If we group the symbols in pairs we have the probabilities (1/16, 3/16, 3/16, 9/16). The lengths li of the Huffman code are (3, 3, 2, 1), yielding l = 0.844 bits/symbol. If we group symbols in triplets the code lengths will be (5, 5, 5, 5, 3, 3, 3, 1) and the average length will be l = 0.823 bits/symbol.

5. Assume that X = (X1 X2 . . . Xk), where the Xi are independent and identically distributed random variables.

(a) H(X) = H(X1 X2 . . . Xk) = Σ H(Xi) = kH(X1).

(b) We know that for an optimal code the following holds:

H(X) ≤ l ≤ H(X) + 1

Using H(X) = kH(X1) and dividing by k we get

H(X1) ≤ l/k ≤ H(X1) + 1/k


6. Memoryless source U with probabilities 3/4, 3/16, 1/16. The entropy is H(U) = (3/4) log2(4/3) + (3/16) log2(16/3) + (1/16) log2 16 = 5/2 − (15/16) log2 3 ≈ 1.014. From the last exercise we know that it would be sufficient to group 20 symbols to get within 5% of the optimum. It turns out that grouping the symbols in pairs is enough:

(U1, U2)   fU(u)    ci
(u1, u1)   9/16     1
(u1, u2)   9/64     010
(u1, u3)   3/64     0010
(u2, u1)   9/64     011
(u2, u2)   9/256    0001
(u2, u3)   3/256    000011
(u3, u1)   3/64     0011
(u3, u2)   3/256    00000
(u3, u3)   1/256    000010

The average code length per symbol is l = 1.0371.
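The average length of the pair code can be verified as follows (added sketch; the list simply mirrors the probabilities and codeword lengths of the table above):

```python
from fractions import Fraction as F

# (pair probability, codeword length) from the table above
pairs = [(F(9, 16), 1), (F(9, 64), 3), (F(3, 64), 4), (F(9, 64), 3), (F(9, 256), 4),
         (F(3, 256), 6), (F(3, 64), 4), (F(3, 256), 5), (F(1, 256), 6)]

avg_per_pair = sum(p * l for p, l in pairs)
print(float(avg_per_pair), float(avg_per_pair) / 2)  # ~2.074 per pair, ~1.037 per symbol
```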

7. Consider the following Markov source

Un = f(Sn), n = 0, 1, 2, . . . ,

where Sn, n = 0, 1, 2, . . . , is a stationary Markov chain with states S = {s1, s2} and the output-signal function f(·) is defined as

f(x) = 0 for x = s1,  1 for x = s2.

The Markov chain Sn is given by the state diagram in the exercise (not reproduced here), with p = 0.05 and q = 0.01.

(a) The transition probability matrix is

Π = [ 0.95   0.05 ]
    [ 0.01   0.99 ]

The stationary distribution is found by solving the (left) eigenvalue problem

λω = ω [ 1 − p     p   ]
       [   q     1 − q ]

The eigenvector corresponding to eigenvalue λ = 1 is ω = ( q/(p + q), p/(p + q) ). With p = 0.05 and q = 0.01 we get ω = (1/6, 5/6).

(b) Since the source is unifilar we have H∞(S) = H∞(U).

H∞(S) = H(S1|S0) = Σ_{i=1}^{2} ωi H(S1|S0 = si) = ω1 h(p) + ω2 h(q)
      = (q/(p + q)) h(p) + (p/(p + q)) h(q) = (1/6) h(0.05) + (5/6) h(0.01) ≈ 0.115

The probabilities p = q = 0.5 maximize the entropy rate, giving H∞(U) = 1.
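A small sketch (added) that reproduces the stationary distribution and the entropy rate for p = 0.05, q = 0.01:

```python
from math import log2

def h(x):
    """Binary entropy function."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

p, q = 0.05, 0.01
w = (q / (p + q), p / (p + q))    # stationary distribution (1/6, 5/6)
rate = w[0] * h(p) + w[1] * h(q)  # entropy rate H_inf(S) = H(S1|S0)
print(w, rate)                    # (0.1667, 0.8333), ~0.115
```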

(c) Determine I(Sn+k; Sn), i.e. the mutual information between Sn+k and Sn.
The mutual information is given by

I(Sn+k; Sn) = H(Sn+k) − H(Sn+k|Sn)

where H(Sn+k) = h(ω1) = h(ω2). The second entropy is the difficult one:

H(Sn+k|Sn) = Σ_{i=1}^{2} P(Sn = si) H(Sn+k|Sn = si)


To calculate this we need the transition probabilities

P(Sn+k = sj | Sn = si) = (Π^k)_{i,j}

that is, the elements of the k:th power of the transition matrix Π. We can use the eigenvalue decomposition

Π = A^(−1) Λ A

where the rows of A are the (left) eigenvectors of Π and Λ is a diagonal matrix with the corresponding eigenvalues. The eigenvectors are

A = [ −a1− ]  =  [ q    p ]
    [ −a2− ]     [ 1   −1 ]

and the corresponding eigenvalues are λ1 = 1 and λ2 = 1 − p − q. The inverse becomes

A^(−1) = 1/(q + p) [ 1    p ]
                   [ 1   −q ]

Now we can calculate Π^k as

Π^k = (A^(−1) Λ A)^k = A^(−1) Λ^k A

    = 1/(q + p) [ 1    p ] [ 1        0        ] [ q    p ]
                [ 1   −q ] [ 0   (1 − p − q)^k ] [ 1   −1 ]

    = 1/(q + p) [ 1    p (1 − p − q)^k ] [ q    p ]
                [ 1   −q (1 − p − q)^k ] [ 1   −1 ]

    = 1/(q + p) [ q + p (1 − p − q)^k    p − p (1 − p − q)^k ]
                [ q − q (1 − p − q)^k    p + q (1 − p − q)^k ]

Using p = 0.05, q = 0.01 gives

Π^k = 1/6 [ 1 + 5 (0.94)^k    5 − 5 (0.94)^k ]
          [ 1 − (0.94)^k      5 + (0.94)^k   ]

The conditional entropy is

H(Sn+k|Sn) = (1/6) h( (1 + 5 (0.94)^k) / 6 ) + (5/6) h( (1 − (0.94)^k) / 6 )

Finally we get the mutual information as

I(Sn+k; Sn) = h(1/6) − (1/6) h( (1 + 5 (0.94)^k) / 6 ) − (5/6) h( (1 − (0.94)^k) / 6 )

(d) Determine the autocorrelation function rU[k] = E(Un+k Un). We have four possible combinations of the pair (Un+k, Un): (0,0), (0,1), (1,0) and (1,1). Only the last one contributes to the expected value:

rU[k] = E(Un+k Un) = P(Un+k = 1, Un = 1) · 1 = P(Un = 1) P(Un+k = 1 | Un = 1)

      = ω2 (Π^k)_{2,2} = (5/6) · (5 + (0.94)^k)/6 = 5 (5 + (0.94)^k) / 36
