On the Entropy of Sums of Bernoulli Random Variables via the Chen-Stein Method

Igal Sason

Department of Electrical Engineering, Technion - Israel Institute of Technology

Haifa 32000, Israel

ETH, Zurich, Switzerland, August 20, 2012.

Problem Statement

Let I be a countable index set, and for α ∈ I, let Xα be a Bernoulli random variable with

$$p_\alpha \triangleq P(X_\alpha = 1) = 1 - P(X_\alpha = 0) > 0.$$

Let

$$W \triangleq \sum_{\alpha \in I} X_\alpha, \qquad \lambda \triangleq E(W) = \sum_{\alpha \in I} p_\alpha,$$

where it is assumed that λ ∈ (0,∞).

Let Po(λ) denote the Poisson distribution with parameter λ.

Problem: Derive tight bounds for the entropy of W.

In this talk, error bounds on the entropy difference H(W) − H(Po(λ)) are introduced, providing rigorous bounds for the Poisson approximation of the entropy H(W). These bounds are exemplified.

Poisson Approximation

Binomial Approximation to the Poisson

If $X_1, X_2, \ldots, X_n$ are i.i.d. Bernoulli random variables with parameter $\frac{\lambda}{n}$,

then for large n, the distribution of their sum is close to a Poisson distribution with parameter λ: $S_n \triangleq \sum_{i=1}^n X_i \approx \mathrm{Po}(\lambda)$.

General Poisson Approximation (Law of Small Numbers)

For random variables $\{X_i\}_{i=1}^n$ on $\mathbb{N}_0 \triangleq \{0, 1, \ldots\}$, the sum $\sum_{i=1}^n X_i$ is approximately Poisson distributed with mean $\lambda = \sum_{i=1}^n p_i$ as long as

$P(X_i = 0)$ is close to 1,

$P(X_i = 1)$ is uniformly small,

$P(X_i > 1)$ is negligible compared to $P(X_i = 1)$,

$\{X_i\}_{i=1}^n$ are weakly dependent.
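A minimal numerical sketch (the choices of λ and n are mine, not from the talk) illustrating the binomial-to-Poisson convergence; it uses the identity $d_{\mathrm{TV}}(P,Q) = \frac{1}{2}\sum_k |P(k) - Q(k)|$, which is formalized later in the talk:

```python
# Sketch: d_TV(Binomial(n, lam/n), Po(lam)) shrinks as n grows.
import numpy as np
from scipy.stats import binom, poisson

lam = 1.0
for n in [5, 50, 500]:
    k = np.arange(n + 1)
    d = 0.5 * np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)).sum()
    d += 0.5 * poisson.sf(n, lam)  # Poisson mass beyond the binomial support
    print(n, d)                    # decreases roughly like 1/n
```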

Information-Theoretic Results for Poisson Approximation

Maximum Entropy Result for Poisson Approximation

The Po(λ) distribution has maximum entropy among all probability distributions that can be obtained as sums of independent Bernoulli RVs:

$$H(\mathrm{Po}(\lambda)) = \sup_{S \in B_\infty(\lambda)} H(S), \qquad B_\infty(\lambda) \triangleq \bigcup_{n \in \mathbb{N}} B_n(\lambda),$$

$$B_n(\lambda) \triangleq \Big\{ S : S = \sum_{i=1}^n X_i, \;\; X_i \sim \mathrm{Bern}(p_i) \text{ independent}, \;\; \sum_{i=1}^n p_i = \lambda \Big\}.$$

Due to a monotonicity property,

$$H(\mathrm{Po}(\lambda)) = \lim_{n \to \infty} \sup_{S \in B_n(\lambda)} H(S).$$


Maximum Entropy Result for Poisson Approximation (Cont.)

For $n \in \mathbb{N}$, the maximum entropy distribution in the class $B_n(\lambda)$ is $\mathrm{Binomial}\big(n, \frac{\lambda}{n}\big)$, so

$$H(\mathrm{Po}(\lambda)) = \lim_{n \to \infty} H\Big(\mathrm{Binomial}\Big(n, \frac{\lambda}{n}\Big)\Big).$$
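As a quick illustration (the parameter choices are mine), the entropy of Binomial(n, λ/n) can be computed numerically and watched as it increases towards H(Po(λ)):

```python
# Sketch: H(Binomial(n, lam/n)) converges to H(Po(lam)) as n grows.
import numpy as np
from scipy.stats import binom, poisson

def entropy_nats(pmf):
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log(pmf))

lam = 1.5
for n in [2, 5, 20, 100, 1000]:
    k = np.arange(n + 1)
    print(n, entropy_nats(binom.pmf(k, n, lam / n)))
k = np.arange(200)  # truncation; Po(1.5) mass beyond 200 is negligible
print("Po:", entropy_nats(poisson.pmf(k, lam)))
```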

Proofs rely on convexity arguments à la Mateev (1978), Shepp & Olkin (1978), Karlin & Rinott (1981), Harremoes (2001).

Recent generalizations and extensions by Johnson et al. (2007–12):

Extension of this maximum entropy result to the larger set of ultra-log-concave probability mass functions.

Generalization to maximum entropy results for discrete compound Poisson distributions.

Information-Theoretic Ideas in Poisson Approximation

Nice surveys on the information-theoretic approach for Poisson approximation are available in:

1. I. Kontoyiannis, P. Harremoes, O. Johnson and M. Madiman, “Information-theoretic ideas in Poisson approximation and concentration,” slides of a short course (the slides are available at the homepage of I. Kontoyiannis), September 2006.

2. O. Johnson, Information Theory and the Central Limit Theorem, Imperial College Press, 2004.

Total Variation Distance

Let P and Q be two probability measures defined on a set X. Then, the total variation distance between P and Q is defined by

$$d_{\mathrm{TV}}(P,Q) \triangleq \sup_{\text{Borel } A \subseteq \mathcal{X}} |P(A) - Q(A)|$$

where the supremum is taken w.r.t. all the Borel subsets A of X. If X is a countable set, then this definition simplifies to

$$d_{\mathrm{TV}}(P,Q) = \frac{1}{2} \sum_{x \in \mathcal{X}} |P(x) - Q(x)| = \frac{\|P - Q\|_1}{2},$$

so the total variation distance is equal to one-half of the $L_1$-distance between the two probability distributions.
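For a countable alphabet, the supremum in the definition is attained at the event A = {x : P(x) > Q(x)}; a small sketch (the two pmfs are my examples) checking this against the halved L1 distance:

```python
# Sketch: sup over events equals half the L1 distance on a countable set.
import numpy as np

P = np.array([0.5, 0.2, 0.2, 0.1])
Q = np.array([0.4, 0.3, 0.1, 0.2])

sup_event = np.sum((P - Q)[P > Q])   # P(A) - Q(A) for the maximizing event A
half_l1 = 0.5 * np.abs(P - Q).sum()
print(sup_event, half_l1)            # both equal 0.2
```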

Question: How can one obtain bounds on the total variation distance and the entropy difference for the Poisson approximation?

Chen-Stein Method

The Chen-Stein method forms a powerful probabilistic tool to calculate error bounds for the Poisson approximation (Chen 1975).

This method is based on the following characterization of the Poisson distribution: Z ∼ Po(λ) with λ ∈ (0,∞) if and only if

$$\lambda\, E[f(Z+1)] - E[Z f(Z)] = 0$$

for all bounded functions f that are defined on $\mathbb{N}_0 \triangleq \{0, 1, \ldots\}$.

This method provides a rigorous analytical treatment, via error bounds, of the case where W is approximately Poisson distributed. The idea behind the method is to treat analytically (by bounds) the quantity

$$\lambda\, E[f(W+1)] - E[W f(W)],$$

which is shown to be close to zero for a function f as above.
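A quick numerical sanity check of this characterization (the truncation level and the test function are my choices):

```python
# Sketch: lambda*E[f(Z+1)] - E[Z f(Z)] vanishes for Z ~ Po(lambda), bounded f.
import numpy as np
from scipy.stats import poisson

lam = 3.0
k = np.arange(0, 200)                # truncated support; tail is negligible
pz = poisson.pmf(k, lam)
f = lambda z: np.sin(z) / (1.0 + z)  # an arbitrary bounded f on {0, 1, ...}

lhs = lam * np.sum(pz * f(k + 1)) - np.sum(pz * k * f(k))
print(abs(lhs))                      # ~ 0 up to truncation/rounding error
```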

The following theorem relies on the Chen-Stein method:

Bounds on the Total Variation Distance for Poisson Approximation [Barbour & Hall, 1984]

Let $W = \sum_{i=1}^n X_i$ be a sum of n independent Bernoulli random variables with $E(X_i) = p_i$ for i ∈ {1, ..., n}, and E(W) = λ. Then, the total variation distance between the probability distribution of W and the Poisson distribution with mean λ satisfies

$$\frac{1}{32} \Big(1 \wedge \frac{1}{\lambda}\Big) \sum_{i=1}^n p_i^2 \;\leq\; d_{\mathrm{TV}}(P_W, \mathrm{Po}(\lambda)) \;\leq\; \Big(\frac{1 - e^{-\lambda}}{\lambda}\Big) \sum_{i=1}^n p_i^2$$

where $a \wedge b \triangleq \min\{a, b\}$ for every $a, b \in \mathbb{R}$.
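A sketch (the $p_i$ values are mine) checking the two-sided bound on a small example, where the exact law of W is computed by convolving the Bernoulli pmfs:

```python
# Sketch: exact d_TV vs. the Barbour-Hall lower and upper bounds.
import numpy as np
from scipy.stats import poisson

p = np.array([0.05, 0.10, 0.02, 0.08, 0.04])
lam = p.sum()

pmf_w = np.array([1.0])
for pi in p:                         # pmf of the sum via convolution
    pmf_w = np.convolve(pmf_w, [1 - pi, pi])

k = np.arange(len(pmf_w))
d_tv = 0.5 * (np.abs(pmf_w - poisson.pmf(k, lam)).sum()
              + poisson.sf(k[-1], lam))  # add Poisson mass beyond support

s = np.sum(p ** 2)
lower = (1 / 32) * min(1.0, 1.0 / lam) * s
upper = (1 - np.exp(-lam)) / lam * s
print(lower <= d_tv <= upper, (lower, d_tv, upper))
```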

Generalization of the Upper Bound on the Total Variation Distance for a Sum of Dependent Bernoulli Random Variables

Let I be a countable index set, and for α ∈ I, let Xα be a Bernoulli random variable with

$$p_\alpha \triangleq P(X_\alpha = 1) = 1 - P(X_\alpha = 0) > 0.$$

Let

$$W \triangleq \sum_{\alpha \in I} X_\alpha, \qquad \lambda \triangleq E(W) = \sum_{\alpha \in I} p_\alpha,$$

where it is assumed that λ ∈ (0,∞). Note that W is a sum of (possibly dependent and non-identically distributed) Bernoulli RVs $\{X_\alpha\}_{\alpha \in I}$.

For every α ∈ I, let $B_\alpha$ be a subset of I that is chosen such that $\alpha \in B_\alpha$. This subset is interpreted [Arratia et al., 1989] as the neighborhood of dependence for α, where $X_\alpha$ is independent of, or weakly dependent on, all of the $X_\beta$ for $\beta \notin B_\alpha$.

Generalization (Cont.)

Furthermore, the following coefficients were defined by [Arratia et al., 1989]:

$$b_1 \triangleq \sum_{\alpha \in I} \sum_{\beta \in B_\alpha} p_\alpha p_\beta$$

$$b_2 \triangleq \sum_{\alpha \in I} \sum_{\alpha \neq \beta \in B_\alpha} p_{\alpha,\beta}, \qquad p_{\alpha,\beta} \triangleq E(X_\alpha X_\beta)$$

$$b_3 \triangleq \sum_{\alpha \in I} s_\alpha, \qquad s_\alpha \triangleq E\big|\,E\big(X_\alpha - p_\alpha \,\big|\, \sigma(\{X_\beta\}_{\beta \in I \setminus B_\alpha})\big)\big|$$

where σ(·) in the conditioning of the last equality denotes the σ-algebra that is generated by the random variables inside the parentheses.

Then, the following upper bound on the total variation distance holds:

$$d_{\mathrm{TV}}(P_W, \mathrm{Po}(\lambda)) \;\leq\; (b_1 + b_2) \Big(\frac{1 - e^{-\lambda}}{\lambda}\Big) + b_3 \Big(1 \wedge \frac{1.4}{\sqrt{\lambda}}\Big).$$

Proof Methodology: The Chen-Stein method [Arratia et al., 1989].

Special Case

If $\{X_\alpha\}_{\alpha \in I}$ are independent and $I \triangleq \{1, \ldots, n\}$, then $b_1 = \sum_{i=1}^n p_i^2$ and $b_2 = b_3 = 0$, which recovers the previous upper bound on the total variation distance.

⇒ Taking $p_i = \frac{\lambda}{n}$ for all i ∈ {1, ..., n} (i.e., a sum of n i.i.d. Bernoulli random variables with parameter $\frac{\lambda}{n}$) gives

$$d_{\mathrm{TV}}(P_W, \mathrm{Po}(\lambda)) \leq \frac{\lambda^2}{n} \searrow 0.$$

Interplay Between Total Variation Distance and Entropy Difference
Total Variation Distance vs. Entropy Difference

An upper bound on the total variation distance between W and the Poisson random variable Z ∼ Po(λ) was introduced. This bound was derived by Arratia et al. (1989) via the Chen-Stein method.

Questions

1. Does a small total variation distance between two probability distributions ensure that their entropies are also close?

2. Can one get, under some conditions, a bound on the difference between the entropies in terms of the total variation distance?

3. Can one get a bound on the difference between the entropy H(W) and the entropy of the Poisson RV Z ∼ Po(λ) (with $\lambda = \sum_i p_i$) in terms of the bound on the total variation distance?

4. How can the entropy of the Poisson RV be calculated efficiently (by definition, it is an infinite series)?

Question 1

Does a small total variation distance between two probability distributions ensure that their entropies are also close?

Answer 1

In general, NO. The total variation distance between two probability distributions may be arbitrarily small, whereas the difference between the two entropies is arbitrarily large.

A Possible Example [Ho & Yeung, ISIT 2007]

For a fixed L ∈ N, let $P = (p_1, p_2, \ldots, p_L)$ be a probability mass function. For M ≥ L, let

$$Q = \Big(p_1 - \frac{p_1}{\sqrt{\log M}},\;\; p_2 + \frac{p_1}{M\sqrt{\log M}},\;\; \ldots,\;\; p_L + \frac{p_1}{M\sqrt{\log M}},\;\; \frac{p_1}{M\sqrt{\log M}},\;\; \ldots,\;\; \frac{p_1}{M\sqrt{\log M}}\Big).$$

Example (cont.)

Then

$$d_{\mathrm{TV}}(P,Q) = \frac{2\,p_1}{\sqrt{\log M}} \searrow 0$$

when M → ∞. On the other hand, for a sufficiently large M,

$$H(Q) - H(P) \approx p_1 \Big(1 - \frac{L}{M}\Big) \log M \nearrow \infty.$$

Conclusion

It is easy to construct two random variables X and Y whose supports are finite, but where the support size of one of them is unbounded, such that the total variation distance between their distributions is arbitrarily close to zero whereas the difference between their entropies is arbitrarily large.

⇒ If the alphabet size of X or Y is not bounded, then $d_{\mathrm{TV}}(P_X, P_Y) \to 0$ does NOT imply that $|H(P_X) - H(P_Y)| \to 0$.
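A numerical illustration in the spirit of this construction (the specific distribution and alphabet sizes below are my choices, not the exact Ho-Yeung parameters):

```python
# Sketch: spreading a vanishing amount of mass over a huge alphabet drives
# d_TV towards 0 while H(Q) - H(P) keeps growing with M.
import numpy as np

def entropy_nats(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

P = np.array([0.6, 0.4])                    # L = 2
for M in [10**2, 10**4, 10**6]:
    eps = P[0] / np.sqrt(np.log(M))         # mass removed from p_1
    Q = np.concatenate(([P[0] - eps, P[1]], np.full(M, eps / M)))
    d_tv = 0.5 * (eps + M * (eps / M))      # = eps, shrinking slowly
    print(M, d_tv, entropy_nats(Q) - entropy_nats(P))
```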

Question 2

Can one get, under some conditions, a bound on the difference between the entropies in terms of the total variation distance?

Answer 2

Yes, it is true when the alphabet sizes of X and Y are finite and fixed. The following theorem is an L1 bound on the entropy (see [Cover and Thomas, Theorem 17.3.3] or [Csiszar and Korner, Lemma 2.7]):

Let P and Q be two probability mass functions on a finite set X such that

$$\|P - Q\|_1 \triangleq \sum_{x \in \mathcal{X}} |P(x) - Q(x)| \leq \frac{1}{2}.$$

Then, the difference between their entropies to the base e satisfies

$$|H(P) - H(Q)| \leq \|P - Q\|_1 \log\Big(\frac{|\mathcal{X}|}{\|P - Q\|_1}\Big).$$
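A small sketch (the two pmfs are mine) checking the L1 bound, with entropies in nats:

```python
# Sketch: verify |H(P) - H(Q)| <= ||P - Q||_1 * log(|X| / ||P - Q||_1).
import numpy as np

def entropy_nats(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

P = np.array([0.50, 0.25, 0.15, 0.10])
Q = np.array([0.40, 0.30, 0.20, 0.10])
l1 = np.abs(P - Q).sum()
assert l1 <= 0.5                    # the theorem's condition
lhs = abs(entropy_nats(P) - entropy_nats(Q))
rhs = l1 * np.log(len(P) / l1)
print(lhs <= rhs, (lhs, rhs))
```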

Answer 2 (cont.)

This inequality can be written in the form

$$|H(P) - H(Q)| \leq 2\,d_{\mathrm{TV}}(P,Q) \log\Big(\frac{|\mathcal{X}|}{2\,d_{\mathrm{TV}}(P,Q)}\Big)$$

when $d_{\mathrm{TV}}(P,Q) \leq \frac{1}{4}$, thus bounding the difference of the entropies in terms of the total variation distance when the alphabet sizes are finite and fixed.

Further Improvements of the Bound on the Entropy Difference in Terms of the Total Variation Distance

This inequality was further improved by S. W. Ho and R. Yeung (see Theorem 6 and its refinement in Theorem 7 in their paper “The interplay between entropy and variational distance,” IEEE Trans. on Information Theory, Dec. 2010).

Question 3

Can one get a bound on the difference between the entropy H(W) and the entropy of the Poisson RV Z ∼ Po(λ) (with $\lambda = \sum_i p_i$) in terms of the bound on the total variation distance?

Answer 3

In view of the answer to the first question, since the Poisson distribution is defined over a countably infinite set, it is not clear that the difference

$$H(W) - H(\mathrm{Po}(\lambda))$$

can be bounded in terms of the total variation distance $d_{\mathrm{TV}}(W, \mathrm{Po}(\lambda))$. However, we provide an affirmative answer and derive such a bound.

New Result (Answer 3 (cont.))

Let I be an arbitrary finite index set with $m \triangleq |I|$. Under the setting of this work and the notation introduced above, let

$$a(\lambda) \triangleq 2\bigg[(b_1 + b_2)\Big(\frac{1 - e^{-\lambda}}{\lambda}\Big) + b_3\Big(1 \wedge \frac{1.4}{\sqrt{\lambda}}\Big)\bigg]$$

$$b(\lambda) \triangleq \bigg[\Big(\lambda \log\Big(\frac{e}{\lambda}\Big)\Big)_+ + \lambda^2 + \frac{6 \log(2\pi) + 1}{12}\bigg] \exp\bigg\{-\lambda - (m-1)\log\Big(\frac{m-1}{\lambda e}\Big)\bigg\}$$

where $(x)_+ \triangleq \max\{x, 0\}$ for every x ∈ R. Let Z ∼ Po(λ) be a Poisson random variable with mean λ. If $a(\lambda) \leq \frac{1}{2}$ and $\lambda \triangleq \sum_{\alpha \in I} p_\alpha \leq m - 1$, then the difference between the entropies (to the base e) of Z and W satisfies the following inequality:

$$|H(Z) - H(W)| \leq a(\lambda) \log\Big(\frac{m+2}{a(\lambda)}\Big) + b(\lambda).$$
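A direct transcription (mine, a sketch rather than verified code from the paper) of a(λ), b(λ) and the resulting bound, evaluated for an independent example where $b_1 = \sum p_i^2$ and $b_2 = b_3 = 0$:

```python
# Sketch: evaluate the entropy-gap bound of the new theorem.
import numpy as np

def entropy_gap_bound(p, b2=0.0, b3=0.0):
    p = np.asarray(p, dtype=float)
    m, lam = len(p), p.sum()
    b1 = np.sum(p ** 2)           # valid when the summands are independent
    a = 2 * ((b1 + b2) * (1 - np.exp(-lam)) / lam
             + b3 * min(1.0, 1.4 / np.sqrt(lam)))
    b = ((max(lam * np.log(np.e / lam), 0.0) + lam ** 2
          + (6 * np.log(2 * np.pi) + 1) / 12)
         * np.exp(-lam - (m - 1) * np.log((m - 1) / (lam * np.e))))
    assert a <= 0.5 and lam <= m - 1, "conditions of the theorem"
    return a * np.log((m + 2) / a) + b

print(entropy_gap_bound(np.full(1000, 0.002)))  # lam = 2, m = 1000
```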

A Tighter Bound for Independent Summands

If the summands {Xα}α∈I are also independent, then

$$0 \leq H(Z) - H(W) \leq g(\mathbf{p}) \log\Big(\frac{m+2}{g(\mathbf{p})}\Big) + b(\lambda)$$

if $g(\mathbf{p}) \leq \frac{1}{2}$ and $\lambda \leq m - 1$, where

$$g(\mathbf{p}) \triangleq 2\theta \min\Big\{1 - e^{-\lambda},\; \frac{3}{4e(1 - \sqrt{\theta})^{3/2}}\Big\}$$

$$\mathbf{p} \triangleq \{p_\alpha\}_{\alpha \in I}, \qquad \lambda \triangleq \sum_{\alpha \in I} p_\alpha, \qquad \theta \triangleq \frac{1}{\lambda} \sum_{\alpha \in I} p_\alpha^2.$$

The proof relies on the upper bounds on the total variation distance by Barbour and Hall (1984) and by Cekanavicius and Roos (2006), the maximum entropy result, and the derivation of the general bound.
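For the same independent example as in the previous snippet, a sketch (mine) of g(p) and the leading term of the tighter bound:

```python
# Sketch: g(p) for independent summands and the leading bound term.
import numpy as np

p = np.full(1000, 0.002)                # lam = 2, m = 1000
m, lam = len(p), p.sum()
theta = np.sum(p ** 2) / lam
g = 2 * theta * min(1 - np.exp(-lam),
                    3 / (4 * np.e * (1 - np.sqrt(theta)) ** 1.5))
assert g <= 0.5 and lam <= m - 1
print(g * np.log((m + 2) / g))          # tighter than the general bound here
```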

Question 4

How can the entropy of the Poisson RV be calculated efficiently?

Answer 4

The entropy of a random variable Z ∼ Po(λ) is equal to

$$H(Z) = \lambda \log\Big(\frac{e}{\lambda}\Big) + \sum_{k=1}^{\infty} \frac{\lambda^k e^{-\lambda} \log k!}{k!}$$

so the entropy of the Poisson distribution (in nats) is expressed in terms of an infinite series that has no closed form.

Answer 4 (Cont.)

Sequences of simple upper and lower bounds on this entropy, which are asymptotically tight, were derived by Adell et al. [IEEE Trans. on IT, May 2010]. In particular, from Theorem 2 in this paper,

$$-\frac{31}{24\lambda^2} - \frac{33}{20\lambda^3} - \frac{1}{20\lambda^4} \;\leq\; H(Z) - \frac{1}{2}\log(2\pi e \lambda) + \frac{1}{12\lambda} \;\leq\; \frac{5}{24\lambda^2} + \frac{1}{60\lambda^3}$$

which gives tight bounds on the entropy of Z ∼ Po(λ) for λ ≫ 1.

For λ ≥ 20, the entropy of Z is approximated by the average of its above upper and lower bounds; the relative error of this approximation is less than 0.1% (and it scales like $\frac{1}{\lambda^2}$).

For λ ∈ (0, 20), a truncation of the above infinite series after its first ⌈10λ⌉ terms gives an accurate approximation.
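A sketch (mine) combining the two recipes above into one routine, with the thresholds as stated on this slide:

```python
# Sketch: H(Po(lam)) in nats via series truncation (small lam) or via the
# average of the Adell et al. bounds (large lam).
import math

def poisson_entropy_nats(lam):
    if lam < 20:
        # H(Z) = lam*log(e/lam) + sum_{k>=1} lam^k e^{-lam} log(k!)/k!
        s, log_kfact, log_pk = 0.0, 0.0, -lam      # log p_0 = -lam
        for k in range(1, math.ceil(10 * lam) + 1):
            log_kfact += math.log(k)               # log k!
            log_pk += math.log(lam) - math.log(k)  # log(lam^k e^{-lam}/k!)
            s += math.exp(log_pk) * log_kfact
        return lam * (1 - math.log(lam)) + s
    base = 0.5 * math.log(2 * math.pi * math.e * lam) - 1 / (12 * lam)
    lower = base - 31 / (24 * lam**2) - 33 / (20 * lam**3) - 1 / (20 * lam**4)
    upper = base + 5 / (24 * lam**2) + 1 / (60 * lam**3)
    return 0.5 * (lower + upper)

print(poisson_entropy_nats(2.0), poisson_entropy_nats(100.0))
```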

Applications

Example: Random Graphs

The setting of this problem was introduced by Arratia et al. (1989) in the context of applications of the Chen-Stein method.

Problem Setting

On the cube $\{0, 1\}^n$, assume that each of the $n 2^{n-1}$ edges is assigned a random direction by tossing a fair coin.

Let k ∈ {0, 1, ..., n} be fixed, and denote by $W \triangleq W(k, n)$ the random variable that is equal to the number of vertices at which exactly k edges point outward (so k = 0 corresponds to the event where all n edges, from a certain vertex, point inward).

Let I be the set of all $2^n$ vertices, and let $X_\alpha$ be the indicator that vertex α ∈ I has exactly k of its edges directed outward. Then $W = \sum_{\alpha \in I} X_\alpha$ with $X_\alpha \sim \mathrm{Bern}(p)$, $p = 2^{-n}\binom{n}{k}$, for all α ∈ I.

Problem: Estimate the entropy H(W).

Example: Random Graphs (Cont.)

The problem setting implies that $\lambda = \binom{n}{k}$ (since $|I| = 2^n$).

The neighborhood of dependence of a vertex α ∈ I, denoted by $B_\alpha$, is the set of vertices that are directly connected to α (including α itself, since it is required that $\alpha \in B_\alpha$). Hence, $b_1 = 2^{-n}(n+1)\binom{n}{k}^2$.

If α and β are two vertices that are connected by an edge, then conditioning on the direction of this edge gives

$$p_{\alpha,\beta} \triangleq E(X_\alpha X_\beta) = 2^{2-2n} \binom{n-1}{k}\binom{n-1}{k-1}$$

for every α ∈ I and $\beta \in B_\alpha \setminus \{\alpha\}$, so by the definition of $b_2$ above,

$$b_2 = n\, 2^{2-n} \binom{n-1}{k}\binom{n-1}{k-1}.$$

$b_3 = 0$, since the conditional expectation of $X_\alpha$ given $(X_\beta)_{\beta \in I \setminus B_\alpha}$ is, like the unconditional expectation, equal to $p_\alpha$: the directions of the edges outside the neighborhood of dependence of α are irrelevant to the directions of the edges connected to the vertex α.

In the following, the bound on the entropy difference is applied to get a rigorous error bound on the Poisson approximation of the entropy H(W).

By symmetry, the cases with W(k, n) and W(n − k, n) are equivalent, so

$$H\big(W(k, n)\big) = H\big(W(n - k, n)\big).$$
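A sketch (my transcription of the formulas above) computing λ, $b_1$, $b_2$ and the resulting total variation bound; $b_3 = 0$ as argued above:

```python
# Sketch: Chen-Stein coefficients for the random-graph example.
from math import comb, exp

def graph_params(n, k):
    lam = comb(n, k)                      # = 2^n * p with p = 2^{-n} C(n,k)
    b1 = (n + 1) * comb(n, k) ** 2 / 2 ** n
    b2 = n * comb(n - 1, k) * comb(n - 1, k - 1) * 2 ** (2 - n)
    return lam, b1, b2

lam, b1, b2 = graph_params(30, 26)
print(lam)                                # 27405, matching the table below
print((b1 + b2) * (1 - exp(-lam)) / lam)  # upper bound on d_TV (b3 = 0)
```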

Numerical Results for the Example of Random Graphs

Table: Numerical results for the Poisson approximations of the entropy H(W) (W = W(k, n)) by the entropy H(Z) where Z ∼ Po(λ), jointly with the associated error bounds of these approximations. These error bounds are calculated from the new theorem stated above.

n    k    λ = C(n,k)      H(W) ≈         Maximal relative error
30   26   2.741 · 10^4    6.528 nats     0.94%
30   25   1.425 · 10^5    7.353 nats     4.33%
100  95   7.529 · 10^7    10.487 nats    1.6 · 10^−19
100  85   2.533 · 10^17   21.456 nats    2.6 · 10^−10
100  75   2.425 · 10^23   28.342 nats    1.9 · 10^−4
100  70   2.937 · 10^25   30.740 nats    2.1%

Generalization of the Error Bounds on the Entropy

Generalization: Bounds on the Entropy for a Sum of Non-Negative, Integer-Valued and Bounded Random Variables

A generalization of the earlier bound was derived in the full-paper version, considering the accuracy of the Poisson approximation for the entropy of a sum of non-negative, integer-valued and bounded random variables.

This generalization is enabled by combining the proof of the previous bound, which considers the entropy of a sum of Bernoulli random variables, with the approach of Serfling (Section 7 of his 1978 paper).

Bibliography

Full Paper Version

This talk presents in part the first half of the paper: I. Sason, “An information-theoretic perspective of the Poisson approximation via the Chen-Stein method,” submitted to the IEEE Trans. on Information Theory, June 2012. [Online]. Available: http://arxiv.org/abs/1206.6811.

A generalization of the bounds that considers the accuracy of the Poisson approximation for the entropy of a sum of non-negative, integer-valued and bounded random variables is introduced in the full paper.

The second part of this paper derives lower bounds on the total variation distance, relative entropy and other measures that are not covered in this talk.

Bibliography

J. A. Adell, A. Lekuona and Y. Yu, “Sharp bounds on the entropy of the Poisson law and related quantities,” IEEE Trans. on Information Theory, vol. 56, no. 5, pp. 2299–2306, May 2010.

R. Arratia, L. Goldstein and L. Gordon, “Poisson approximation and the Chen-Stein method,” Statistical Science, vol. 5, no. 4, pp. 403–424, November 1990.

A. D. Barbour and P. Hall, “On the rate of Poisson convergence,” Mathematical Proceedings of the Cambridge Philosophical Society, vol. 95, no. 3, pp. 473–480, 1984.

A. D. Barbour, L. Holst and S. Janson, Poisson Approximation, Oxford University Press, 1992.

A. D. Barbour and L. H. Y. Chen, An Introduction to Stein’s Method, Lecture Notes Series, Institute for Mathematical Sciences, Singapore University Press and World Scientific, 2005.

V. Cekanavicius and B. Roos, “An expansion in the exponent for compound binomial approximations,” Lithuanian Mathematical Journal, vol. 46, no. 1, pp. 54–91, 2006.

L. H. Y. Chen, “Poisson approximation for dependent trials,” Annals of Probability, vol. 3, no. 3, pp. 534–545, June 1975.

T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley and Sons, second edition, 2006.

I. Csiszar and J. Korner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, 1981.

P. Harremoes, “Binomial and Poisson distributions as maximum entropy distributions,” IEEE Trans. on Information Theory, vol. 47, no. 5, pp. 2039–2041, July 2001.

P. Harremoes, O. Johnson and I. Kontoyiannis, “Thinning, entropy and the law of thin numbers,” IEEE Trans. on Information Theory, vol. 56, no. 9, pp. 4228–4244, September 2010.

S. W. Ho and R. W. Yeung, “The interplay between entropy and variational distance,” IEEE Trans. on Information Theory, vol. 56, no. 12, pp. 5906–5929, December 2010.

O. Johnson, Information Theory and the Central Limit Theorem, Imperial College Press, 2004.

O. Johnson, “Log-concavity and maximum entropy property of the Poisson distribution,” Stochastic Processes and their Applications, vol. 117, no. 6, pp. 791–802, November 2006.

O. Johnson, I. Kontoyiannis and M. Madiman, “A criterion for the compound Poisson distribution to be maximum entropy,” Proceedings 2009 IEEE International Symposium on Information Theory, pp. 1899–1903, Seoul, South Korea, July 2009.

S. Karlin and Y. Rinott, “Entropy inequalities for classes of probability distributions I: the univariate case,” Advances in Applied Probability, vol. 13, no. 1, pp. 93–112, March 1981.

I. Kontoyiannis, P. Harremoes and O. Johnson, “Entropy and the law of small numbers,” IEEE Trans. on Information Theory, vol. 51, no. 2, pp. 466–472, February 2005.

I. Kontoyiannis, P. Harremoes, O. Johnson and M. Madiman, “Information-theoretic ideas in Poisson approximation and concentration,” slides of a short course, September 2006.

R. J. Serfling, “Some elementary results on Poisson approximation in a sequence of Bernoulli trials,” SIAM Review, vol. 20, pp. 567–579, July 1978.

L. A. Shepp and I. Olkin, “Entropy of the sum of independent Bernoulli random variables and the multinomial distribution,” Contributions to Probability, pp. 201–206, Academic Press, New York, 1981.

Y. Yu, “Monotonic convergence in an information-theoretic law of small numbers,” IEEE Trans. on Information Theory, vol. 55, no. 12, pp. 5412–5422, December 2009.
