Huffman Codes

description

Huffman coding algorithm and analysis

Transcript of Huffman Codes

Page 1: Huffman Codes

Greedy

Huffman codes

R. Inkulu

http://www.iitg.ac.in/rinkulu/


Page 2: Huffman Codes

Encoding symbols using bits

• Given a set of symbols S, a code γ for S is a one-to-one function
  γ : S → N, where each element of N is a binary string. The codeword
  of a symbol x ∈ S is γ(x).

• A fixed-length code does not take the frequency of occurrence of
  individual symbols into account; hence, it is not space-efficient.

• A variable-length code helps improve space-efficiency: assign longer
  codewords to less frequently used symbols and vice versa.


Page 3: Huffman Codes

Prefix code

• Difficulty in decoding text with an arbitrary variable-length code:
  e.g., how to decode 01 when γ(a) = 0, γ(b) = 1, γ(c) = 01.

• A variable-length code in which no codeword is a prefix of another is
  termed a prefix code.

with code γ1(a) = 11, γ1(b) = 01, γ1(c) = 001, γ1(d) = 10, γ1(e) = 000

decoding 0010000011101 yields cecab
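To make the decoding rule concrete, here is a minimal Python sketch of left-to-right prefix-code decoding using the code γ1 above (the function name decode_prefix and the dictionary representation are illustrative choices, not part of the slides):

    # Scan the bits left to right and emit a symbol as soon as the buffered
    # bits match a codeword; this is unambiguous because no codeword is a
    # prefix of another.
    def decode_prefix(bits, code):
        inverse = {word: sym for sym, word in code.items()}
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in inverse:          # a complete codeword has been read
                out.append(inverse[buf])
                buf = ""
        assert buf == "", "input is not a concatenation of codewords"
        return "".join(out)

    gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
    print(decode_prefix("0010000011101", gamma1))   # prints: cecab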


Page 4: Huffman Codes

Optimal prefix codes

• Given a set of symbols S together with the frequency of occurrence fx of
  every x ∈ S, determine a space-efficient prefix code that assigns a unique
  codeword to each symbol x ∈ S.

• The average number of bits required per letter is
  ABL = ∑x∈S fx · |γ(x)|.
  Hence, the objective is to choose a code that minimizes ABL.


Page 5: Huffman Codes

Optimal prefix code example

For fa = 0.32, fb = 0.25, fc = 0.20, fd = 0.18, fe = 0.05,

• with fixed-length code, ABL is 3

• with γ1(a) = 11, γ1(b) = 01, γ1(c) = 001, γ1(d) = 10, γ1(e) = 000,

ABL(γ1) = 2.25

• with γ2(a) = 11, γ2(b) = 10, γ2(c) = 01, γ2(d) = 001, γ2(e) = 000,

ABL(γ2) = 2.23
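As a quick check of these numbers, a small Python sketch (the abl helper is an illustrative name, not from the slides) that evaluates ∑x∈S fx · |γ(x)| for the two codes:

    # ABL = sum over all symbols of frequency times codeword length.
    def abl(freq, code):
        return sum(freq[x] * len(code[x]) for x in freq)

    freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
    gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
    gamma2 = {"a": "11", "b": "10", "c": "01", "d": "001", "e": "000"}
    print(abl(freq, gamma1))   # ≈ 2.25
    print(abl(freq, gamma2))   # ≈ 2.23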

Page 7: Huffman Codes

Representing prefix codes using binary trees

(Figure: three binary trees whose leaves are labeled with the symbols a–e.
 Tree 1 gives γ(a) = 1, γ(b) = 011, γ(c) = 010, γ(d) = 001, γ(e) = 000.
 Tree 2 gives γ(a) = 11, γ(b) = 01, γ(c) = 001, γ(d) = 10, γ(e) = 000.
 Tree 3 gives γ(a) = 11, γ(b) = 10, γ(c) = 01, γ(d) = 001, γ(e) = 000.)

• Consider a binary tree T in which each leaf is labeled with a distinct
  letter of S. For each symbol x ∈ S, follow the path from the root to the
  leaf labeled x; each time the path goes from a node to its left (resp.
  right) child, write down a 0 (resp. 1). The resulting bit string is the
  encoding of x (see the sketch after these bullets).

• The encoding of S constructed from T is a prefix code.
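A minimal Python sketch of this tree-to-code construction (the Node class and the codes_from_tree helper are illustrative names, not from the slides):

    # Each internal node has a left (bit 0) and right (bit 1) child; leaves
    # carry a symbol. Recording the branch bits on the root-to-leaf path
    # yields the codeword of each symbol.
    class Node:
        def __init__(self, symbol=None, left=None, right=None):
            self.symbol, self.left, self.right = symbol, left, right

    def codes_from_tree(node, prefix="", table=None):
        table = {} if table is None else table
        if node.symbol is not None:              # leaf: record its codeword
            table[node.symbol] = prefix
        else:                                    # internal node: recurse
            codes_from_tree(node.left, prefix + "0", table)
            codes_from_tree(node.right, prefix + "1", table)
        return table

    # Tree 2 of the figure above: a = 11, b = 01, c = 001, d = 10, e = 000.
    t = Node(left=Node(left=Node(left=Node("e"), right=Node("c")), right=Node("b")),
             right=Node(left=Node("d"), right=Node("a")))
    print(codes_from_tree(t))   # {'e': '000', 'c': '001', 'b': '01', 'd': '10', 'a': '11'}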


Page 8: Huffman Codes

Representing prefix codes using binary trees (cont)

(Figure: the same three example trees as on the previous slide.)

• Given a prefix code, we can build a binary tree recursively.
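A matching sketch of the reverse direction, reusing the Node class and codes_from_tree helper from the previous sketch (tree_from_code is an illustrative name, not from the slides):

    # Build the binary tree (a binary trie) of a prefix code by inserting
    # each codeword bit by bit; the prefix property guarantees that every
    # symbol ends up at a leaf of its own.
    def tree_from_code(code):
        root = Node()
        for sym, word in code.items():
            node = root
            for bit in word:
                if bit == "0":
                    if node.left is None:
                        node.left = Node()
                    node = node.left
                else:
                    if node.right is None:
                        node.right = Node()
                    node = node.right
            node.symbol = sym           # end of the codeword: attach the symbol
        return root

    gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
    print(codes_from_tree(tree_from_code(gamma1)) == gamma1)   # True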

Page 13: Huffman Codes

Objective in terms of binary trees

Constructing an optimal prefix code involves

• searching for a binary tree T

• labeling the leaves of T

so that together they minimize ABL = ∑x∈S fx · |γ(x)| = ∑x∈S fx · depthT(x).

(Figure: the same three example trees as before; Tree 3 gives γ(a) = 11,
 γ(b) = 10, γ(c) = 01, γ(d) = 001, γ(e) = 000.)

For fa = 0.32, fb = 0.25, fc = 0.20, fd = 0.18, fe = 0.05, Tree 3 (the rightmost one in the original figure) gives an optimal prefix code.


Page 14: Huffman Codes

Optimal binary tree is full

• The binary tree corresponding to an optimal prefix code is full, i.e., every internal node has exactly two children: a node with a single child could be removed and its child promoted, which only shortens codewords.

Page 18: Huffman Codes

Objective in terms of full binary trees

Constructing an optimal prefix code involves

• searching for a full binary tree T

• labeling the leaves of T

so that together they minimize ABL = ∑x∈S fx · |γ(x)| = ∑x∈S fx · depthT(x).

Page 20: Huffman Codes

Labeling leaves of a given optimal full binary tree

• For any two leaves u and v with depth(u) < depth(v) in an optimal full
  binary tree T∗, the symbol associated with u must be at least as frequent
  as the symbol associated with v.

  - proof using an exchange argument

• With the above in place, the choice of assignment of symbols among leaves
  of the same depth does not affect the ABL.


Page 21: Huffman Codes

Algorithm to label the leaves of a given optimal full binary tree

• take the leaves of least depth and label them with the highest-frequency
  symbols, in any order

• take the leaves of the next least depth and label them with the
  highest-frequency remaining symbols, in any order

• and so on (see the sketch below)
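A minimal sketch of this labeling rule, assuming the leaves of the optimal tree are given together with their depths (the leaf identifiers and the label_leaves name are illustrative, not from the slides):

    # Pair leaves sorted by increasing depth with symbols sorted by
    # decreasing frequency; ties in depth may be broken arbitrarily.
    def label_leaves(leaf_depths, freq):
        leaves = sorted(leaf_depths, key=leaf_depths.get)            # shallowest first
        symbols = sorted(freq, key=freq.get, reverse=True)           # most frequent first
        return dict(zip(leaves, symbols))

    leaf_depths = {"u1": 2, "u2": 2, "u3": 2, "u4": 3, "u5": 3}      # five leaves and their depths
    freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
    print(label_leaves(leaf_depths, freq))
    # {'u1': 'a', 'u2': 'b', 'u3': 'c', 'u4': 'd', 'u5': 'e'}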

Page 23: Huffman Codes

Observations to construct an optimal full binary tree

• There is an optimal prefix code, with corresponding tree T∗, in which the
  two lowest-frequency letters, say y and z, are assigned to leaves that are
  siblings in T∗.

• Let y and z be the two lowest-frequency letters. Let T′ be a full binary
  tree corresponding to an optimal prefix code for (S − {y, z}) ∪ {w} with
  fw = fy + fz. Also, let T be the tree obtained by attaching leaves y and z
  as children of node w in T′.

  Then, ABL(T) = ABL(T′) + fw.
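As a quick sanity check using the frequencies of the example on the next slide: taking y = a (fa = 1/21) and z = b (fb = 2/21) gives fw = 3/21; a and b sit at depth 4 in T while w sits at depth 3 in T′, and all other leaves are unchanged, so ABL(T) − ABL(T′) = 4·(fa + fb) − 3·fw = 4·fw − 3·fw = fw, as claimed.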

Page 25: Huffman Codes

Huffman algorithm

• Recursively find the two symbols y, z with the lowest frequencies and make
  them siblings in the binary tree T being constructed, then set S to
  (S − {y, z}) ∪ {w} with fw = fy + fz. The resulting codewords for all the
  symbols are together known as the Huffman code.

(Figure: Huffman tree for the frequencies a: 1/21, b: 2/21, c: 3/21,
 e: 4/21, f: 5/21, d: 6/21; the internal nodes carry the merged frequencies
 3/21, 6/21, 9/21, 12/21, and 21/21.)

γ(a) = 0000, γ(b) = 0001, γ(c) = 001, γ(d) = 01, γ(e) = 10, γ(f) = 11

Using a priority queue, the algorithm runs in O(|S| lg |S|) time.
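A minimal Python sketch of this greedy construction using the standard heapq module, reusing the Node class and codes_from_tree helper from the earlier sketches (the tie-breaking counter is an implementation detail, not from the slides; the exact 0/1 labels it produces may differ from the figure, but the codeword lengths agree):

    import heapq
    from itertools import count

    # Repeatedly merge the two lowest-frequency subtrees; a running counter
    # breaks frequency ties so heapq never compares Node objects directly.
    def huffman(freq):
        tiebreak = count()
        heap = [(f, next(tiebreak), Node(sym)) for sym, f in freq.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)      # two lowest-frequency subtrees
            f2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(tiebreak), Node(left=t1, right=t2)))
        return codes_from_tree(heap[0][2])

    freq = {"a": 1/21, "b": 2/21, "c": 3/21, "d": 6/21, "e": 4/21, "f": 5/21}
    print(huffman(freq))
    # codeword lengths match the slide: a and b get 4 bits, c gets 3, d, e, f get 2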

Page 27: Huffman Codes

Correctness

• feasibility: every symbol gets a codeword

• optimality: induction on the size of the alphabet
