Huffman Codes
Greedy
Huffman codes
R. Inkulu
http://www.iitg.ac.in/rinkulu/
(Huffman codes) 1 / 15
Encoding symbols using bits
• Given a set of symbols S, the code γ of S is a one-to-one function
γ : S → N, where each element of N is a binary number. The codeword
of a symbol x ∈ S is γ(x).
• A fixed-length code does not take the frequency of occurrence of
individual symbols into account; hence, it is not space-efficient.
• A variable-length code helps in improving the space-efficiency: assign
longer codewords to less frequently used symbols and shorter codewords
to more frequently used ones.
Prefix code
• Decoding text with an arbitrary variable-length code can be ambiguous:
e.g., with γ(a) = 0, γ(b) = 1, γ(c) = 01, the string 01 decodes as either ab or c.
• A variable-length code in which no codeword is a prefix of another is
termed a prefix code.
e.g., with the code γ1(a) = 11, γ1(b) = 01, γ1(c) = 001, γ1(d) = 10, γ1(e) = 000,
decoding 0010000011101 yields cecab
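The decoding example above can be reproduced with a short sketch. The function name decode_prefix and the dictionary representation of γ1 are illustrative assumptions, not part of the slides; the sketch relies on the prefix property, which guarantees that the first codeword matched while scanning left to right is the only possible one.

```python
def decode_prefix(bits, code):
    """Decode a bit string with a prefix code given as {symbol: codeword}."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        # With a prefix code, a match can never be the start of a longer codeword.
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(out)

# The code gamma_1 from the slide.
gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
decoded = decode_prefix("0010000011101", gamma1)
print(decoded)  # cecab
```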
Optimal prefix codes
• Given a set of symbols S with their frequency of occurrences, fx for every
x ∈ S, determine a space-efficient prefix code that assigns a unique
codeword for each symbol x in S.
• The average number of bits required per letter (ABL) is
$\mathrm{ABL}(\gamma) = \sum_{x \in S} f_x \cdot |\gamma(x)|$.
Hence, the objective is to choose a code that minimizes ABL.
Optimal prefix code example
For fa = 0.32, fb = 0.25, fc = 0.20, fd = 0.18, fe = 0.05,
• with a fixed-length code, ABL is 3 (three bits for each of the five symbols)
• with γ1(a) = 11, γ1(b) = 01, γ1(c) = 001, γ1(d) = 10, γ1(e) = 000,
ABL(γ1) = 2.25
• with γ2(a) = 11, γ2(b) = 10, γ2(c) = 01, γ2(d) = 001, γ2(e) = 000,
ABL(γ2) = 2.23
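As a check on these numbers, here is a minimal sketch that evaluates the ABL sum directly. It assumes fe = 0.05, the value for which the five frequencies sum to 1 and the stated ABL values come out; the helper name abl and the dictionary layout are illustrative.

```python
import math

def abl(freq, code):
    """Average bits per letter: sum of f_x * |gamma(x)| over all symbols."""
    return sum(freq[x] * len(code[x]) for x in freq)

# Assumption: f_e = 0.05, so that the frequencies sum to 1.
freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
gamma2 = {"a": "11", "b": "10", "c": "01", "d": "001", "e": "000"}

# A fixed-length code needs ceil(log2 |S|) bits for every symbol.
fixed_bits = math.ceil(math.log2(len(freq)))

print(fixed_bits)                   # 3
print(round(abl(freq, gamma1), 2))  # 2.25
print(round(abl(freq, gamma2), 2))  # 2.23
```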
Representing prefix codes using binary trees
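[Figure: three binary trees, each representing a prefix code for S = {a, b, c, d, e}; edges to left children are labeled 0 and edges to right children are labeled 1.]
Left tree: γ(a) = 1, γ(b) = 011, γ(c) = 010, γ(d) = 001, γ(e) = 000
Middle tree: γ(a) = 11, γ(b) = 01, γ(c) = 001, γ(d) = 10, γ(e) = 000
Right tree: γ(a) = 11, γ(b) = 10, γ(c) = 01, γ(d) = 001, γ(e) = 000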
• Consider a binary tree T in which each leaf is labeled with a distinct
letter in S. For each symbol x ∈ S, follow the path from the root to the leaf
labeled x; each time the path goes from a node to its left (resp. right)
child, write down a 0 (resp. 1). The resulting bit string is the encoding of x.
• The encoding of S constructed from T is a prefix code.
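The root-to-leaf rule can be sketched as a short recursive traversal. The nested-tuple tree representation (a leaf is a symbol string, an internal node is a (left, right) pair) and the function name codewords are assumptions made for illustration.

```python
def codewords(tree, prefix=""):
    """Return {symbol: bit string} for a tree of nested (left, right) pairs."""
    if isinstance(tree, str):  # leaf: the path walked so far is the codeword
        return {tree: prefix}
    left, right = tree
    code = codewords(left, prefix + "0")         # left edge contributes a 0
    code.update(codewords(right, prefix + "1"))  # right edge contributes a 1
    return code

# Tree for the code gamma(a) = 11, gamma(b) = 01, gamma(c) = 001,
# gamma(d) = 10, gamma(e) = 000.
tree = ((("e", "c"), "b"), ("d", "a"))
print(codewords(tree))
```

Because every symbol sits at a leaf, no root-to-leaf path passes through another symbol's leaf, which is why no codeword is a prefix of another.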
• Given a prefix code, we can build a binary tree recursively.
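One way to sketch the recursive construction: split the codewords by their first bit, strip that bit, and recurse on each half. The nested-tuple representation, the use of None for a missing child, and the name build_tree are illustrative assumptions; codewords are assumed to contain only the characters 0 and 1.

```python
def build_tree(code):
    """Build a nested (left, right) tree from a prefix code {symbol: bits}."""
    if not code:
        return None  # missing child: the tree is not full at this node
    if "" in code.values():
        # An exhausted codeword marks a leaf; it must be alone here, otherwise
        # it would be a prefix of (or equal to) another codeword.
        if len(code) > 1:
            raise ValueError("not a prefix code")
        return next(iter(code))
    zeros = {s: w[1:] for s, w in code.items() if w[0] == "0"}
    ones = {s: w[1:] for s, w in code.items() if w[0] == "1"}
    return (build_tree(zeros), build_tree(ones))

gamma2 = {"a": "11", "b": "10", "c": "01", "d": "001", "e": "000"}
print(build_tree(gamma2))  # ((('e', 'd'), 'c'), ('b', 'a'))
```

Running the same function on the ambiguous code γ(a) = 0, γ(b) = 1, γ(c) = 01 raises an error, since a would have to be both a leaf and an ancestor of c.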
Objective in terms of binary trees
Constructing an optimal prefix code involves
• searching for a binary tree T
• labeling the leaves of T
so that together they minimize
$\mathrm{ABL} = \sum_{x \in S} f_x \cdot |\gamma(x)| = \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x)$.
[Figure: three binary trees representing prefix codes for S = {a, b, c, d, e}.]
Left tree: γ(a) = 1, γ(b) = 011, γ(c) = 010, γ(d) = 001, γ(e) = 000
Middle tree: γ(a) = 11, γ(b) = 01, γ(c) = 001, γ(d) = 10, γ(e) = 000
Right tree: γ(a) = 11, γ(b) = 10, γ(c) = 01, γ(d) = 001, γ(e) = 000
For fa = 0.32, fb = 0.25, fc = 0.20, fd = 0.18, fe = 0.05, the rightmost tree gives an optimal prefix code, with ABL = 2.23.
Optimal binary tree is full
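• The binary tree corresponding to an optimal prefix code is full, i.e., every internal node has exactly two children: an internal node with a single child could be replaced by that child, which shortens codewords without breaking the prefix property.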
Objective in terms of full binary trees
Constructing an optimal prefix code involves
• searching for a full binary tree T
• labeling the leaves of T
so that together they minimize
$\mathrm{ABL} = \sum_{x \in S} f_x \cdot |\gamma(x)| = \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x)$.
Labeling leaves of a given optimal full binary tree
• For any two leaves u and v with depth(u) < depth(v) in an optimal full binary tree T∗, the symbol associated with u must be at least as frequent as the symbol associated with v.
- proof using an exchange argument
• With the above in place, choice of assignment of symbols among leaves
of the same depth does not affect the ABL.
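The exchange argument can be written out as a short calculation. The names y and z for the symbols at leaves u and v are notation assumed here for illustration. Suppose, for contradiction, that depth(u) < depth(v) but f_y < f_z, and swap the two labels. Only the terms for y and z change, so the change Δ in ABL is:

```latex
\begin{align*}
\Delta &= \bigl(f_z\,\mathrm{depth}(u) + f_y\,\mathrm{depth}(v)\bigr)
        - \bigl(f_y\,\mathrm{depth}(u) + f_z\,\mathrm{depth}(v)\bigr) \\
       &= (f_z - f_y)\,\bigl(\mathrm{depth}(u) - \mathrm{depth}(v)\bigr) \;<\; 0 .
\end{align*}
```

The first factor is positive and the second is negative, so the swap strictly lowers ABL, contradicting the optimality of T∗. When depth(u) = depth(v) the same expression is 0, which is why the assignment among leaves of equal depth does not matter.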
Algorithm to label the leaves of a given optimal full binary tree
• take the leaves of least depth and label them with the highest-frequency
symbols, in any order
• take the leaves of the next least depth and label them with the
highest-frequency symbols among those remaining, in any order
• and so on, until every leaf is labeled
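The labeling steps amount to pairing leaves sorted by increasing depth with symbols sorted by decreasing frequency. The sketch below assumes the tree is given only as a list of its leaf depths; the name label_leaves, the depth list [3, 2, 3, 2, 2] (three leaves at depth 2 and two at depth 3), and fe = 0.05 are illustrative assumptions.

```python
def label_leaves(leaf_depths, freq):
    """Pair the shallowest leaves with the most frequent symbols.

    Returns {symbol: depth of the leaf it is assigned to}.
    """
    if len(leaf_depths) != len(freq):
        raise ValueError("need exactly one leaf per symbol")
    symbols = sorted(freq, key=freq.get, reverse=True)  # most frequent first
    depths = sorted(leaf_depths)                        # shallowest first
    return dict(zip(symbols, depths))

# Assumption: f_e = 0.05, so that the frequencies sum to 1.
freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
depth_of = label_leaves([3, 2, 3, 2, 2], freq)
abl = sum(freq[x] * depth_of[x] for x in freq)
print(depth_of)       # {'a': 2, 'b': 2, 'c': 2, 'd': 3, 'e': 3}
print(round(abl, 2))  # 2.23
```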
Observations to construct an optimal full binary tree
• There is an optimal prefix code, with corresponding tree T∗, in which the
two lowest-frequency letters, say y and z, are assigned to leaves that are
siblings in T∗.
• Let y and z be the two lowest-frequency letters. Let T′ be a full binary
tree corresponding to an optimal prefix code for (S − {y, z}) ∪ {w} with
fw = fy + fz. Also, let T be the tree obtained by attaching leaves y and z
as children of node w of T′.
Then, ABL(T) = ABL(T′) + fw.
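The identity follows from two facts: y and z sit one level below w, and every other symbol has the same depth in T and T′. A sketch of the calculation, in the notation of the slide:

```latex
\begin{align*}
\mathrm{ABL}(T) &= \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x) \\
  &= (f_y + f_z)\bigl(1 + \mathrm{depth}_{T'}(w)\bigr)
     + \sum_{x \in S \setminus \{y, z\}} f_x \cdot \mathrm{depth}_{T'}(x) \\
  &= f_w + f_w \cdot \mathrm{depth}_{T'}(w)
     + \sum_{x \in S \setminus \{y, z\}} f_x \cdot \mathrm{depth}_{T'}(x) \\
  &= f_w + \mathrm{ABL}(T').
\end{align*}
```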
Huffman algorithm
• Repeatedly pick the two symbols y, z of lowest frequency, make them
sibling leaves under a new node w in the binary tree T being constructed,
and replace S by (S − {y, z}) ∪ {w} with fw = fy + fz, until a single symbol
remains. The resulting set of codewords is known as the Huffman code.
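[Figure: Huffman tree for the frequencies a: 1/21, b: 2/21, c: 3/21, d: 6/21, e: 4/21, f: 5/21; the internal nodes have weights 3/21, 6/21, 9/21, 12/21 and 21/21 (root); edges to left children are labeled 0 and edges to right children are labeled 1.]
γ(a) = 0000, γ(b) = 0001, γ(c) = 001, γ(d) = 01, γ(e) = 10, γ(f) = 11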
• Using a priority queue, the algorithm takes O(|S| lg |S|) time.
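A minimal sketch of the algorithm with a binary heap as the priority queue, written iteratively rather than recursively. It assumes symbols are strings and the alphabet is non-empty; the function name huffman_code and the tie-breaking counter are illustrative choices. Frequencies are given as integer counts out of 21 to match the example.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Return {symbol: bit string} for a non-empty dict of frequencies."""
    tie = count()  # tie-breaker so the heap never has to compare trees
    # Heap entries are (weight, tie-breaker, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fy, _, y = heapq.heappop(heap)  # lowest frequency
        fz, _, z = heapq.heappop(heap)  # second lowest
        heapq.heappush(heap, (fy + fz, next(tie), (y, z)))  # merged node w
    code = {}

    def assign(tree, prefix):
        if isinstance(tree, tuple):
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:
            code[tree] = prefix or "0"  # a one-symbol alphabet still needs a bit

    assign(heap[0][2], "")
    return code

freq = {"a": 1, "b": 2, "c": 3, "d": 6, "e": 4, "f": 5}  # counts out of 21
code = huffman_code(freq)
lengths = {s: len(w) for s, w in code.items()}
print(lengths)  # {'a': 4, 'b': 4, 'c': 3, 'd': 2, 'e': 2, 'f': 2} (key order may differ)
```

The individual bits may differ from the codewords on the slide, because the choice of left versus right child at each merge is arbitrary, but the codeword lengths, and therefore the ABL, are the same. Each of the |S| − 1 merges costs O(lg |S|) heap operations.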
(Huffman codes) 14 / 15
Correctness
• feasibility: every symbol got a symbol
(Huffman codes) 15 / 15
Correctness
• feasibility: every symbol gets a codeword
• optimality: induction on the size of the alphabet
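The inductive step can be sketched as follows; this is the standard argument under the slides' notation, with Z and Z′ introduced here as assumed names. Let T be the Huffman tree for S and T′ the Huffman tree for S′ = (S − {y, z}) ∪ {w}, which is optimal by the induction hypothesis. Suppose some tree Z for S had ABL(Z) < ABL(T). By the sibling observation, Z can be chosen so that y and z are sibling leaves; deleting them and labeling their parent w gives a tree Z′ for S′ with:

```latex
\begin{align*}
\mathrm{ABL}(Z') &= \mathrm{ABL}(Z) - f_w \\
                 &< \mathrm{ABL}(T) - f_w \\
                 &= \mathrm{ABL}(T').
\end{align*}
```

This contradicts the optimality of T′ for S′, so no such Z exists. The base case is an alphabet of two symbols, where each symbol gets a one-bit codeword.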