
University of Patras

School of Engineering

Department of Electrical and Computer Engineering

LDPC Codes: Code Construction and

Encoder Hardware Implementation

Low-Density Parity-Check (LDPC) Codes:

Code Construction and Hardware Implementations of Encoders

This dissertation is submitted for the degree of

Doctor of Philosophy

Ahmed S. Mahdi

Supervisor: Assoc. Professor

Vassilis Paliouras

Dissertation Number: 328

June 2015

CERTIFICATION

It is certified that the present doctoral dissertation entitled “LDPC Codes: Code Construction and Encoder Hardware Implementation” (Low-Density Parity-Check (LDPC) Codes: Code Construction and Hardware Implementations of Encoders) by Ahmed S. Mahdi, holder of a Diploma in Electrical and Computer Engineering, was publicly presented at the Department of Electrical and Computer Engineering of the University of Patras on 05/06/2015, and was examined and approved by the following Examination Committee:

• Vassilis Paliouras, Associate Professor, Department of Electrical and Computer Engineering, University of Patras.

• Wonyong Sung, Professor, Seoul National University.

• Alexios Birbas, Professor, Department of Electrical and Computer Engineering, University of Patras.

• Odysseas Koufopavlou, Professor, Department of Electrical and Computer Engineering, University of Patras.

• Konstantinos Berberidis, Professor, Department of Computer Engineering and Informatics, University of Patras.

• Thanos Stouraitis, Professor, Department of Electrical and Computer Engineering, University of Patras.

• Michael Birbas, Assistant Professor, Department of Electrical and Computer Engineering, University of Patras.

Patras, 05/06/2015

The Supervisor

Vassilis Paliouras

Associate Professor

The Head of the Department

Gabriel Giannakopoulos

Professor

to my loving families in Palestine and Greece

Acknowledgements

First and foremost, I am truly indebted to my supervisor, Assoc. Professor Vassilis Paliouras, and wish to express my gratitude for his inspiration, excellent guidance, continuing encouragement, and unwavering confidence and support during every stage of this endeavour, without which it would not have been possible for me to complete this undertaking successfully. I also thank him for his insightful comments and suggestions, which always helped me to improve my understanding. I express my deep gratitude to the members of the Advisory Committee, Prof. Wonyong Sung from Seoul National University and Prof. Alexis Birbas from the University of Patras, for their useful advice and support. I would also like to express my heartfelt gratitude to my wife, L. X. Angelou, who kept me focused and helped a great deal on several occasions. Many thanks to my friends and colleagues at the VLSI Lab of the University of Patras, who have inspired me and helped in particular with the work of this thesis. My wholehearted gratitude goes to my father, my brothers and my friends for their constant love, encouragement, and support.

Contents

Contents
List of Figures
List of Tables
Nomenclature

1 Introduction
1.1 Overview
1.2 Thesis Contribution
1.3 Thesis Outline

2 Review of the state of the art
2.1 Wireless communications and channel capacity
2.2 Channel coding
2.2.1 Role of coding
2.2.2 Linear block codes
2.3 Low Density Parity Check Codes
2.3.1 Tanner graph
2.3.2 The encoding process
2.3.3 The decoding process
2.3.4 Rate compatibility

3 A Low Complexity - High Throughput QC-LDPC Encoder
3.1 Introduction
3.2 QC-LDPC encoding background
3.2.1 LDPC encoding using PCM
3.2.2 Code construction
3.3 Proposed encoder: memory compression scheme and hardware architecture
3.3.1 Serial encoder architecture
3.3.2 Shared serial encoder architecture
3.3.3 Parallel encoder architecture
3.4 Evaluation of the proposed encoders
3.4.1 Impact of LU decomposition of $(H_2^T)^{-1}$
3.4.2 Impact of multi-level memory compression
3.4.3 Impact of parallelism
3.4.4 Joint consideration of LU decomposition and sub-expression sharing
3.5 Implementation Results and Comparisons to the state-of-the-art

4 Simplified QC-LDPC Codes for Low-Complexity Encoders
4.1 Introduction
4.2 Preliminaries
4.3 LDPC Encoding
4.3.1 Structure of ML-QC-LDPC codes
4.4 The proposed matrix extension
4.4.1 Structure of the inverse matrix
4.4.2 Proposed Matrix Construction Method
4.5 Complexity and BER performance

5 On the Encoding Complexity of QC-LDPC Codes
5.1 Preliminaries
5.2 Review of Matrix inversion
5.3 Proposed parity-check matrix construction
5.3.1 Constraint-based base shift matrix $H_{2bsh}$ construction
5.3.2 Straightforward construction for polynomial-free inverted matrices
5.4 Performance of the proposed construction method

6 Efficient Implementation of WiFi LDPC Encoder
6.1 Introduction
6.1.1 Common Sub-expression Sharing
6.2 A Single Encoding Procedure for Several LDPC Codes
6.2.1 Common Encoder for Several LDPC codes
6.2.2 Illustrative encoding example of several LDPC codes
6.3 Common encoder components determination algorithm
6.4 Encoder Architecture
6.4.1 Encoder Interface Architecture
6.4.2 Encoding Core Architecture
6.5 Evaluation of the proposed architecture and comparison with prior art

7 Hardware-efficient rate-compatible QC-LDPC codes
7.1 Introduction
7.2 Review of Puncturing and Shortening
7.3 Proposed Puncturing Scheme
7.3.1 MacKay Encoding
7.3.2 Exploiting MacKay encoding to support rate adaptability
7.3.3 Proposed Matrix Puncturing Algorithm
7.3.4 Supported Code Rates
7.4 Proposed Extension to QC-LDPC Codes
7.4.1 QC-LDPC Codes
7.4.2 Modified MacKay Encoding for QC-LDPC codes
7.4.3 Matrix puncturing for QC-LDPC codes
7.5 Proposed Parity-Check Matrix Construction
7.6 Hardware Implementation
7.6.1 Hardware Architecture of an RC Encoder
7.6.2 Support of Rate Compatibility
7.6.3 Complexity of the RC Encoder
7.7 Performance of Proposed System
7.7.1 BER performance
7.7.2 Hardware Requirements of a RC QC-LDPC Encoder
7.7.3 Evaluation of the proposed RC-QC-LDPC encoder and the obtained BER performance

8 Conclusions and future trends

Bibliography

Appendix A

List of Figures

3.1 BER performance of codeA, (2016, 1512), and code1/2, (2016, 1008), using a normalized min-sum decoder with normalization factor a = 0.75 and NOI = 10.

3.2 Overview of the proposed hardware encoder architecture.

3.3 Compressed representation of $H_1^T$. This organization allows better control than having different memories for each content: Index, Shifting factor or EOC.

3.4 First-stage VMM Unit.

3.5 Hierarchical compression of multi-level QC-LDPC codes. $m_{base}$ is the number of rows of $H_{base}$ and $EOC_{base}$ is the End-Of-Column flag.

3.6 Architecture of the index computation circuit for multi-level $H_1^T$.

3.7 The unique types of the nonzero sub-matrices in $(H_2^T)^{-1}$, $L$ and $U$ for z = 4.

3.8 Compressed representation of matrix L. The occurrence of polynomial sub-matrices requires storing related information about them. A compressed representation of U has identical structure.

3.9 VMM Unit of stages 2 and 3. The units that multiply the row vectors by L and U have identical structure. They differ only in the contents of the ROMs.

3.10 Encoder architecture with one shared VMM unit.

3.11 A parallel implementation of the first step comprises identical instances of the same unit. Each instance accesses a ROM containing data required by the particular instance only. Similarly for the second and third steps.

3.12 Organization of shared-parallel architecture.

3.13 Shared parallel architecture.

3.14 Full parallel core based on XOR logic. Inputs and outputs are stored in shift registers.

3.15 Memory requirements vs. the codeword length. Complexity remains linear even with the additional hardware required to implement encoders with larger extension factor z.

3.16 BER performance of a (2304, 1728) recursively-constructed QC-LDPC code compared to the corresponding WiMAX code using a normalized min-sum decoder with normalization factor a = 0.75 and NOI = 10.

4.1 Representation of matrix H, where n is the number of H-columns.

4.2 Structure of $H_{2bb}$ and $H_{2bb}^{-1}$ of WiMAX ('3/4A') with 13 and 27 nonzero elements respectively.

4.3 BER performance of WiMAX ('3/4A') and various structurally randomly shifted PCMs. A zi

5.8 BER performance of the modified (M) QC-LDPC codes vs. the original (O) corresponding codes.

5.9 Decoding Average Number Of Iterations (ANOI) of the modified (M) QC-LDPC codes vs. the original (O) corresponding codes.

5.10 BER performance of (2016, 1512) WiMAX ('3/4A') and various structurally randomly shifted ML-PCMs. A zi

7.4 Compressed representation of the contents of matrix $C^{-1}$, which has x columns. T is the number of types τ.

7.5 Architecture of the proposed encoder.

7.6 Encoding clock cycles as a function of code rate.

7.7 RC Encoder energy consumption vs. the code rate.

7.8 Decoding performance of puncturing schemes.

7.9 Decoding performance of Vellambi-Fekri puncturing scheme for a (2016, 1008) code constructed using PEG.

7.10 Constructing RC codes by extension of the parity-check matrix [109].

7.11 BER performance of the RC codes of [109] with information block length of 1k.

List of Tables

3.1 Computation of $p = s \cdot H_1^T \cdot L \cdot U$.

3.2 Shared VMM complexity.

3.3 Complexity comparison between various LDPC codes for $H_{(504 \times 2016)}$ and code rate 0.75 based on z(2) = 4. An entry (a×b) means that multiplication with the particular matrix requires a clock cycles, while (a×b) bits of memory are required to store the matrix. Memory reduction shows the benefit of using L and U instead of $(H_2^T)^{-1}$.

3.4 Complexity comparison between serial and parallel architectures.

3.5 Comparison of the XOR gates required for multiplications by $(H_2^T)^{-1}$, L and U of codeA according to the adopted multiplication technique.

3.6 Occupied slices and achieved frequency of serial architectures of various codeword lengths. z is the size of the last extension step (second step). Cc is the number of required clock cycles, and IT and BT are the achieved Information and Block Throughput, respectively.

3.7 Performance of the fully parallel architecture. The computed throughput excludes loading and transmission time.

3.8 A view of the state of the art.

3.9 Throughput-to-Area Ratio (TAR) of the proposed architectures compared to the TAR of the corresponding architectures of prior art.

4.1 Area complexity and critical path delay.

5.1 Inverted matrix derivation using Gaussian Elimination. Executed operations follow the pre-reported guidelines, and $r_i$ denotes the $i$th row of the corresponding matrix.

5.2 Impact of the proposed method on area complexity.

5.3 Impact of the proposed $H_{2\,(504 \times 504)}$ construction method for ML-QC-LDPC codes.

6.1 WiFi encoding core hardware requirements. Shared $(H_2^T)^{-1}$ indicates the XOR tree based on sub-expression sharing.

6.2 ASIC area requirements for two different CMOS technologies, and operation frequency for the proposed IEEE 802.11n/ac LDPC encoder.

7.1 FPGA Area Utilization.

Abstract

Being the key factor for development as a society, electronic communications have increasingly become the center of our day-to-day life. Consequently, methods and techniques that ensure reliable, safe and fast transmission of information are becoming more and more essential. Digital Signal Processing, Information Theory and Error Correction Codes are the research areas that study how to achieve such a goal. Many error correction codes have been presented in the past, but in recent years one class of codes has established itself as the best candidate to solve the problem: Low Density Parity Check (LDPC) codes. LDPC codes can achieve reliable communication while keeping the complexity of the encoder and decoder implementation manageable. The performance of LDPC codes has been shown to be very close to the theoretical limit that a code can reach in a given channel, the channel capacity. However, implementing LDPC codes that combine very good performance with very low complexity remains a challenge. There is always a need for smaller and faster communication devices with low power consumption and low cost. As a consequence, error correction modules must be improved to meet these specifications while still covering all required functionality.

This thesis investigates various aspects of LDPC code construction and implementation that can improve performance and decrease hardware implementation requirements. The main focus is on code construction and encoder hardware implementation. Whether adopting or modifying known code construction methods and encoding algorithms, or proposing new construction techniques, we always jointly consider the error-correcting capability of the constructed codes and their complexity, since in the design of real systems the choice of code cannot be based exclusively on coding performance; hardware requirements must also be considered. Hence, it is important to develop codes that perform well without making the encoding and decoding processes intractable from an implementation point of view.

Summary

The technological achievements of the 20th century contributed significantly to improving the quality of human life, as they made it easier to serve everyday needs. One of the most important factors of everyday life is communication in all its forms, which has made remarkable progress in recent decades. New communication services thus appear every day, while the demands for fast, uninterrupted and reliable real-time transmission of information continue to grow. This spectacular and ongoing progress in telecommunications is due in large part to the systematic progress in the performance, and to the low cost, of the devices and circuits used. This progress also stems from developments at the theoretical level. The cornerstone of modern telecommunication infrastructure is considered to be the cooperation (synchronization) between the various components that make up a telecommunication system and the signal processing techniques. The invention of the transistor made information processing easier: with Very Large Scale Integrated (VLSI) circuits, signal processing is carried out almost exclusively digitally.

In digital systems, information is encoded in sequences of 0s and 1s, which correspond to the two possible states, on and off, of the transistors, which operate as switches. This advantage has brought about significant changes in the way information is processed. Information in nature appears exclusively in analog form, since humans can perceive only analog signals. Analog data must therefore be digitized, that is, converted into sequences of 0s and 1s, so that the receiver is not obliged to estimate the infinitely many values of an analog signal, but simply decides between two discrete values for each signal, 0 or 1. This process makes digital signals more reliable for transmitting information in a noisy environment, since they can be detected almost perfectly when the noise level is not particularly high, which allows the digital data to be recovered; moreover, error correction techniques make it possible to correct errors that occur during transmission.

Digital information can be encoded in such a way that additional binary digits, which carry no information and are called redundancy, are inserted into it. The additional digits allow the receiver to recognize any errors that occurred during transmission. This technique is called Error Control Coding. A further advantage of digital information processing is that the required algorithms are easier to design than algorithms for processing analog signals. Combining the above, we can say that the low cost of VLSI circuits, the easy implementation of digital signal processing algorithms on them, and the wealth of error control and correction techniques have led to many practical applications of error control. The science of Information Theory, whose foundations were laid by Claude Shannon in 1948 [121, 122], has had a notable influence on the evolution and development of telecommunication systems. Shannon's theory proves that a suitable error correction code exists for the reliable transmission of information over a noisy channel, provided that the data transmission rate, $r_b$, is lower than the channel capacity, $C$.


Shannon's theory triggered the search for encoding/decoding techniques, that is, Error Correction Codes, that approach the channel capacity. A first attempt to approach Shannon's theoretical limit was made in the early 1960s, specifically in 1963, by R. G. Gallager, who introduced Low-Density Parity-Check (LDPC) codes [41]. This class of codes was based on encoding/decoding algorithms whose complexity exceeded the capabilities of the computing machines of that era, and for this reason the codes were set aside until the early 1990s. It was then that the first codes whose performance approached the Shannon limit appeared [28].

Subject and contribution of the dissertation

The dissertation focuses on the construction of efficient LDPC codes and on the study of encoding for LDPC codes. Specifically, it follows a design methodology based on the joint construction of the LDPC codes and hardware implementation of the corresponding encoders. The goal is to develop methodologies for constructing LDPC codes with high error-correcting capability, particularly at low BER, which satisfy the specifications of specific applications [69] but are also suitable for use in other applications with similar requirements on correcting capability. At the same time, various encoder architectures are implemented in hardware. These also satisfy a fairly wide range of architectural specifications, such as a high data processing and transmission rate (throughput) on the order of multiple Gbps, low time complexity with small VLSI circuit area, use of small memories, and low power consumption, as well as support for multi-rate encoding, using either a single code from which further codes with different rates are derived, or multiple codes, each of which supports a specific rate.

In the context of the dissertation, the problems and complexity of encoding LDPC codes are studied together with their error-correcting capability. Emphasis is placed on the construction and implementation of codes belonging to the QC-LDPC class [39]. These codes are used in almost all international digital telecommunication protocols (IEEE Standards), such as the satellite Digital Video Broadcast (DVB) standard DVB-S2 [3], IEEE 802.3an (10GBASE-T) [5], IEEE 802.16 (WiMAX) [4] and IEEE 802.11n/ac (WiFi), as well as in many other applications.

To construct efficient QC-LDPC codes with a low error floor, QC-LDPC codes with multiple expansions, Multi-level (ML) QC-LDPC codes [96], are adopted. These codes obey the same construction rules as QC-LDPC codes with a single expansion: the construction of such a code starts from a small binary base matrix of dimensions $m_b \times n_b$, with a given density (defined by the number of ones) and a given distribution of these ones, a given matrix form and a given code rate. To construct the expanded parity-check matrix $H_{m \times n}$, every zero of the base matrix is replaced by a square zero sub-matrix, and every one is replaced by a (nonzero) identity sub-matrix cyclically shifted by a random shift coefficient $\sigma$, where $0 \le \sigma < z$ and $z$ is the size of the square zero and nonzero sub-matrices. In the case of ML-QC-LDPC codes, the final matrix $H$ is created by recursively expanding the already expanded matrix, using a suitable $z$ at each expansion, until the desired matrix $H$ is obtained. This procedure is described in Section 3.2.2. The ML-QC-LDPC codes constructed exhibit strong error-correcting capability with a very low error floor, achieving a BER on the order of $10^{-12}$ at 4.5 dB for a codeword length of 2016 and code rate 3/4. This code, like the other codes constructed, uses the corresponding base matrices of IEEE 802.16 (WiMAX); nevertheless, the error-correcting capability of the proposed codes appears to be considerably stronger than that of the corresponding IEEE 802.16 (WiMAX) codes, as shown in Figure 3.1.
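The expansion procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction tool used in the thesis: the toy base matrix, the helper names (`shifted_identity`, `expand`, `expand_multilevel`), the seed and the expansion factors are hypothetical, and shifting each identity row to the right is only one possible convention.

```python
import random

def shifted_identity(z, sigma):
    """z x z identity matrix whose rows are cyclically shifted right by sigma."""
    return [[1 if c == (r + sigma) % z else 0 for c in range(z)] for r in range(z)]

def expand(base, z, rng):
    """One expansion level: each 0 becomes a z x z zero block, each 1 becomes
    an identity block shifted by a random sigma with 0 <= sigma < z."""
    rows = []
    for base_row in base:
        blocks = [shifted_identity(z, rng.randrange(z)) if bit
                  else [[0] * z for _ in range(z)]
                  for bit in base_row]
        for r in range(z):
            # Concatenate row r of every block in this block-row.
            rows.append([x for blk in blocks for x in blk[r]])
    return rows

def expand_multilevel(base, z_list, seed=0):
    """Multi-level expansion: re-expand the already expanded matrix once per z."""
    rng = random.Random(seed)
    h = base
    for z in z_list:
        h = expand(h, z, rng)
    return h

# Toy 2 x 4 binary base matrix (illustrative only, not a WiMAX base matrix).
base = [[1, 1, 0, 1],
        [0, 1, 1, 1]]
H = expand_multilevel(base, [3, 4])   # two levels: z = 3, then z = 4
```

Because every nonzero block is a permutation matrix, each expansion preserves the row and column weights of the matrix it starts from, so the degree distribution fixed by the base matrix carries over to the final matrix.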

For the encoder implementation, the algorithm described in (3.3) is proposed, in which the parity bits $p$ are computed in two steps: the first multiplies the information bits by the matrix $H_1^T$ (a sub-matrix of $H$), producing an intermediate vector $p_1$, while the second multiplies the vector $p_1$ by the matrix $(H_2^T)^{-1}$. This algorithm was chosen for implementation and study because, compared with other encoding algorithms, it is simple to implement in hardware, even in the case of a parallel architecture. Moreover, the study of the matrix $(H_2^T)^{-1}$ and the simplification of operations with this dense inverted matrix, so as to reduce its density and the corresponding hardware required, could be applied effectively to simplify other encoding algorithms, and their architectures, that involve operations with inverted matrices.
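As a rough software model of this two-step computation, the sketch below encodes a message with a toy parity-check matrix $H = [H_1 \mid H_2]$ over GF(2). It assumes a systematic codeword $c = [s \mid p]$ and an invertible $H_2$; the matrices, the message and the helper names (`gf2_inverse`, `vec_mat`, `encode`) are illustrative assumptions and are not taken from the thesis.

```python
def gf2_inverse(m):
    """Invert a square binary matrix over GF(2) by Gauss-Jordan elimination."""
    n = len(m)
    # Augment m with the identity matrix.
    a = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col and a[r][col]:
                a[r] = [x ^ y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def vec_mat(v, m):
    """Row vector times matrix over GF(2)."""
    return [sum(v[i] & m[i][j] for i in range(len(v))) % 2
            for j in range(len(m[0]))]

def encode(s, h1, h2):
    """Systematic encoding: returns the codeword [s | p]."""
    p1 = vec_mat(s, transpose(h1))               # step 1: p1 = s * H1^T
    p = vec_mat(p1, gf2_inverse(transpose(h2)))  # step 2: p = p1 * (H2^T)^-1
    return s + p

# Toy matrices (illustrative only): H = [H1 | H2] with H2 invertible.
H1 = [[1, 0, 1, 1],
      [1, 1, 0, 1],
      [0, 1, 1, 0]]
H2 = [[1, 1, 0],
      [0, 1, 1],
      [0, 0, 1]]
codeword = encode([1, 0, 1, 1], H1, H2)
```

The result can be checked by confirming that every row of $H$ has even overlap with the codeword, since $H_1 s^T + H_2 p^T = p_1^T + p_1^T = 0$ over GF(2).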

The techniques proposed for reducing the computational complexity of the encoder are described in Chapter 3 and are summarized as follows:

* The positions of the nonzero sub-matrices of the matrix $H_1^T$ and the corresponding shift coefficients are stored. In the case of ML-QC-LDPC codes, the positions of the nonzero elements of the small base matrix of $H_1^T$ are stored, together with the corresponding shift coefficients of each expansion, as shown in Figure 3.5.

* Multiplication by the matrix $H_1^T$ is performed using a cyclic shift unit, which performs the shift in a single clock cycle, as shown in Figure 3.4.

* Inverting the matrix $H_2^T$ destroys the structure in the inverted matrix $(H_2^T)^{-1}$ and also increases its density. For the second step of the algorithm, the following are proposed:

  * Reduction of its density through factorization into Lower (L) and Upper (U) triangular matrices (LU decomposition).

  * Study of the matrices $(H_2^T)^{-1}$, $L$ and $U$ showed that the nonzero sub-matrices are circulant sub-matrices containing one one (shifted identities) or more (polynomials) per row (or column). These sub-matrices derive from a small set of circulant sub-matrices that recur with different shifts. Thus, to store $L$ and $U$, the first row (or column) of the sub-matrices belonging to the small set is stored, along with the shift coefficients corresponding to the remaining sub-matrices of $L$ and $U$.

  * The multiplication in the second encoding step is divided into two stages: in the first, the vector $p_1$ is multiplied by the matrix $L$, and then the result is multiplied by the matrix $U$.

  * Because of the polynomials, a vector-matrix multiplication circuit is proposed. This circuit is implemented with AND and XOR gates and is followed by a cyclic shift unit. The storage of the matrices $L$ and $U$ and the multiplications by them are depicted schematically in Figure 3.9.

* The three multiplications above can be implemented with a single circuit whose inputs are controlled by multiplexers according to the step being executed, as described in Figure 3.10.

* Various parallel architectures are also proposed to increase throughput, one of which, the fully parallel architecture, performs the three multiplications in one clock cycle using a tree of XOR gates, as shown in Figure 3.14. The number of XOR gates depends on the number of ones in the matrices involved.
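The first two items in the list above (storing only block positions with their shift coefficients, and multiplying by means of a cyclic shifter) can be modelled in software as follows. This is a behavioural sketch, not the hardware of Figures 3.4 and 3.5: the `(block_row, block_col, shift)` triples, the sizes and the function names are hypothetical, and the shift direction matches an identity block whose rows are shifted right.

```python
def rotate(v, sigma):
    """Cyclically shift list v right by sigma positions."""
    sigma %= len(v)
    return v[-sigma:] + v[:-sigma] if sigma else v[:]

def vec_mat_compressed(v, blocks, n_block_cols, z):
    """Row vector times a block-circulant matrix stored as
    (block_row, block_col, shift) triples: one rotation and z XORs per block."""
    out = [0] * (n_block_cols * z)
    for br, bc, sigma in blocks:
        seg = rotate(v[br * z:(br + 1) * z], sigma)
        for k in range(z):
            out[bc * z + k] ^= seg[k]
    return out

def dense_from_blocks(blocks, n_block_rows, n_block_cols, z):
    """Reference model: expand the triples into the full binary matrix."""
    m = [[0] * (n_block_cols * z) for _ in range(n_block_rows * z)]
    for br, bc, sigma in blocks:
        for r in range(z):
            m[br * z + r][bc * z + (r + sigma) % z] = 1
    return m

def vec_mat_dense(v, m):
    """Row vector times dense matrix over GF(2)."""
    return [sum(v[i] & m[i][j] for i in range(len(v))) % 2
            for j in range(len(m[0]))]

# Hypothetical compressed matrix: 3 block rows, 2 block columns, z = 4.
z = 4
blocks = [(0, 0, 1), (1, 0, 3), (2, 1, 0), (0, 1, 2)]
v = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
fast = vec_mat_compressed(v, blocks, 2, z)
slow = vec_mat_dense(v, dense_from_blocks(blocks, 3, 2, z))
```

The compressed form needs one stored triple per nonzero block instead of $z^2$ bits, which is the kind of saving the memory compression scheme relies on.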

Στη συνέχεια της διατριβής, προτείνονται δύο τεχνικές μείωσης της πυκνότητας

του πίνακα (HT2 )−1 αλλά και απλοποίησης της δομής του, με σκοπό την απλοποίηση

της αρχιτεκτονικής του κωδικοποιητή και μείωση της πολυπλοκότητάς του. Οι τεχνι-

κές αυτές εφαρμόζονται πάνω στον πίνακα βάσης που περιγράφει τους συντελεστές

ολίσθησης του πίνακα H2 με σκοπό την εξάλειψη των πολυωνύμων που προκύπτουν

κατά την αντιστροφή του πίνακα, έτσι ώστε οι υποπίνακες που ϑα προκύψουν να

είναι όλοι μοναδιαίοι (ολισθημένοι) και ο αριθμός τους να είναι ίδιος με τον αριθμό

των μη-μηδενικών στοιχείων του αντίστοιχου αντιστραμένου δυαδικού πίνακα βάσης

του H2. Συγκεκριμένα:

* In the first technique, the same shifting factor is assigned to all nonzero
elements in each row (or column) of the base matrix of shifting factors. This
simplifies the matrix $(H_2^T)^{-1}$ and reduces the complexity of multiplication
by it, as described in Chapter 4. Up to a point, the error-correction capability
of the resulting code is the same as that of a code whose shifting factors are
chosen entirely at random, with an identical BER curve down to $10^{-8}$. However,
BER degradation appears below $10^{-8}$ (the error-floor region), as shown in
Figure 4.3. This degradation appears to be due to the restricted randomness of
the values assigned to the shifting factors.

* In the second technique, an attempt is made to increase the randomness of the
shifting factors of $H_2$, as described in Chapter 5. A methodology is proposed
for constructing the base matrix that describes the shifting factors; the
polynomials appear to arise from the addition of shifted identity sub-matrices
with different shifting factors during the inversion of $H_2$. In the proposed
technique, the shifting factors that lead to polynomials are identified and
modified appropriately, so that any sub-matrix arising during the inversion of
$H_2$ is either a zero matrix or a (shifted) identity matrix. The blockwise
matrix inversion method is used to study the inversion of the matrix and the
elimination of the polynomials.

* The second technique yields codes with considerably improved BER in the
error-floor region compared with the first, since only a few shifting factors
are not chosen entirely at random, as shown in Figure 5.11.

Another important encoding issue investigated in the dissertation is the encoding of
multiple QC-LDPC codes with different or identical codeword lengths and code rates,
as in the case of the IEEE 802.11n/ac (WiFi) QC-LDPC codes, which comprise 12 codes
(three codeword lengths, each with four code rates). A parallel architecture with an
XOR tree is proposed, which exploits the structure of the base matrices of the 12
codes in order to realize a low-density architecture with a data rate of the order of
many Gbps. The same algorithm as above is used: multiplication by $H_1^T$ and then by
$(H_2^T)^{-1}$. The architecture is based on common sub-expression sharing: the XOR
gates used for the multiplication by a particular column of a particular matrix are
reused for the multiplication by another column of the same or a different matrix.
This architecture can be used to encode QC-LDPC codes of other protocols that have a
similar structure. The proposed encoding method is described in Chapter 6.

In addition, a second approach is proposed for encoding QC-LDPC codes with multiple
code rates. The proposed methodology is based on MacKay's encoding algorithm and the
parity-check matrix structure he proposes [88], modifying them appropriately to
support QC-LDPC codes and to support the generation and encoding of QC-LDPC codes
with different code rates using a single initial code, rather than many codes as in
the case of IEEE 802.11n/ac (WiFi). Multiple code rates are supported by removing
pairs of rows and columns from the parity-check matrix, always preserving the
original structure proposed by MacKay. This methodology ensures low encoding
complexity with competitive error-correction capability compared with other related
studies in the literature, as described in detail in Chapter 7.

To measure the complexity of the proposed architectures, and to characterize the
QC-LDPC codes constructed as described in the dissertation, a hardware model of a
digital telecommunication system was developed, consisting of an encoder, a
Gaussian-noise channel and the corresponding decoder. The system was implemented on
various FPGA devices and is used for simulations and for measuring the
error-correction capability of the codes, particularly in the error-floor region, in
a short time.

Chapter 1

Introduction

1.1 Overview

LDPC codes have been included in communication standards such as 10GBASE-T
(IEEE 802.3an-2006) for connections over twisted pair cables, DVB-S2 (EN

302 307) for satellite transmission of digital television and in various wireless sce-

narios: mobile communication WiMAX (IEEE 802.16e), WLANs (IEEE 802.11a) and

others (IEEE 802.3an, IEEE 802.15.3c, IEEE 802.11n/ac). Furthermore, high-rate LDPC

codes have been selected as the channel coding scheme for mmWave WPAN (IEEE

802.15.3c).

Quasi-Cyclic (QC) LDPC codes are the class considered in this research; their
construction, performance evaluation and encoder hardware implementation are
jointly examined. Although it has been shown that multi-level (recursively
constructed) QC-LDPC codes can be strong candidates for real-life applications
owing to their good error-correction capability and efficient decoder
implementation, their encoder hardware implementation has been neglected.

This thesis introduces low-complexity high-throughput encoders for efficient codes,
particularly for QC-LDPC codes with very low error floor. We investigate methods that
reduce the high encoding complexity caused by multiplications by inverse matrices, an
issue from which several encoding procedures suffer. The introduced techniques are
shown to deliver codes whose error-correction capability meets specifications of
particular interest at very low cost.

The main concept is that we jointly consider code construction, encoding method,
and architecture to mitigate the trade-off between BER performance and encoder
complexity. We first use a matrix decomposition to reduce the computational
complexity attributed to multiplications by dense inverse matrices. Subsequently,
we study the impact of the shifting factors of the original matrix, which affect
both the BER performance and the density of the matrix inverse. Next, we introduce
a high-speed area-efficient encoder architecture for several QC-LDPC codes, as in
the case of the WiFi codes, fully exploiting parallelism. Motivated by the
complexity of a WiFi encoder, we discuss rate-compatible QC-LDPC codes and
introduce a technique to derive area-efficient encoders. The hardware
implementation results are shown to be promising.

1.2 Thesis Contribution

This dissertation contributes to the area of digital signal processing and error correc-

tion systems. Specifically, it introduces novel methods and techniques to the field of

QC-LDPC code construction and low-complexity (low area complexity, low time com-

plexity, low power consumption) encoder VLSI architectures, and innovative solutions

that enable mobility quickly and cost-effectively.

The primary objectives of this dissertation are:

I- A construction of QC-LDPC codes of good error-correction capability, with im-

proved gain at the error floor.

II- Modifications of encoding algorithms to be suitable for QC-LDPC codes, based

on LU matrix decomposition, common sub-expression sharing, combined with

the support of rate-compatibility.

III- A modification of QC-LDPC codes of certain international standards that re-

duces their encoder hardware requirements and computational complexity, while

maintaining error-correction performance, based on the selection of certain QC-

LDPC parameters through a proposed technique.

IV- Efficient flexible encoding of LDPC codes of certain international standards.

1.3 Thesis Outline

The thesis is organized as follows.

Chapter 2 reviews the state of the art regarding channel coding and forward error

correction using Low-Density Parity-Check Codes. It introduces the prior encoding

techniques and discusses the decoding process.

Chapter 3 introduces hardware architectures for encoding Quasi-Cyclic Low-Density

Parity-Check (QC-LDPC) codes. The described encoders are based on appropriate

factorization and subsequent compression of involved matrices by means of a novel

technique, which exploits features of recursively-constructed QC-LDPC codes. The

particular approach leads to linear encoding time complexity and requires a constant

number of clock cycles for the computation of parity bits for all the constructed codes

of various lengths that stem from a common base matrix. The proposed architectures

are shown to efficiently support flexibility for the QC-LDPC code family.

In Chapter 4, a new parity-check matrix construction technique that simplifies the
hardware encoders for Multi-Level Quasi-Cyclic (ML-QC) LDPC codes is presented.
The proposed construction method is based on a semi-random ML-QC extension and
appropriately selects shifting factors to reduce the density of the inverted matrix
used in several encoding algorithms. The construction method yields low-complexity
encoders with minimal degradation of error-correction performance, observable at low
BER only. Furthermore, a VLSI encoding architecture based on the suggested
parity-check matrix (PCM) is introduced. Experimental results show that the
complexity of the proposed encoders depends on the density of the binary base matrix.
A comparison with random QC codes reveals substantial complexity reduction without
performance degradation for cases of practical interest. In fact, a hardware
complexity reduction by a factor of 7.5 is achieved, combined with the acceleration
of the encoder, for certain cases. The construction method can be applied to both
QC-LDPC codes and ML-QC LDPC codes.

Chapter 5 discusses a generalized parity-check matrix construction method that leads
to the same results as that of Chapter 4. The proposed construction method is based
on a constrained selection of shifting factors, shown to reduce the density of an
inverted matrix used in several encoding algorithms. The complexity of encoding
schemes involving inverted matrices is largely affected by their density. Comparisons
of the proposed parity-check matrices with codes employed in international standards
and with random QC-LDPC codes of comparable characteristics demonstrate the low
complexity of the corresponding hardware implementations and a BER performance
equivalent to that of previously reported codes, without increasing the decoding
complexity.

Chapter 6 describes an encoding method and a VLSI architecture suitable for encoding
multiple codes with different code rates and codeword lengths. The hardware resources
configured to compute the parity bits for one of the codes are shared with the
hardware resources configured to compute the parity bits for another of the codes.
The encoding method further determines an arrangement for the hardware resources
using common sub-expression sharing techniques. The encoder performs padding, in
which bits are added to the message bits prior to computing the parity-check bits,
and performs puncturing and repeating after computing the parity-check bits. The
padding, puncturing and repeating can be performed, for example, in accordance with
the IEEE 802.11n/ac standards. The encoder can be implemented in hardware using
field-programmable gate arrays (FPGAs), application-specific integrated circuits
(ASICs) or other types of circuitry. Because multiple error-correction codes are
implemented using sub-expression sharing techniques and shared hardware portions,
the area and time complexity of the encoder are shown to be very low compared to
other related works, thus supporting multi-Gbps applications at very low area cost.

Chapter 7 considers the hardware design of rate-compatible forward-error-correction
systems based on QC-LDPC codes. Specifically, a modification of the MacKay encoding
scheme is proposed that allows the support of QC codes. Furthermore, a matrix
puncturing scheme is introduced that exploits a parity-check matrix construction
technique, also proposed here, and achieves code-rate compatibility with low hardware
complexity. By extending the linear-complexity MacKay encoding algorithm, the
proposed encoding scheme yields a low-complexity hardware encoder architecture.
Furthermore, the proposed matrix puncturing scheme avoids the noise introduced by
conventional puncturing and therefore improves BER performance, especially for high
puncturing rates, where only a few parity-check symbols are transmitted. The impact
on decoder hardware is also studied. A comparison with prior art in puncturing is
offered, which shows superior performance of the proposed scheme in terms of coding
gain with almost negligible hardware cost.

Chapter 8 summarizes the thesis and presents the conclusions drawn.

Chapter 2

Review of the state of the art

The fast-growing use of digital networks has led to the need for the design of new

communication networks with higher capacity. The telecommunication industry

is also changing, with a demand for a greater range of services, such as video confer-

ences, or applications with multimedia content. The increased reliance on computer

networking and the Internet has resulted in a wider demand for fast and efficient

connectivity to be provided to any location, leading to a rise in the requirements for

higher capacity and high reliability broadband wireless telecommunication systems.

Broadband availability brings high-performance connectivity to over a billion users
worldwide, driving the development of new wireless broadband standards and
technologies that will rapidly extend wireless coverage. Moreover, the huge uptake
rate of mobile phone technology and WLANs (Wireless Local Area Networks), together
with the exponential growth of the Internet, has resulted in an increased demand for
new methods of obtaining highly efficient wireless communication systems.

2.1 Wireless communications and channel capacity

Wireless communications supporting data exchange between people or devices is the

communications frontier of the next century. Wireless networks will be used to con-

nect together smartphones, tablets, laptop, and desktop computers anywhere – any

time; this growing demand for wireless communication makes it important to de-

termine the capacity limits of wireless channels. These capacity limits dictate the

maximum data rates that can be achieved without any constraints on delay or com-

plexity of the encoder and decoder, at the transmission and receiving sides, respec-

tively. Channel capacity is the tightest upper bound on the rate of information that

can be reliably transmitted over a communications channel. By the noisy-channel

coding theorem, the channel capacity of a given channel is the limiting information

rate (in units of information per unit time) that can be achieved with arbitrarily small

error probability [121]. Information theory, based on the works of Claude E. Shannon,

defines the notion of channel capacity and provides a mathematical model by which

one can compute it [121, 122]. The key result states that the capacity of the channel,

as defined above, is given by the maximum of the mutual information between the

input and output of the channel, where the maximization is with respect to the input

distribution. The Shannon theorem states that given a noisy channel with channel

capacity C and information transmitted at a rate R, then if R < C there exist codes

that allow the probability of error at the receiver to be made arbitrarily small. This

means that, theoretically, it is possible to transmit information nearly without error

at any rate below a limiting rate, C.
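The condition R < C can be made concrete for one simple channel. The sketch below assumes a binary symmetric channel (BSC) with crossover probability p, a model that is not discussed in the text above, for which the capacity has the well-known closed form C = 1 - H_b(p) bits per channel use. The function names are illustrative only.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in bits; defined as 0 at p = 0 and p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(p: float) -> float:
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)


def reliable_code_exists(rate: float, p: float) -> bool:
    """Shannon's condition R < C for the assumed BSC."""
    return rate < bsc_capacity(p)


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.11, 0.5):
        print(f"p = {p:4.2f}  ->  C = {bsc_capacity(p):.4f} bits per channel use")
```

For a crossover probability near 0.11 the capacity is close to one half, so under this model a rate-1/2 code sits right at the limit, while a rate-0.4 code is below it.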

2.2 Channel coding

Simple schemes such as "send the message three times and use a best two out of three

voting scheme if the copies differ" are inefficient error-correction methods, unable to

asymptotically guarantee that a block of data can be communicated free of error.

Advanced techniques such as Reed–Solomon codes and, more recently, Low-Density

Parity-Check (LDPC) codes and turbo codes, come much closer to reaching the the-

oretical Shannon limit, but at a cost of high computational complexity. Using these

highly efficient codes, it is possible to reach very close to the Shannon limit. In fact,

it has been shown that LDPC codes can reach within 0.0045 dB of the Shannon limit

(for very long block lengths) [28].
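The "send the message three times" scheme mentioned at the start of this section can be sketched as follows. This is a minimal illustration with hypothetical function names; the residual error formula assumes independent bit flips with probability p, an assumption added here for the example.

```python
def encode_repetition(bits, copies=3):
    """Repeat every message bit `copies` times."""
    return [b for b in bits for _ in range(copies)]


def decode_repetition(received, copies=3):
    """Majority vote over each group of `copies` received bits."""
    decoded = []
    for start in range(0, len(received), copies):
        group = received[start:start + copies]
        decoded.append(1 if 2 * sum(group) > len(group) else 0)
    return decoded


def residual_error_probability(p):
    """Probability that a 3-fold majority vote fails (two or three flips)."""
    return 3 * p ** 2 * (1 - p) + p ** 3


if __name__ == "__main__":
    codeword = encode_repetition([1, 0, 1])
    codeword[1] ^= 1  # one flipped bit in the first group is corrected
    print(decode_repetition(codeword))
    print(residual_error_probability(0.1))
```

The code rate is fixed at 1/3 and the residual error probability per message bit stays at roughly 3p^2 however long the message is, which is why such a scheme cannot make the error probability of a whole block arbitrarily small.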

In telecommunication, information theory, and coding theory, forward error cor-

rection (FEC) or channel coding is a technique used for controlling errors in data

transmission over unreliable or noisy communication channels. The central idea is that

the transmitter encodes its message in a redundant way by using an error-correcting

code (ECC). The redundancy allows the receiver to detect a limited number of errors

that may occur anywhere in the message, and often to correct these errors without

retransmission. FEC gives the receiver the ability to correct errors without needing a

reverse channel to request retransmission of data, but at the cost of a fixed, higher

forward channel bandwidth. FEC is therefore applied in situations where retrans-

missions are costly or impossible, such as one-way communication links and when

transmitting to multiple receivers in multicast. FEC information is usually added

to mass storage devices to enable recovery of corrupted data, and is widely used in

modems.

2.2.1 Role of coding

The main reason to apply error correction coding in a telecommunication system is to

reduce the probability of bit error. A coding scheme can be evaluated in many different

ways from purely mathematical [139] to more implementation-oriented methods [114].

Throughout this thesis the codes are evaluated from the perspectives of performance
and hardware complexity. The performance of a code is measured as the probability
that the decoder selects the wrong codeword at a given level of noise. The level of
noise is quantified by the signal-to-noise ratio (SNR), often expressed in decibels
(dB). The probability of having an error after the decision is expressed by the bit
error rate (BER). The SNR is the ratio of the energy per bit generated by the source,
$E_b$, to the noise spectral density, $N_0$. The bit error rate is the ratio of the
number of erroneously decoded bits to the total number of bits transmitted. The block
error rate is the ratio of the number of decoded messages that contain errors to the
total number of codewords transmitted.
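The three quantities defined above can be computed as in the following sketch. The function names and the toy data are hypothetical and serve only to illustrate the definitions; they are not the measurement setup used in this thesis.

```python
import math


def bit_error_rate(sent_bits, decoded_bits):
    """Erroneously decoded bits divided by the total number of bits."""
    if len(sent_bits) != len(decoded_bits):
        raise ValueError("sequences must have the same length")
    errors = sum(s != d for s, d in zip(sent_bits, decoded_bits))
    return errors / len(sent_bits)


def block_error_rate(sent_blocks, decoded_blocks):
    """Blocks containing at least one error divided by the number of blocks."""
    if len(sent_blocks) != len(decoded_blocks):
        raise ValueError("block lists must have the same length")
    wrong = sum(s != d for s, d in zip(sent_blocks, decoded_blocks))
    return wrong / len(sent_blocks)


def ebn0_in_db(eb, n0):
    """Ratio of energy per bit to noise spectral density, in decibels."""
    return 10.0 * math.log10(eb / n0)


if __name__ == "__main__":
    sent = [[0, 1, 1, 0], [1, 1, 0, 0]]
    decoded = [[0, 1, 1, 0], [1, 0, 0, 1]]
    flat_sent = [b for block in sent for b in block]
    flat_decoded = [b for block in decoded for b in block]
    print(bit_error_rate(flat_sent, flat_decoded))
    print(block_error_rate(sent, decoded))
    print(ebn0_in_db(2.0, 1.0))
```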


For many codes, the error correction capability of a code does not come for free.

This performance enhancement is paid for by increased complexity and, for block

codes, convolutional codes, turbo codes, and LDPC, by either a decreased data rate

or increase in signal bandwidth.

2.2.2 Linear block codes

Block coding is a type of error correction coding in which the digital data to be

transmitted is broken into messages of fixed size. Prior to transmission, each message

is encoded into a codeword (also referred to as a "block") by an encoder. Redun-

dancy, referred to as parity data, is inserted during the encoding process so that the

codewords are made larger than the messages. Each codeword includes both message

bits and parity bits. Assume that the codewords each consist of n bits. Only certain

patterns of n bits are valid codewords; the remaining patterns are invalid. The code-

words are then transmitted, which may cause the codewords to become corrupted.

Upon reception, a decoder attempts to infer the original messages from the received,

and possibly corrupted, codewords.

A binary block code generates a block of n coded bits from k information bits.

We call this an (n, k) binary block code. The coded bits are also called codeword

symbols. The rate of the code is $R_c = k/n$ information bits per codeword symbol. If
we assume that codeword symbols are transmitted across the channel at a rate of
$R_s$ symbols/second, then the information rate associated with an (n, k) block code is
$R_b = R_c R_s = \frac{k}{n} R_s$ bits/second. Thus we see that block coding reduces the data rate
compared to what we obtain with uncoded modulation by the code rate $R_c$. A block

code is called a linear code when the mapping of the k information bits to the n

codeword symbols is a linear mapping.
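As a worked illustration with hypothetical numbers (not taken from this thesis), consider a (7, 4) block code and an assumed channel symbol rate of 7 Msymbols/s:

```latex
\[
R_c = \frac{k}{n} = \frac{4}{7}, \qquad
R_b = R_c R_s = \frac{4}{7} \times 7~\text{Msymbols/s} = 4~\text{Mbit/s}.
\]
```

In this example the remaining 3 Msymbols/s of the channel rate carry parity rather than information.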

2.3 Low Density Parity Check Codes

LDPC codes were originally invented by Gallager in his 1961 Masters thesis [41].

However, these codes were largely ignored until the introduction of turbo codes.

Subsequent to the landmark paper on turbo codes in 1993 [13], LDPC codes were

reinvented by MacKay and Neal [87] and by Wiberg [136] in 1996. Shortly thereafter it

was recognized that these new code designs were actually reinventions of Gallager’s

original work, and subsequently much work has been devoted to finding the capacity

limits, encoder and decoder designs, and practical implementation of LDPC codes for

different channels.

LDPC codes are linear block codes with a particular structure for the parity check

matrix H. Specifically, a (dv, dc) regular binary LDPC code has a parity check matrix

H with dv ones in each column and dc ones in each row, where dv and dc are chosen

as part of the codeword design and are small relative to the codeword length. Since

the fraction of nonzero entries in H is small, the parity check matrix for the code has

a low density, and hence the name low-density-parity-check codes.
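The regularity condition and the density of a parity check matrix can be checked as in the sketch below. The 4 x 8 matrix is a toy example constructed for this illustration, with two ones per column and four ones per row; it is far too small to be sparse, and the helper names are hypothetical.

```python
def column_weights(H):
    """Number of ones in each column of H."""
    return [sum(row[i] for row in H) for i in range(len(H[0]))]


def row_weights(H):
    """Number of ones in each row of H."""
    return [sum(row) for row in H]


def is_regular(H, dv, dc):
    """True if every column has dv ones and every row has dc ones."""
    return (all(w == dv for w in column_weights(H))
            and all(w == dc for w in row_weights(H)))


def density(H):
    """Fraction of nonzero entries in H."""
    return sum(sum(row) for row in H) / (len(H) * len(H[0]))


# Toy (2, 4)-regular parity check matrix (illustrative only).
H_TOY = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

if __name__ == "__main__":
    print(is_regular(H_TOY, dv=2, dc=4))
    print(density(H_TOY))
```

For a regular code the density equals dc divided by the codeword length, so it shrinks as the codeword grows while dv and dc stay fixed.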

Provided that the codeword length is long, LDPC codes achieve performance close

to the Shannon limit, in some cases surpassing the performance of parallel or serially

concatenated codes [118]. The fundamental practical difference between turbo codes

and LDPC codes is that turbo codes tend to have low encoding complexity (linear in

blocklength) but high decoding complexity (due to their iterative nature and message

passing). In contrast, LDPC codes tend to have relatively high encoding complex-

ity (quadratic in blocklength) but low decoding complexity. In particular, like turbo

codes, LDPC decoding uses iterative techniques, which are related to Pearl’s belief

propagation commonly used by the artificial intelligence community [95]. However,

the belief propagation corresponding to LDPC decoding is simpler than for turbo de-

coding, thereby making the LDPC iterative decoder much simpler [76, 95]. In addition,

the belief propagation decoding is parallelizable and can be closely approximated with

very low complexity decoders [116]. Finally, a decoding algorithm for LDPC codes can

detect when a correct codeword has been reached, which is not necessarily the case


for turbo codes. Additional work in the area of LDPC codes includes finding capacity

limits for these codes [116], determining effective code designs [28] and efficient en-

coding and decoding algorithms [38, 116], and expanding the code designs to include

nonregular [118] and nonbinary LDPC codes [30].

Construction of efficient LDPC codes, in terms of error-correction capability and

hardware implementation, will always be a hot research topic since they are employed

in many telecommunication applications. There are many construction parameters, such
as the codeword length, the degree distribution, the girth and the minimum distance of
the code, and trapping and stopping sets [127] [79] [15] [100]. Furthermore, detailed
studies on the construction of low-density parity-check convolutional (LDPCC) codes
have been presented in [63] [102], aiming to improve the error-correction capability
of those codes, and they have been shown to be capable of achieving
capacity-approaching performance with iterative message-passing decoding [115].

Protograph LDPC codes are shown to provide efficient BER performance [109]. A

protograph contains a small number of variable and check nodes that are intercon-

nected via edges where parallel edges are allowed in a protograph. A protograph code

is an LDPC code built from the protograph via lifting, which is the process of copying
the protograph repeatedly and permuting the edges corresponding to the same node type
across different copies to interconnect them [33]. Mitchell et al. have proposed an
analytical method of constructing QC-LDPC codes based on pre-lifted protographs,
investigating various parameters directly relevant to the decoding performance of the
constructed codes, such as minimum distance and girth [101]. One of the successful
examples of protographs is the AR4JA codes of [33], which provide efficient BER
performance in the waterfall and the error-floor regions; the structure of these codes
constitutes a strong basis for constructing codes with very good BER performance
[130] [109].

The introduction of FEC schemes increases the complexity of the transmitter and

receiver; in particular the transmitting module has to determine the redundant bits

(parity bits) applying algebraic operations over matrices of different sizes, while the

receiving equipment has to take care of correcting the possible errors, tasks that can be


computationally demanding. For this reason, when real-world systems are designed, the
choice of the code cannot be based exclusively on the coding performance; hardware
requirements must also be considered. Hence, it is important to develop codes
that are capable of good performance without making the encoding and decoding
processes intractable from an implementation point of view. For this reason, a class
of LDPC codes called Quasi-Cyclic (QC) LDPC codes is considered. QC-LDPC codes
have been proposed to reduce the implementation complexity, while obtaining a similar

performance [16, 39]. The QC-LDPC codes consist of concatenated circulant sub-

matrices. Each circulant sub-matrix is a square matrix for which every row is the

cyclic shift of the previous row, and the first row is obtained by the cyclic shift of

the last row. In this way, every column of each circulant sub-matrix is automatically

the cyclic shift of the previous column, and the first column is obtained by the cyclic

shift of the last column. QC-LDPC codes approach optimal decoding performance with
an acceptable decoding computational cost. However, there are physical limits
preventing the implementation of large-block-size LDPC codes. These limits are
associated with the interconnects of the various modules of the decoder and the large
memory that the algorithm requires. Furthermore, LDPC/QC-LDPC codes have been included
in communication standards such as 10GBASE-T (IEEE 802.3an-2006) [5] for connections
over twisted pair cables, DVB-S2 (EN 302 307) [3] for satellite transmission of
digital television, WiMAX (IEEE 802.16e) [4], and WLANs (IEEE 802.11a) [2].
Furthermore, high-rate LDPC codes have been selected as the channel coding scheme for
mmWave WPAN (IEEE 802.15.3c) [6], and recently for the 60 GHz gigabit link [69].
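The circulant structure described in this section can be sketched for the common special case in which each sub-matrix is a cyclically shifted identity matrix. In this sketch the base matrix holds one shifting factor per block and the value -1 marks an all-zero block; that marker, the function names and the example base matrix are assumptions made for this illustration.

```python
def circulant_permutation(z, shift):
    """z x z identity matrix with every row cyclically shifted right by `shift`."""
    return [[1 if col == (row + shift) % z else 0 for col in range(z)]
            for row in range(z)]


def expand_base_matrix(base, z):
    """Replace each base entry by a z x z block: -1 gives zeros, s >= 0 a shifted identity."""
    expanded = []
    for base_row in base:
        blocks = [circulant_permutation(z, s) if s >= 0
                  else [[0] * z for _ in range(z)]
                  for s in base_row]
        for r in range(z):
            expanded.append([bit for block in blocks for bit in block[r]])
    return expanded


if __name__ == "__main__":
    for row in circulant_permutation(3, 1):
        print(row)
    H = expand_base_matrix([[0, 1], [-1, 2]], z=3)
    print(len(H), "x", len(H[0]))
```

Only the shifting factors need to be stored, one integer per block, which is the property that encoder and decoder hardware for QC-LDPC codes exploits.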

2.3.1 Tanner graph

LDPC codes can be represented by a bipartite graph, called the Tanner graph [127]. It

provides a complete representation of the code and an aid in the description of the

decoding algorithm. A bipartite graph is a graph whose nodes can be separated into
two types, called check and variable nodes in the case of LDPC codes. Each edge
of the graph connects two nodes of different types. The bipartite graph for an
LDPC code can be derived from the H matrix by generating as many variable nodes
$v_i$ as the columns of the matrix and as many check nodes $c_j$ as the rows of the
matrix. A variable node $v_i$ is connected to a check node $c_j$ if and only if the
element $h_{ji}$ is nonzero. The Tanner graph also offers a basis for studying
various aspects that influence the performance of LDPC codes. A cycle in a Tanner
graph occurs when a path that starts from a node $n_i$ ends at the same node $n_i$,
while the girth of the graph is the length of the shortest cycle in that graph. The
girth is considered one of the important parameters of an LDPC code [86]; it is
commonly accepted that the presence of short cycles in the graph is one of the main
factors affecting the coding gain achievable by the LDPC code [61, 70, 74, 98, 134, 135].
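The construction of the Tanner graph from H and the definition of the girth can be illustrated with the sketch below. The node labels and function names are hypothetical, and the girth is found by a plain breadth-first search from every node, chosen here for clarity rather than efficiency.

```python
from collections import deque


def tanner_graph(H):
    """Adjacency lists: variable node ("v", i) joins check node ("c", j) iff H[j][i] != 0."""
    m, n = len(H), len(H[0])
    adj = {("v", i): [] for i in range(n)}
    adj.update({("c", j): [] for j in range(m)})
    for j in range(m):
        for i in range(n):
            if H[j][i]:
                adj[("v", i)].append(("c", j))
                adj[("c", j)].append(("v", i))
    return adj


def girth(adj):
    """Length of the shortest cycle, or None if the graph has no cycle."""
    best = None
    for root in adj:
        dist = {root: 0}
        parent = {root: None}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    parent[w] = u
                    queue.append(w)
                elif parent[u] != w:
                    # A non-tree edge closes a walk through the root.
                    length = dist[u] + dist[w] + 1
                    if best is None or length < best:
                        best = length
    return best


if __name__ == "__main__":
    H = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
    print(girth(tanner_graph(H)))
```

Because the graph is bipartite, every cycle has even length, so the smallest possible girth is 4; this occurs whenever two columns of H share ones in two common rows.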

2.3.2 The encoding process

The process of transmitting digital data can introduce errors into the data. As a result,

the received data can be different from the transmitted data. Such errors are typically

caused by noise that is present in the transmission channel. The amount of errors is

generally related to the transmission signal strength in relation to the amount of noise

present. Error correction coding is a technique by which redundancy is inserted into

the data prior to transmission. Upon reception, this redundancy is used in an attempt

to correct errors that were introduced during the transmission process. A generator

matrix can be used during the encoding process to encode the messages into valid

codewords. Upon reception, a parity check matrix can be used during the decoding

process to generate an error vector, where the error vector indicates the presence of

errors in the received codeword [104, 105].

The generator matrix for LDPC codes is generally not sparse. This means that

the encoding process for an LDPC code can have high complexity. In an effort to

reduce encoding complexity, some encoding schemes use the parity check matrix to

compute the codewords during the encoding process. This is possible because the

parity check matrix is related to the generator matrix in that the parity check matrix


for each particular LDPC code can be derived from the generator matrix for that

code. The parity check matrix can be partitioned into sub-matrices. The parity bits for

each codeword can be computed from the message bits using the sub-matrices [75, 117].

Some LDPC encoders employ forward/backward substitution [51, 123]. This approach is

used to avoid inversion of the parity check sub-matrix in an effort to reduce complexity

of the encoding computations. However, parallelization of the backward substitution

procedure introduces high complexity. Also, to implement the backward substitution

procedure for LDPC codes having different block lengths and code rates, at least

the non-zero elements for multiple sub-matrices need to be stored (i.e. one per code

length, per code rate), which requires large memories. In addition to the storage

requirements, implementation of these procedures tends to require complex hardware.
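Computing the parity bits directly from sub-matrices of the parity check matrix can be sketched as follows. The sketch assumes a partition H = [H1 | H2] with H2 square and invertible over GF(2), and a systematic codeword c = [s p]; the condition H c^T = 0 then gives p^T = H2^{-1} H1 s^T. The matrices are toy examples and the function names are hypothetical.

```python
def gf2_matvec(M, x):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in M]


def gf2_inverse(M):
    """Inverse of a square binary matrix by Gauss-Jordan elimination over GF(2)."""
    n = len(M)
    A = [row[:] + [1 if i == j else 0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col]:
                A[r] = [a ^ b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def encode(H1, H2, s):
    """Systematic codeword [s p] with p^T = H2^{-1} (H1 s^T) over GF(2)."""
    p = gf2_matvec(gf2_inverse(H2), gf2_matvec(H1, s))
    return list(s) + p


if __name__ == "__main__":
    H1 = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
    H2 = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
    c = encode(H1, H2, [1, 0, 1])
    H = [r1 + r2 for r1, r2 in zip(H1, H2)]
    print(c, gf2_matvec(H, c))
    print(gf2_inverse(H2))
```

In this toy case H2 has five nonzero entries while its inverse is a full lower-triangular matrix of ones, a small instance of a sparse sub-matrix having a denser inverse.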

Many hardware architectures have been suggested for the implementation of flexi-

ble high-speed LDPC decoders [52, 78, 94, 106]. By imposing certain restrictions in the

structure of the parity check matrix (PCM), further hardware simplifications become

possible. Following this approach, a special class of LDPC codes, called QC-LDPC

codes, allows for efficient hardware implementations of encoding and decoding algo-

rithms by exploiting the structure of the corresponding PCM, which is composed of

circulant permutation matrices [24][126].

Dense-matrix operations substantially increase the complexity of an LDPC encoder.

Several approaches have been studied in the literature to reduce this complexity.
Several authors have proposed encoders that directly multiply the information

word with the generator matrix, or with the part of it which refers to the parity bits,

in the case of systematic codes [82, 141]. Li et al. describe two techniques for the

derivation of the generator matrix of a circulant PCM using either a single-step or a

two-step procedure leading to single-step and two-step encoder architectures respec-

tively [82]. The derivation of the generator matrix requires operations with the inverse

of a submatrix of the PCM [82]. While submatrices of an LDPC PCM are necessar-

ily sparse, their inverses can be dense. Dense-matrix operations can be avoided by

exploiting certain properties that stem from a particular PCM structure. For example,


Kopparthi and Gruenbacher avoid the computation of a generator matrix by solving

a suitably defined linear system of equations to compute the parity bits by resorting

to forward substitution [75]. Their approach exploits the doubly diagonal structure of

the IEEE 802.16e PCM submatrix which corresponds to the parity bits.
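
The idea of replacing a generator matrix by forward substitution can be sketched as follows. The sketch assumes a simplified bit-level dual-diagonal parity submatrix (ones on the main diagonal and the first subdiagonal), not the actual IEEE 802.16e block structure; H1, s, and the function names are hypothetical.

```python
import numpy as np

def dual_diagonal(m):
    """m x m binary matrix with ones on the main diagonal and first subdiagonal."""
    return (np.eye(m, dtype=np.uint8) + np.eye(m, k=-1, dtype=np.uint8)) % 2

def encode_dual_diagonal(H1, s):
    """Solve H2 p = H1 s over GF(2) by forward substitution, with H2 dual-diagonal."""
    v = H1.dot(s) % 2            # right-hand side, one bit per parity-check row
    p = np.zeros_like(v)
    acc = 0
    for i, vi in enumerate(v):   # p[i] = v[i] XOR p[i-1]
        acc ^= int(vi)
        p[i] = acc
    return p

rng = np.random.default_rng(0)
m, k = 6, 6
H1 = (rng.random((m, k)) < 0.3).astype(np.uint8)   # hypothetical sparse information part
s = rng.integers(0, 2, size=k, dtype=np.uint8)     # information bits
p = encode_dual_diagonal(H1, s)

H = np.hstack([H1, dual_diagonal(m)])
codeword = np.concatenate([s, p])
syndrome = H.dot(codeword) % 2   # all zeros for a valid codeword
```

The running XOR is equivalent to multiplying by the all-ones lower triangular matrix, which is the inverse of this dual-diagonal matrix over GF(2). The inverse is dense even though the matrix itself has at most two ones per row, yet it never has to be stored.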

Prominent among the encoder architectures are those based on the Richardson-

Urbanke (RU) algorithm [117]. Richardson and Urbanke bring an LDPC PCM into an

Almost Lower Triangular (ALT) format by executing row and column permutations

thus maintaining the sparsity of the PCM. The derived ALT PCM is partitioned into

six submatrices that participate in a series of operations to derive the parity bits. The

complexity of the method is $O(n+g^2)$, where $g$ is shown to be bounded by $\sqrt{n}$ and is

small for several good codes. Lee et al. [77] implement the RU algorithm in an FPGA.

Multiplication by a small dense matrix is replaced by the solution of a system using

forward substitution. An issue with the RU technique is that column permutations

may be needed to guarantee the non-singularity of a particular submatrix whose

inverse is required in the computation, thus resulting in a necessarily non-systematic

encoder. To address this issue, Cohen and Parhi propose a hybrid encoder, which

uses operations with the generator matrix to derive a subset of the parity bits, while

the remainder of the parity bits are computed using the RU method [29]. Fewer et

al. also implement the RU algorithm with a partly parallel architecture and a special

matrix construction to increase encoding throughput [36]. Sun et al. exploit the dual

diagonal structure of the H2 submatrix of the IEEE 802.11n standard to determine the parity

check symbols using forward substitution [124].
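
For reference, the six-submatrix partition used by the RU algorithm is commonly written as below, over GF(2). The symbols $A$, $B$, $T$, $C$, $D$, $E$, $\phi$, $s$, $p_1$ and $p_2$ follow the usual presentation of the algorithm and are not defined in this section: $T$ is an $(m-g)\times(m-g)$ lower triangular matrix, $s$ is the information vector, and the codeword is $(s, p_1, p_2)$ with $p_1$ of length $g$.

```latex
\begin{align*}
H &= \begin{bmatrix} A & B & T \\ C & D & E \end{bmatrix}, \\
\phi &= E\,T^{-1}B + D, \\
p_1^{T} &= \phi^{-1}\left(E\,T^{-1}A + C\right)s^{T}, \\
p_2^{T} &= T^{-1}\left(A\,s^{T} + B\,p_1^{T}\right).
\end{align*}
```

Only the multiplication by $\phi^{-1}$, a $g\times g$ matrix that is dense in general, costs $O(g^2)$. The remaining steps are sparse multiplications and substitutions with $T$, which is consistent with the $O(n+g^2)$ figure quoted above. The requirement that $\phi$ be non-singular is the source of the column permutations mentioned in the text.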

Forward or backward substitution to solve a system and therefore avoid a corre-

sponding dense matrix operation is commonly used in several encoding techniques,

including the RU method and the direct generator-based techniques. Specifically, multiplication

by inverses of sparse matrices, which may be dense, can be avoided by

performing suitable forward/backward substitution, in the case that the corresponding

multipliers are lower/upper triangular, respectively. He et al. exploit a special lower triangular

structure of the PCM to define an architecture that performs a two-step encoding pro-


cedure, based on the hardwired implementation of multiplication by small matrices

and of forward substitution [51]. Furthermore, most encoder architectures can benefit

from a PCM composed of circulants. In addition, high throughput rates can be

achieved by utilizing several instances of the encoders in parallel.
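
The substitution step discussed above can be sketched for a sparse triangular matrix stored as lists of nonzero positions. The storage layout, the function names, and the example matrix T_ROWS are assumptions made for illustration; a unit diagonal is assumed, as is always the case for a non-singular triangular matrix over GF(2).

```python
def forward_substitution_gf2(rows, b):
    """Solve T x = b over GF(2) for a lower triangular T with unit diagonal.

    rows[i] lists the column indices j < i of the off-diagonal ones in row i,
    so the work is proportional to the number of nonzeros in T.
    """
    x = []
    for i, cols in enumerate(rows):
        bit = b[i]
        for j in cols:
            bit ^= x[j]          # subtract (XOR) the already known unknowns
        x.append(bit)
    return x

def backward_substitution_gf2(rows, b):
    """Solve U x = b over GF(2) for an upper triangular U with unit diagonal.

    rows[i] lists the column indices j > i of the off-diagonal ones in row i.
    """
    n = len(rows)
    x = [0] * n
    for i in range(n - 1, -1, -1):
        bit = b[i]
        for j in rows[i]:
            bit ^= x[j]
        x[i] = bit
    return x

# Hypothetical 5 x 5 lower triangular matrix with five off-diagonal ones.
T_ROWS = [[], [0], [1], [0, 2], [3]]
b = [1, 0, 1, 1, 0]
x = forward_substitution_gf2(T_ROWS, b)
```

Each unknown costs one XOR per stored nonzero, so the inverse of the triangular matrix, which may be dense, is never formed.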

Recently, a different approach to encoder complexity reduction has attracted some

interest, namely the use of LU decomposition of the dense matrices into matrices of

lower density. Several authors have investigated this approach, focusing mostly on

the codes employed in China Multimedia Mobile Broadcasting (CMMB) [56, 123, 133].

Such codes are highly structured and circulant but not QC. Also, the application of

LU decomposition has been studied for Irregular Repeat-Accumulate (IRA) codes [68].

Research efforts toward this direction target the derivation of sparse L and U matri-

ces [56].
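
The LU-based approach can be sketched as follows. The sketch factors the parity submatrix over GF(2) without pivoting, which assumes that every leading principal submatrix is non-singular; practical designs may need row or column permutations. The matrices H1 and H2, the vector s, and the function names are hypothetical and are not taken from the cited works.

```python
import numpy as np

def lu_gf2(M):
    """LU factorisation over GF(2) without pivoting, so that M = L U (mod 2).

    Assumes every leading principal submatrix of M is non-singular;
    raises ValueError otherwise.
    """
    n = M.shape[0]
    L = np.eye(n, dtype=np.uint8)
    U = M.astype(np.uint8) % 2
    for k in range(n):
        if U[k, k] == 0:
            raise ValueError("zero pivot: a row/column permutation would be needed")
        for i in range(k + 1, n):
            if U[i, k]:
                U[i, :] ^= U[k, :]   # eliminate entry (i, k)
                L[i, k] = 1          # record the row operation
    return L, U

def solve_lu_gf2(L, U, v):
    """Solve L U p = v over GF(2) by forward, then backward substitution."""
    n = len(v)
    y = np.zeros(n, dtype=np.uint8)
    for i in range(n):
        y[i] = (v[i] + L[i, :i].dot(y[:i])) % 2
    p = np.zeros(n, dtype=np.uint8)
    for i in range(n - 1, -1, -1):
        p[i] = (y[i] + U[i, i + 1:].dot(p[i + 1:])) % 2
    return p

# Hypothetical toy matrices, for illustration only.
H1 = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1]], dtype=np.uint8)
H2 = np.array([[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=np.uint8)
s = np.array([1, 0, 1, 1], dtype=np.uint8)

L, U = lu_gf2(H2)
v = H1.dot(s) % 2                # H2 p = H1 s over GF(2)
p = solve_lu_gf2(L, U, v)
```

The factorisation is computed once, offline. The encoder then only performs the two substitutions, whose cost depends on the number of nonzeros in L and U, which is why sparse factors are the target of the research efforts mentioned above.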

Neal introduced an encoding method with linear complexity, where LU decompo-

sition is exploited to avoid the computational cost of multiplication by a dense inverse

matrix [108]. This method is followed by Su et al., where LU decomposition is applied

on H2 and forward/backward substitution is used to compute the parity bits [123].

Their design takes 20% of the memory available in an Altera Stratix EP1S80B596C

device for a codeword length of 1536 and code rate 1/2, achieving a throughput of 31 Mbps.

They use a four-step procedure to compute the parity check vector, which means that

it would be difficult to parallelize their encoder architecture using their index calcu-

lator. Kaji studied the LU decomposition of H2 of PCMs of non-QC LDPC codes in

comparison with the RU algorithm [68]. He has shown that LU decomposition of H2

leads to lower encoding complexity than the RU algorithm; however, his analysis does

not deal with the structure of $L^{-1}$ and $U^{-1}$ in the case of QC-LDPC codes, where they

could have fewer nonzero sub-matrices than $(H_2^T)^{-1}$. As these sub-matrices themselves

could have different structure and density with regar
