Channel Coding
- Classification of codes
- Block codes
- Linear block codes: (n, k, d_min)
- Hamming codes
- Cyclic codes
- Reed-Solomon codes
- BCH codes
- Cyclic Redundancy Check (CRC)
codes
- Nonlinear block codes
- Convolutional codes: (n, k, m)
- Compound codes
- Concatenated codes (outer code + inner
code)
- Product codes
- Punctured convolutional codes
- Rate compatible punctured
convolutional (RCPC) codes
- Turbo codes
- Low-density parity-check (LDPC) codes
- Key metrics/factors that determine the
performance of channel codes:
- Linear block codes: 1) minimum Hamming distance d_min (equivalently, minimum Hamming weight), 2) weight distribution, 3) code rate k/n (a small sketch computing 1) and 2) for a Hamming code appears after this list).
- Convolutional codes: 1) minimum free distance d_free, the minimum Hamming distance between pairs of code sequences, 2) weight distribution: the transfer function of distances, i.e., the weight enumerating function (WEF) (analogous to a z-transform or moment generating function), 3) code rate k/n.
- Coded modulation: the free Euclidean
distance, which is the minimum Euclidean distance between pairs of coded
signals (not codewords). Note that coded signals are waveforms
while codewords are discrete-time discrete-valued. The set of
coded signals and the set of codewords have one-to-one correspondence.
- Turbo codes: d_free, distribution of
distances, code rate.
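A minimal sketch (Python, assuming NumPy) of the first two metrics for a linear block code, using the (7,4) Hamming code with one common systematic generator matrix: enumerate all 2^k codewords, then read off d_min and the weight distribution.

```python
import itertools
import numpy as np

# Systematic generator matrix of the (7,4) Hamming code (one common choice).
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

# Enumerate all 2^k codewords and collect their Hamming weights.
weights = []
for msg in itertools.product([0, 1], repeat=4):
    c = np.array(msg) @ G % 2            # encode over GF(2)
    weights.append(int(c.sum()))

# For a linear code, d_min equals the minimum nonzero codeword weight.
d_min = min(w for w in weights if w > 0)
dist = {w: weights.count(w) for w in sorted(set(weights))}
print("d_min =", d_min)                  # 3
print("weight distribution =", dist)     # {0: 1, 3: 7, 4: 7, 7: 1}
print("code rate k/n =", 4 / 7)
```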
- MAP detector/decoder minimizes the average error probability (see Proakis, "Digital Communications", p. 249, for a proof). If the transmitted symbols/signals are equally probable a priori, the ML decoder also minimizes the average error probability.
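A toy numeric illustration of the statement above, assuming a single bit sent over a binary symmetric channel with an illustrative non-uniform prior; it shows that the MAP and ML decisions can differ when the prior is strongly skewed.

```python
# MAP vs. ML decision for one bit over a BSC (illustrative numbers only).
p = 0.1                       # crossover probability
prior = {0: 0.95, 1: 0.05}    # a priori probabilities P(x)
r = 1                         # received bit

def likelihood(r, x):
    """P(r | x) for a binary symmetric channel with crossover probability p."""
    return 1 - p if r == x else p

ml_hat  = max((0, 1), key=lambda x: likelihood(r, x))             # argmax P(r|x)
map_hat = max((0, 1), key=lambda x: likelihood(r, x) * prior[x])  # argmax P(r|x)P(x)
print("ML decision :", ml_hat)    # 1 (follows the channel output)
print("MAP decision:", map_hat)   # 0 (the strong prior overrides the observation)
```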
- Decoding algorithm
- Block codes
- Hard decoding:
- For linear block codes
- Syndrome decoding (compute the syndrome using the parity-check matrix, then look up the corresponding error pattern in a table). The disadvantage of syndrome decoding is the storage complexity of the lookup table (2^(n-k) entries), which becomes large for codes with many parity bits and hence large d_min. A minimal sketch appears after this decoding list.
- ML, MAP. Unlike ML sequence estimation (the Viterbi algorithm), a block codeword is treated as a vector rather than a sequence. With hard decisions (equivalently, a BSC), ML decoding reduces to nearest-neighbor decoding in Hamming distance; with soft decisions over an AWGN channel, it reduces to minimum Euclidean distance decoding.
- For cyclic codes, use the Peterson-Gorenstein-Zierler (PGZ) algorithm or the Euclidean algorithm.
- Soft decoding for Low-Density Parity-Check (LDPC) codes: ML, MAP (in practice, iterative sum-product/message-passing decoding). The output is an estimated a posteriori probability for each bit.
- Convolutional codes
- Hard decoding:
- MLSE (Viterbi algorithm)
- MAP (symbol-by-symbol): recursive
algorithm.
- Soft decoding for turbo codes: the output is an estimated a posteriori probability; each constituent decoder's extrinsic output serves as a priori information for the other decoder. Turbo decoding is also called iterative decoding and is trellis-based.
- Sequence estimation: SOVA,
improved SOVA. The output sequence is a path in the
trellis diagram. Minimize the FER (frame error rate)
or WER (word error rate).
- Symbol-by-symbol estimation: MAP
(BCJR algorithm), max-log-MAP, Log-MAP. The output
sequence is not a path in the trellis diagram.
Minimize the BER (bit error rate).
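As a concrete instance of the syndrome-decoding bullet above, a minimal sketch (assuming NumPy) for the single-error-correcting (7,4) Hamming code: the lookup table maps each nonzero syndrome to its single-bit error pattern.

```python
import numpy as np

# Parity-check matrix H = [P^T | I] matching a systematic G = [I | P] of the (7,4) Hamming code.
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
H = np.hstack([P.T, np.eye(3, dtype=int)])

# Lookup table: syndrome -> single-bit error pattern.
table = {}
for i in range(7):
    e = np.zeros(7, dtype=int)
    e[i] = 1
    table[tuple(H @ e % 2)] = e

def decode(r):
    """Hard-decision syndrome decoding: corrects any single bit error."""
    s = tuple(H @ r % 2)
    if any(s):
        r = (r + table[s]) % 2        # flip the bit indicated by the syndrome
    return r

m = np.array([1, 0, 1, 1])
c = np.concatenate([m, m @ P % 2])    # systematic encoding: [m | m*P]
r = c.copy(); r[2] ^= 1               # inject a single-bit error
print(decode(r))                      # recovers c
print(np.array_equal(decode(r), c))   # True
```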
- The error-correction performance of block codes is determined mainly by the minimum weight/distance d_min; the error-correction performance of convolutional codes is determined by the code weight distribution rather than by the minimum free distance d_free alone. In addition, error-correction performance can be characterized by a coding gain Gamma_code, say 4 dB. For BPSK over AWGN, BER ~= Q(sqrt(2*Gamma_code*Eb/N0)).
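A short numeric sketch of the coding-gain statement above, assuming BPSK over AWGN (uncoded BER = Q(sqrt(2*Eb/N0))) and using the illustrative 4 dB gain from the text.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

ebn0_db, gain_db = 8.0, 4.0                   # operating point and illustrative coding gain
ebn0  = 10 ** (ebn0_db / 10)
gamma = 10 ** (gain_db / 10)

ber_uncoded = Q(math.sqrt(2 * ebn0))          # BPSK over AWGN
ber_coded   = Q(math.sqrt(2 * gamma * ebn0))  # same curve shifted by the coding gain
print(f"uncoded BER ~ {ber_uncoded:.1e}, coded BER ~ {ber_coded:.1e}")
```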
- Coded Modulation
- Objective: achieve high coding gain without losing bandwidth efficiency. Traditionally, channel coding and modulation are treated as separate tasks. Powerful channel codes have high coding gains but at the cost of increased bandwidth, because more redundancy is typically introduced. Multilevel modulations such as 256-QAM achieve high bandwidth efficiency but at the cost of high signal power (assuming fixed noise power), i.e., high SNR.
- Principle:
- Assign a high/low Hamming distance to pairs of signals with low/high Euclidean distance.
- In other words, maximize (or at least increase) the minimum Euclidean distance between pairs of coded signals.
- Procedure for trellis coded modulation:
- Set partitioning: progressively partition the constellation into subsets based on Euclidean distance; each level of partitioning produces a group of subsets with a larger minimum within-subset Euclidean distance (see the sketch after this list).
- Convolutional encoding: introduce
redundancy in the bit stream. Some bits may be uncoded.
- Signal mapping: map the codeword to a signal point in the signal space (constellation) by set partitioning: the coded bits select a subset and the uncoded bits select the signal point within that subset.
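A small sketch of the set-partitioning step for a unit-radius 8-PSK constellation (the standard TCM textbook example, assumed here): each partitioning level doubles the number of subsets and increases the minimum within-subset Euclidean distance.

```python
import numpy as np

pts = np.exp(2j * np.pi * np.arange(8) / 8)   # unit-radius 8-PSK points

def min_dist(subset):
    """Minimum pairwise Euclidean distance within one subset."""
    return min(abs(pts[a] - pts[b])
               for i, a in enumerate(subset) for b in subset[i + 1:])

levels = {
    "A: 8-PSK (1 subset)":    [[0, 1, 2, 3, 4, 5, 6, 7]],
    "B: 2 QPSK subsets":      [[0, 2, 4, 6], [1, 3, 5, 7]],
    "C: 4 antipodal subsets": [[0, 4], [2, 6], [1, 5], [3, 7]],
}
for name, subsets in levels.items():
    d = min(min_dist(s) for s in subsets)
    print(f"{name}: min within-subset distance = {d:.3f}")
# Distances grow: 2*sin(pi/8) ~ 0.765  ->  sqrt(2) ~ 1.414  ->  2.000
```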
- What is the performance difference between block codes and convolutional codes?
- Probably block codes
are suitable for a discrete memoryless channel or AWGN channel. Check
the performance of BER or Word Error Rate (WER) vs. SNR.
- Convolutional codes may be more suitable for a discrete Markov channel or a fading channel, because convolutional codes introduce correlation among coded bits. For non-recursive convolutional codes,
the correlation is within constraint length, which is memory length plus
one. For recursive convolutional codes, the correlation exists
between any two bits of a codeword, which can have infinite length,
theoretically. That is, the correlation function of a
convolutional code can have infinite duration, resulting in time diversity.
Since block codes have
finite codeword length, all bits in a codeword may be corrupted during a
deep fade.
- Only random block codes with infinitely large block size achieve Shannon capacity (with arbitrarily small sequence-error probability). Turbo codes and LDPC codes achieve performance very close to the Shannon capacity.
- What is the relation between FIR/IIR and
convolutional code encoder?
- Similar to the analysis of FIR/IIR filters, we can use the z-transform (the D-transform over GF(2)) to analyze a convolutional encoder. The system function is a polynomial for non-recursive (feedforward) encoders and a rational function for recursive encoders; a minimal sketch follows.
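A minimal sketch of the FIR view, assuming the common rate-1/2 non-recursive encoder with generators (7, 5) in octal: each output stream is just a GF(2) convolution (an FIR filter) of the input bits.

```python
import numpy as np

g1 = np.array([1, 1, 1])   # (7)_8 : 1 + D + D^2
g2 = np.array([1, 0, 1])   # (5)_8 : 1 + D^2

def encode(bits):
    """Rate-1/2 feedforward convolutional encoding as two binary FIR filters."""
    y1 = np.convolve(bits, g1) % 2              # GF(2) convolution, includes tail bits
    y2 = np.convolve(bits, g2) % 2
    return np.ravel(np.column_stack([y1, y2]))  # interleave the two output streams

print(encode(np.array([1, 0, 1, 1])))
```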
- What is the relation between a discrete-time Markov
Chain and convolutional code encoder?
- A discrete-time Markov Chain (MC) can be
represented by a trellis diagram. For finite horizon of n,
a trellis diagram is an n-step transition diagram, that is, a
trellis consists of n copies of the MC state space (over n time steps),
where the connections between two neighboring state spaces are
determined according to the one-step state transition diagram of
MC. In other words, a trellis is simply an extension of the
one-step state transition diagram of MC over the time horizon. So
a trellis has two dimensions: one is time horizon; the other is discrete
state space (along the vertical axis).
- More generally, a trellis diagram
characterizes the behavior of a finite state machine (automata).
- A convolutional encoder has memory and is
implemented as either FIR or IIR. So it is a finite state
machine. The state sequences can be represented by paths in a
trellis diagram. Since every convolutional codeword (encoder output) has a corresponding state sequence that generates the codeword, we can label each path with a codeword.
- Trellis: one-to-one or non-catastrophic,
flawed or semi-catastrophic, catastrophic.
- So the encoder has memory and every
convolutional codeword has memory. Every convolutional
codeword can be represented as a path in a trellis diagram.
- Recursive convolutional codes use IIR
structures. Non-recursive convolutional codes use FIR
structures.
- Some turbo codes use Recursive Systematic
Convolutional (RSC) codes.
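A tiny sketch making the finite-state-machine view concrete for the same (7, 5) feedforward encoder assumed above: enumerate the 2^m = 4 states and, for each input bit, the next state and output bits. Printed over one time step, this table is exactly one section of the trellis.

```python
# State = (s1, s2) holds the two most recent input bits, newest first.
def step(state, u):
    s1, s2 = state
    y1 = u ^ s1 ^ s2          # generator 1 + D + D^2
    y2 = u ^ s2               # generator 1 + D^2
    return (u, s1), (y1, y2)  # (next state, output bits)

for state in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    for u in (0, 1):
        nxt, out = step(state, u)
        print(f"{state} --u={u}, out={out}--> {nxt}")
```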
- How to use spectral methods (z-transform,
pole-zero placement) to analyze a finite field?
- Perfect codes vs. Maximum Distance Separable
(MDS) codes
- Codes meeting the Hamming bound (which characterizes code efficiency) with equality are called perfect codes. Perfect codes are the most efficient.
- Codes meeting the Singleton bound d_min <= n - k + 1 (which characterizes error-correction capability) with equality are called MDS codes. MDS codes achieve the largest possible minimum distance, hence the best error-correction capability, for given n and k.
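Two quick numeric checks of these bounds, assuming the (7,4,3) Hamming code and the (255,223) Reed-Solomon code as the usual examples.

```python
from math import comb

# Hamming (sphere-packing) bound: 2^k * sum_{i<=t} C(n, i) <= 2^n; equality => perfect code.
n, k, d = 7, 4, 3
t = (d - 1) // 2
print(2**k * sum(comb(n, i) for i in range(t + 1)) == 2**n)   # True for the Hamming code

# Singleton bound: d_min <= n - k + 1; Reed-Solomon codes meet it with equality (MDS).
n, k = 255, 223
print("RS(255,223): d_min =", n - k + 1)                      # 33
```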
- Turbo and LDPC codes
- LDPC codes were invented in the 1960s. Why did LDPC codes not receive much attention until the 1990s? Because the decoding algorithm requires very high computing power, which could not be implemented at low cost in the 1960s.
- Types of LDPC codes
- Cyclic LDPC
- Tanner's cyclic LDPC codes
- Finite geometry
- Steiner system
- Disjoint Difference Sets (DDS) LDPC
- Turbo-structured LDPC (Jin Lu)
- Random LDPC (Gallager)
- Random LDPC (MacKay)
- Irregular LDPC (Spielman, Richardson)
- LDPC codes designed by bit-filling
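A minimal sketch of hard-decision bit-flipping decoding (the simplest of Gallager's iterative algorithms): repeatedly flip the bit that participates in the most unsatisfied parity checks. The tiny (7,4) parity-check matrix below is only for illustration; a real LDPC matrix is much larger and sparse, and on this toy code the algorithm only handles some error patterns.

```python
import numpy as np

H = np.array([[1, 1, 0, 1, 1, 0, 0],    # toy parity-check matrix (a (7,4) Hamming code)
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def bit_flip_decode(r, max_iter=10):
    """Flip the bit involved in the most failed checks until all checks pass."""
    r = r.copy()
    for _ in range(max_iter):
        s = H @ r % 2
        if not s.any():
            return r                    # all parity checks satisfied
        votes = s @ H                   # per-bit count of unsatisfied checks
        r[np.argmax(votes)] ^= 1
    return r

c = np.zeros(7, dtype=int)              # the all-zero codeword is in every linear code
r = c.copy(); r[3] ^= 1                 # single bit error
print(bit_flip_decode(r))               # recovers the all-zero codeword
```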
Some open questions:
- The goal is to design a code such that
- it achieves a decoding performance (average
BER) close to Shannon bound
- it has low decoding complexity (like R-S codes, using only shift registers instead of the sum-product algorithm or SOVA)
- its decoding performance is analyzable (not
having to resort to simulation, like R-S codes)
- Cyclic LDPC codes can achieve the above three objectives at relatively high SNR; at low SNR, their decoding performance is poor. How can it be improved?
- How to design a turbo code such that its
performance is analyzable like Reed-Solomon codes?
- How to design a random LDPC code such that
- it has very large d_min and very good weight
distribution, i.e., the distribution is heavily centered around its mean (most
of the codewords have weights very close to the average weight).
- hence, the code has good performance in both
low SNR regime and high SNR regime.
- the design methodology is applicable for
arbitrary code-length and arbitrary code-rate.
- the error performance of the code is
analyzable (numerically solve equations to obtain the code-weight
distribution; then the code-weight distribution results in a bound on the
error performance of the code).
Turbo codes have a smaller d_min than convolutional codes, hence turbo codes exhibit an error floor at high SNR while convolutional codes do not. Turbo codes have a better weight distribution, i.e., most codewords have large weights compared with convolutional codes, hence turbo codes have capacity-approaching performance at low SNR while convolutional codes do not.
Key Techniques:
- Sphere packing: received signal vectors of different codewords belong to disjoint (non-overlapping) sets, provided the number of errors/erasures is below a threshold. If the received vectors fall in disjoint sets, the information bits can be decoded correctly. The sphere-packing technique is used in the proofs of many channel coding theorems.
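A small sketch of the sphere-packing picture using the (7,4) Hamming code (the same systematic generator as in the earlier sketches): Hamming spheres of radius 1 around the 16 codewords are pairwise disjoint and, because the code is perfect, exactly tile {0,1}^7.

```python
import itertools
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],     # systematic generator of the (7,4) Hamming code
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
codewords = [tuple(np.array(m) @ G % 2) for m in itertools.product([0, 1], repeat=4)]

def sphere(c):
    """All length-7 binary words within Hamming distance 1 of codeword c."""
    words = {c}
    for i in range(7):
        w = list(c); w[i] ^= 1
        words.add(tuple(w))
    return words

spheres = [sphere(c) for c in codewords]
print(sum(len(s) for s in spheres))      # 16 * 8 = 128: total points over all spheres
print(len(set().union(*spheres)))        # 128 = 2^7: spheres are disjoint and cover the space
```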