Communication theory

Topics:
• CDMA
• Noise performance of communication systems
• Modulation
• Digital communications; key techniques: sampling and interpolation


At the channel encoding/decoding layer, we typically use E_b/N_0, where E_b is the energy per information bit and N_0 is the single-sided noise power spectral density.  The performance of a CODEC is characterized by the relation between BER and E_b/N_0.

At the modulation/demodulation layer, we typically use E_s/N_0, where E_s is the energy per symbol and N_0 is the single-sided noise power spectral density.  The performance of a MODEM is characterized by the relation between BER and E_s/N_0.  E_s/N_0 = E_b/N_0 * (coded bits per symbol) * (k/n), where k/n is the rate of the channel code.

The received SNR = E_s/N_0 * symbol_rate/signal_bandwidth = E_b/N_0 * bit_rate/signal_bandwidth.  If the transmitter uses a square-root raised cosine pulse shape and the receiver uses a matched filter, then symbol_rate = signal_bandwidth and the received SNR equals E_s/N_0.
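A small worked example of these conversions, as a sketch; the modulation (QPSK), code rate, and operating point are illustrative assumptions, not values from the text.

```python
import math

# Illustrative conversion between Eb/N0, Es/N0, and received SNR,
# assuming QPSK (2 coded bits per symbol) and a rate-1/2 code.

eb_n0_db = 4.0
bits_per_symbol = 2        # QPSK carries 2 coded bits per symbol
code_rate = 0.5            # k/n

eb_n0 = 10 ** (eb_n0_db / 10)
es_n0 = eb_n0 * bits_per_symbol * code_rate   # Es/N0 = Eb/N0 * m * (k/n)

# With SRRC pulses and a matched filter, symbol_rate = signal_bandwidth,
# so the received SNR equals Es/N0.
print(f"Es/N0 = {10 * math.log10(es_n0):.2f} dB")  # 2 * 0.5 = 1, so 4.00 dB here
```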


How can a digital signal be converted into a band-limited signal, so that it can be recovered without distortion after transmission over a band-limited channel (rectangular spectrum)?

One method is to use a square-root raised cosine (SRRC) pulse shape at the transmitter and a matched SRRC filter at the receiver.  The benefits of this method are 1) satisfying Nyquist's criterion for zero ISI, and 2) maximizing the SNR at the sampling instants.
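A minimal numerical check of this claim, built in the frequency domain; the oversampling factor, rolloff, and pulse length are illustrative assumptions.

```python
import numpy as np

# Build an SRRC pulse as the square root of a raised cosine (RC) spectrum,
# cascade the transmit and matched receive filters, and verify that the
# end-to-end RC response has (near) zero ISI at the symbol instants.

sps = 8            # samples per symbol
beta = 0.35        # rolloff factor
nsym = 64          # pulse span in symbols
N = sps * nsym

f = np.fft.fftfreq(N, d=1.0 / sps)   # frequency in units of the symbol rate
RC = np.zeros(N)
flat = np.abs(f) <= (1 - beta) / 2
roll = (np.abs(f) > (1 - beta) / 2) & (np.abs(f) <= (1 + beta) / 2)
RC[flat] = 1.0
RC[roll] = 0.5 * (1 + np.cos(np.pi / beta * (np.abs(f[roll]) - (1 - beta) / 2)))

srrc = np.fft.fftshift(np.real(np.fft.ifft(np.sqrt(RC))))  # SRRC pulse

end_to_end = np.convolve(srrc, srrc)   # transmit SRRC * matched SRRC = RC
peak = int(np.argmax(end_to_end))
taps = end_to_end[peak % sps::sps]     # samples taken at symbol instants
isi = np.delete(taps, np.argmax(np.abs(taps)))
print("peak tap:", end_to_end[peak])
print("max |ISI| at symbol instants:", np.abs(isi).max())  # ~0 up to truncation
```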


Viterbi Algorithm (VA) vs. Kalman filter (KF) 

Both use causality to reduce computational complexity.  Causality leads to a recursive design, so that the quantity computed at the previous stage (the cost-to-arrive for the VA, or the state estimate for the KF) can be reused at the current stage; there is no need to re-compute the total cost when new data arrive.

Both can be used for state estimation of a hidden Markov model (HMM).
 
The Kalman filter makes the following assumptions (linearity, Gaussianity, MMSE); a minimal sketch follows the list:
1) The state equation is known and linear; the observation equation is known and linear.  The system state must be continuous, since the disturbance is Gaussian, so the state cannot be estimated by dynamic programming (DP); DP applies only to estimating the state of a finite-state Markov chain (FSMC).
2) The disturbances in the state equation are i.i.d. Gaussian; the measurement errors in the observation equation are i.i.d. Gaussian.
3) The distribution of the initial state is known and Gaussian.
4) The (optimality) criterion for estimation is MMSE.
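A minimal Kalman filter sketch under these assumptions; the model matrices (a 1-D constant-velocity system observed in position only) are illustrative choices, not from the text.

```python
import numpy as np

# Kalman filter: linear state/observation equations, i.i.d. Gaussian
# noises, Gaussian initial state, MMSE criterion.

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (illustrative)
C = np.array([[1.0, 0.0]])               # observe position only
Q = 0.01 * np.eye(2)                     # process-noise covariance
R = np.array([[1.0]])                    # measurement-noise covariance

def kf_step(x, P, y):
    # Predict: propagate the Gaussian belief through the linear model.
    x_pred, P_pred = A @ x, A @ P @ A.T + Q
    # Update: MMSE correction via the Kalman gain.
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(2) - K @ C) @ P_pred
    return x_new, P_new

rng = np.random.default_rng(0)
x, P, truth = np.zeros(2), np.eye(2), np.zeros(2)
for n in range(50):
    truth = A @ truth + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ truth + rng.multivariate_normal(np.zeros(1), R)
    x, P = kf_step(x, P, y)
print("estimate:", x, "truth:", truth)
```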
 
VA has the following assumptions:
1) The system state must be discrete, not continuous; the state process forms a finite-state Markov chain (FSMC).
2) The system state is partially observable: the observation is a random function of the state, and the ranges of the observations corresponding to different states overlap (otherwise the state could be determined uniquely from the observation).
3) We know the distribution of the measurement errors, i.e., the probability of the observation given the state transition.  Note that the observation is related to the state transition (a function of two variables); an observation that depends on a single state is a special case.
4) The criterion for state-sequence estimation can be ML or MAP.
For MAP, we assume that the distribution of the initial state is known, but it does not have to be Gaussian; it can be any probability distribution.
We also need to know the transition probabilities of the Markov chain, i.e., the likelihood function or the disturbance's distribution, which does not need to be Gaussian.
In digital communication, we typically assume the transmitted symbols are equally probable and statistically independent; then MAP reduces to ML.  In addition, we typically assume the channel noise is AWGN, so ML reduces to minimizing the Euclidean distance between the transmitted symbol and the received signal, also called the nearest-neighbor criterion (see the sketch below).
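A minimal VA sketch under these assumptions; the two-state chain, its output levels, and the transition probabilities are illustrative, and the branch metric is the squared Euclidean distance that ML reduces to in AWGN.

```python
import numpy as np

# Viterbi algorithm over a 2-state FSMC observed in AWGN.
# Branch metric: squared Euclidean (nearest-neighbor) distance,
# plus the negative log transition probability (MAP form).

levels = np.array([-1.0, +1.0])            # noiseless output per state
log_T = np.log(np.array([[0.9, 0.1],       # P(next | current), illustrative
                         [0.1, 0.9]]))

def viterbi(y):
    T, S = len(y), len(levels)
    cost = np.full((T, S), np.inf)         # best cost-to-arrive per state
    back = np.zeros((T, S), dtype=int)     # backpointers for traceback
    cost[0] = (y[0] - levels) ** 2         # uniform prior over initial state
    for t in range(1, T):
        for s in range(S):
            cand = cost[t - 1] - log_T[:, s] + (y[t] - levels[s]) ** 2
            back[t, s] = int(np.argmin(cand))
            cost[t, s] = cand[back[t, s]]
    path = [int(np.argmin(cost[-1]))]      # traceback from the best end state
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

y = np.array([-0.9, -1.2, 0.8, 1.1, 0.9, -1.0])
print(viterbi(y))                          # most likely state sequence
```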

What about channels with memory (ISI channels)?  A channel with memory is characterized by an FSMC, where the number of states is determined by the channel memory length.  So the VA can be used for ISI channels (maximum-likelihood sequence estimation); for a memoryless AWGN channel, symbol-by-symbol detection already suffices.  The assumptions for using the VA here are that the transmitted symbols are i.i.d. and the channel noise is white Gaussian.

What about transmitted symbols having memory (convolutional codes)?  Use the VA to decode the codeword; the code trellis defines the FSMC.

What if the noise is colored?  1) Whiten the noise using R^{-1/2}, where R is the noise covariance matrix.  2) Time-domain processing: model the noise U_n as an AR(p) process, so that U_n - \sum_{i=1}^p a_i*U_{n-i} is i.i.d. Gaussian noise; then run the VA with branch metrics computed from the whitened noise U_n - \sum_{i=1}^p a_i*U_{n-i}.
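A quick numerical sketch of option 1; the Toeplitz covariance below is an illustrative example of colored noise.

```python
import numpy as np

# Whitening colored Gaussian noise with R^{-1/2}, computed from the
# eigendecomposition of the noise covariance R = V diag(w) V^T.

rng = np.random.default_rng(1)
n = 4
R = 0.8 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # Toeplitz R

w, V = np.linalg.eigh(R)
R_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T    # symmetric R^{-1/2}

u = rng.multivariate_normal(np.zeros(n), R, size=50000)  # colored samples
u_white = u @ R_inv_sqrt.T                               # whitened samples
print(np.round(np.cov(u_white.T), 2))                    # ~ identity matrix
```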


Sequential Bayesian estimation methods:

  1. Sequential Bayesian estimation of a random variable
  2. Kalman filter: sequential Bayesian estimation of the state of a linear system (estimating an AR process)
  3. Particle filter: sequential importance sampling; estimates the state of a nonlinear system, or of a linear system with non-Gaussian disturbances (a sketch follows below)
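A minimal bootstrap particle filter sketch for item 3; the scalar nonlinear model and all parameters are illustrative choices, not from the text.

```python
import numpy as np

# Bootstrap particle filter: sequential importance sampling with
# resampling, using the state-transition density as the proposal.

rng = np.random.default_rng(3)
Np = 500                                   # number of particles

def f(x):                                  # nonlinear state transition
    return 0.5 * x + 25 * x / (1 + x ** 2)

x_true = 0.0
particles = rng.normal(0.0, 1.0, Np)
for n in range(30):
    x_true = f(x_true) + rng.normal()                     # simulate the system
    y = x_true ** 2 / 20 + rng.normal()                   # nonlinear observation
    particles = f(particles) + rng.normal(0.0, 1.0, Np)   # propagate particles
    w = np.exp(-0.5 * (y - particles ** 2 / 20) ** 2)     # likelihood weights
    w /= w.sum()
    particles = particles[rng.choice(Np, Np, p=w)]        # resample
print("estimate:", particles.mean(), "truth:", x_true)
```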

BCJR vs. Viterbi Algorithm

The Bahl, Cocke, Jelinek, and Raviv (BCJR) algorithm is used for a posteriori probability (APP) decoding of convolutional codes in AWGN.  Although the BCJR and Viterbi algorithms share many similarities, they differ in a fundamental way because they compute very different quantities.  The Viterbi algorithm computes hard decisions by performing maximum-likelihood decoding.  The BCJR algorithm computes soft information about the message in the form of a posteriori probabilities for each of the message bits.  Of course, this soft information could be converted to hard decisions by comparing the a posteriori probabilities to a threshold of one half; the resulting decisions would be individually optimal in the sense that they have minimal probability of being in error.  Thus, whereas the Viterbi algorithm produces the most likely message sequence, which minimizes the word-error rate, quantizing the BCJR probabilities produces the sequence of most likely message bits, which minimizes the bit-error rate.  The distinction is subtle and is not by itself enough to warrant much interest in BCJR; indeed, the hard-decision performance of BCJR is only marginally better than Viterbi's.  The real value of BCJR is that the a posteriori probabilities are valuable in their own right: they may be used to estimate the reliability of a decision, and they are especially useful when interacting with other modules in a receiver.  Another important feature of BCJR is that it naturally and easily exploits any a priori information that the receiver may have about the message bits.

BCJR (the MAP algorithm) is an optimal decoding algorithm for turbo codes.  Suboptimal algorithms for decoding turbo codes include log-MAP and SOVA.  Converting (multiplication, summation) to (max, summation), i.e., the max-log-MAP approximation, can further reduce complexity.
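The log-domain trick underlying this is the Jacobian logarithm; a tiny sketch (the operand values are arbitrary):

```python
import math

# log(e^a + e^b) = max(a, b) + log(1 + e^{-|a-b|}).
# Keeping the correction term gives log-MAP; dropping it gives max-log-MAP.

def max_star(a, b):
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

a, b = 2.0, 1.5
print(math.log(math.exp(a) + math.exp(b)))  # exact
print(max_star(a, b))                       # log-MAP: same value
print(max(a, b))                            # max-log-MAP approximation
```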

Equivalence between the BCJR algorithm and Generalized VA (GVA)

The BCJR algorithm is not a special case of the EM algorithm.  BCJR computes the exact APP, from which we can obtain the MAP estimate; EM computes an ML estimate.


Turbo principle: refine the processing recursively


Markov process vs. hidden Markov model (HMM): conditional independence property


Linear systems with perfectly observable state

Given a state equation and observation equation of a system,

X_{n+1} = A* X_n + V_n

Y_n = f(X_n)

where A is an invertible square matrix, f is an invertible function, and {V_n} are i.i.d. with V_n independent of X_n, then both the states {X_n} and the observations {Y_n} are Markov chains; {Y_n} preserves the transition law of {X_n}, although the state space is changed.

Linear systems with imperfectly observable state

If Y_n = f(X_n) + U_n, where {U_n} are i.i.d. and U_n is independent of X_n, then the observations {Y_n} form a hidden Markov model (HMM); note that Y_n is a random function of X_n.  {U_n} must be i.i.d.; otherwise the conditional independence required by an HMM, Pr{Y_n | X_0, X_1, ..., X_n} = Pr{Y_n | X_n}, cannot be satisfied.


Using the Markov property (or hidden Markov property) to reduce computational complexity

BCJR (more generally, the EM algorithm), the Viterbi algorithm (more generally, dynamic programming), the Kalman filter, the particle filter, Markov chain Monte Carlo (MCMC) algorithms, and the sum-product algorithm can all utilize the Markov property to reduce computational complexity.

If the EM algorithm is applied to an HMM, it can also utilize the Markov property to reduce computational complexity.  But the EM algorithm does not have to be applied to an HMM; it can also be used in other situations.  The key properties of EM are 1) iterative computation, 2) convergence, 3) rate of convergence, and 4) local/global optimality; not the Markov property.

If dynamic programming is applied to an HMM, it can also utilize the Markov property to reduce computational complexity.  But dynamic programming does not have to be applied to an HMM; it can also be used in other situations.  The key properties of DP are 1) additive cost and 2) the principle of optimality; not the Markov property.

The Kalman filter reduces the computational complexity from O((N*n)^3) to O(N*n^3), where N is the number of samples and n is the dimension of the observation vector.

Quote: "The Viterbi algorithm gives a global optimum while EM can only give a local optimum (actually a fixed point)."  This is not a fair comparison.  The Viterbi algorithm is used only for estimating discrete random variables, not continuous ones, so its cost function (optimality criterion) cannot have a local optimum: the argument of the function is discrete, and only continuous cost functions can have local optima.  EM can be used for estimating both discrete and continuous parameters.  However, for estimating discrete parameters \theta, it cannot guarantee convergence of {\theta_j} to a stationary point of the likelihood function p(X|\theta), although convergence of {\theta_j} to a stationary point of Q(\theta, \theta_j) is guaranteed.  A sufficient condition for a limit point of an EM sequence {\theta_j} to be a stationary point of the likelihood function p(X|\theta) is that Q(\theta, \theta_j) be continuous in both \theta and \theta_j.  Note that a stationary point could be either a local optimum or a saddle point.


Discussion about the EM algorithm

In some situations (e.g., situations with missing data), the EM algorithm may be much simpler than direct maximum likelihood estimation (MLE).   Direct MLE may require exponential complexity while the EM algorithm may only incur linear complexity.

Given the complete-data description p(X, Y | \theta), where X is observed and Y is unknown, a direct MLE of the unknown parameter \theta is \hat{\theta} = argmax_\theta p(X | \theta) = argmax_\theta \sum_Y p(X, Y | \theta), where the sum runs over all configurations of the hidden data Y; this marginalization is what can make the direct approach exponentially expensive.

In the following paper,

C.N. Georghiades and J.C. Han, Sequence Estimation in the Presence of Random Parameters Via the EM Algorithm, IEEE Transactions on Communications, vol. 45, pp. 300-308, March 1997.

Georghiades and Han give an example in which direct computation of the MLE requires complexity exponential in the sequence length, while the EM algorithm requires only complexity linear in the sequence length.  This motivates the use of the EM algorithm.
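To make the EM iteration concrete, here is a minimal sketch on a toy problem (a two-component Gaussian mixture with known weights and variances); the model is illustrative and is not the one in the cited paper.

```python
import numpy as np

# EM for the means of a two-component Gaussian mixture (equal weights,
# unit variances). Direct ML over all 2^N hidden labelings would be
# exponential in N; each EM iteration is linear in N.

rng = np.random.default_rng(2)
N = 1000
labels = rng.random(N) < 0.5
x = np.where(labels, rng.normal(2.0, 1.0, N), rng.normal(-2.0, 1.0, N))

mu = np.array([-0.5, 0.5])     # initial guess for the two means
for it in range(50):
    # E-step: responsibility of component 1 for each sample.
    d0 = np.exp(-0.5 * (x - mu[0]) ** 2)
    d1 = np.exp(-0.5 * (x - mu[1]) ** 2)
    r1 = d1 / (d0 + d1)
    # M-step: responsibility-weighted sample means.
    mu = np.array([np.sum((1 - r1) * x) / np.sum(1 - r1),
                   np.sum(r1 * x) / np.sum(r1)])
print("estimated means:", np.round(mu, 2))   # ~ [-2, 2]
```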


Complex Gaussian vector channel for MIMO systems

I. Emre Telatar, "Capacity of Multi-antenna Gaussian Channels," European Transactions on Telecommunications, Vol. 10, No. 6, pp. 585-595, Nov/Dec 1999.


Media-dependent noise

Alek Kavcic and J. M. F. Moura. "The Viterbi Algorithm and Markovian Noise Memory." IEEE Transactions on Information Theory, vol. 46, no. 1, pp. 291-301, Jan. 2000.

E. Eleftheriou and W. Hirt, "Noise-predictive maximum-likelihood (NPML) detection for the magnetic recording channel."

S. A. Altekar and J. K. Wolf, "Improvements in detectors based upon colored noise."


Space-time coding

Space-time coding (STC) has emerged over the past few years as a new paradigm for optimally combining modulation, coding, and diversity gains over wireless links. It is ideal for improving the downlink performance (which is the bottleneck in asymmetric applications such as Internet browsing and downloading) while keeping user terminals lightweight and cost effective.
STC was originally developed for frequency-flat quasi-static fading channels. Extensive recent research has focused on extending it to frequency selective and time-selective fading channels.
Detection of space-time-coded signals presents a significant signal processing challenge as multiple coded, distorted, and faded copies of the desired signal are superimposed at each receive antenna and corrupted by noise, multiuser interference, and synchronization errors. 

 
 
Signal processing topics for space-time-coded transmission:
• Channel Estimation (training-based, semi-blind, and blind)
• Equalization and Efficient Decoding
• Interference Suppression and Beamforming for STC
• STC for Time-Varying Channels (channel tracking)
• Synchronization Techniques
• STC and Multiple Access (OFDM, CDMA, TDMA)
• Turbo Techniques
• Capacity and Performance Limits
• High-Rate Layered Space-Time Methods (such as BLAST)
• Differential Space-Time Methods
• Prototyping Efforts