Information
Theory
- Capacity of a channel
- It is also called "information
capacity". It takes units of "bits per
transmission", or "bits per second", or
"bits/s/Hz".
- It is different from another capacity
called connection-carrying capacity, used in connection admission
control.
- Operational definition: max achievable
rate R for (2^{nR}, n) code with peak error probability approaching to
zero as n goes to infinity.
- Information definition: max I(X;Y), where
the maximization is over the distribution p(X).
- Theorem: max achievable rate = max I(X;Y)
- Capacity of a discrete channel, in unit
of "bits per transmission".
- C = max I(X;Y), where X and Y are
input/output discrete random variables; I(X;Y) is mutual
information; and the maximization is over the Probability Mass
Function (PMF) of the input X.
- Binary Symmetric Channel (BSC):
C = 1 - H(p)
- Binary Erasure Channel (BEC):
C = 1 - p
- Capacity of a continuous channel, in unit
of "bits per transmission".
- C = max I(X;Y), where X and Y are
input/output continuous random variables, and the maximization is
over the Probability Density Function (PDF) of the input X.
- Additive White Gaussian Noise (AWGN)
channel (continuous-valued discrete-time channel)
- C = 1/2 *log(1+SNR) for
real-valued input
- C = log(1+SNR) for complex-valued
input
- Capacity of a band-limited AWGN channel
(continuous-valued continuous-time channel or waveform channel), in unit
of "bits per second".
- Shannon's information capacity
theorem (for real-valued waveform channel with AWGN):
- C = B*log(1+SNR), where SNR=P/(B*N0).
- Capacity of a discrete-time
discrete-valued channel with memory (refer to Gallager's book, pp. 97),
or Finite state channel (FSC).
- Capacity of a fading channel
- Ergodic capacity for a discrete-time
i.i.d. fading channel, in unit of "bits per
transmission".
- Outage capacity
- Rate distortion function
- Operational definition: min achievable rate R
for (2^{nR}, n) code with expected distortion <=D as n goes to infinity.
- Information definition: min I(X;hat{X}) s.t.
expected distortion <=D,
where the minimization is over the distribution p(\hat{X}|X).
- Computational information theory:
- Blahut's iterative algorithm to calculate
channel capacity
- Blahut's iterative algorithm to calculate
rate distortion function
- It is similar to Lloyd algorithm.
- Obtain KKT condition for min I(X;hat{X}) s.t.
expected distortion <=D.
- Decompose the KKT condition into two
optimal conditions.
- Use the two optimal conditions to design two
calculation steps in each iteration.
- Prove the iterative algorithm converges to
the optimal solution.
- Proof of convergence of the two Blahut
algorithms
- Inequalities:
- Jensen Inequality:
- For convex function f(x), f(E(x))<=E(f(x)).
- For concave function f(x), f(E(x))>=E(f(x)).
- Almost all inequalities related to
expectation (e.g., Holder inequality, Cauchy-Schwarz inequality) can
be proved by Jensen Inequality.
- Holder inequality:
- Cauchy-Schwarz inequality:
- Mincowski inequality:
- Lyapounov inequality:
- Kraft inequality:
- Intuition: the sum of probabilities of
disjoint sets in a sample space should not be greater than 1.
\sum_{x \in X} D^{-l(x)}<=1, where x is a letter, X is an alphabet, D
denotes the number of bits in a coded symbol, l(x) is the length of the
code corresponding to x.
- Fano inequality:
- exp(x)>1+x, for x>0. (Use Taylor
expansion to prove.)
- exp(-x)>1-x, for x<0.
- ln(1+x)<x, for x>0. (Let f(x)=ln(1+x)-x.
Take the derivative and get the proof.) Draw a figure to see
this. f(x)=x increases faster than g(x)=ln(1+x).
- ln(1+x)<x, for -1<x<0. (Note the
sign changes when taking the derivative.)
- ln(1-x)<-x, for 0<x<1.
- ln(1+K)<= \sum_{k=1}^K 1/k <=
1+ln(K).
- MDL theory for Universal coding: looking at
the mathematical property (stochastic complexity, parametric complexity,
optimal performance bound, etc.), instead of actual encoding algorithm.
- Entropy has three properties:
- Non-negativity
- Monotonic increase with the number of
equally-distributed elementary events
- Additivity
- The essense of the additivity of entropy is
that partitioning increases entropy. Partitioning is like branching in a
tree.
Key Techniques:
-
Asymptotic Equipartition Property (AEP): Basically
it stems from ergodic theory and ergodic theorems such as Weak Law of Large Numbers. Use
epsilon-N argument. Use the concept of typical sequence. AEP
is a useful tool to prove theorems in information theory.
- Strong AEP
- Weak AEP
- AEP for iid processes
- AEP for ergodic processes (Shannon-McMillan-Breiman
Theorem)
- Strong Law of Large Numbers is not applicable
(as for iid processes) since the stochastic sequence p(X_i|X_0^{i-1}) is not
ergodic. Use the sandwich method (upper/low bound) to prove it.
-
Random coding argument: Generate a code book
randomly. Due to the symmetric property of codewords, there is no need to
compute the symbol error probability for each codeword. Instead, only
considering one codeword is sufficient to obtain the symbol error probability.
-
Capacity for channels with additive white Gaussian
noise:
- c=log(1+SNR), which is a concave
function. For Low SNR, c is approximately equal to SNR.
The approximation error gets larger as SNR gets larger. Note c<SNR
for any SNR.
- Instantaneous channel capacity: c(t)=log(1+SNR(t)),
where SNR(t) is the instantaneous SNR at time t, for flat fading channel.
- Instantaneous channel capacity density: c(t,f)=log(1+SNR(t,f)),
where SNR(t,f) is the SNR for frequency f at time t, for frequency selective
fading channel.
Information theory vs. queueing theory
- The problem setting for Information theory is
that 1) there is no queue in the system, and 2) the communication channel is
noisy or error prone. In contrast, the problem setting for
queueing theory is that 1) there is a queue in the system, and 2) the
communication channel is noise-free or error free.
- The problem, with which Information theory is
concerned, is the ultimate bound for source coding and channel coding.
In contrast, the problem that queueing theory addresses is queueing delay
distribution or buffer overflow probability (or loss probability).
- Constant arrival rate and constant departure
rate make queueing theory not so useful; e.g., in constant bit rate wireless
voice communication.
- sharing
- point-to-point communication; topology