Monte Carlo Methods
- Three methods to compute an integral/expectation:
- Integrate analytically, resulting in an exact solution.
- Numerical algorithm (deterministic), resulting in an approximate solution.
- Simulation-based method (random algorithm, Monte Carlo methods), resulting in an approximate solution.
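- A tiny numeric illustration of the three approaches (my own toy example, not from the notes, assuming the target E[X^2] for X ~ N(0, 1) and NumPy):
```python
# Toy comparison of the three approaches for E[X^2] with X ~ N(0, 1);
# the analytic answer is exactly 1 (the variance of a standard normal).
import numpy as np

# 1) Analytic solution (exact).
exact = 1.0

# 2) Deterministic numerical integration (trapezoidal rule on a fixed grid).
x = np.linspace(-10.0, 10.0, 20001)
integrand = x**2 * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
numeric = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))

# 3) Monte Carlo: average x_i^2 over random draws from N(0, 1).
rng = np.random.default_rng(0)
mc = np.mean(rng.standard_normal(100_000) ** 2)

print(exact, numeric, mc)   # 1.0, ~1.000000, ~1.00 (the MC value has random error)
```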
- Sampling in Monte Carlo methods means generating a realization of a random variable that has a specified PDF/PMF. In other words, sampling means simulating a r.v.
- Sampling = observation.
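- A minimal sketch of what simulating a r.v. looks like (my own example: inverse-CDF sampling of an Exponential(rate) variable; the rate value is arbitrary):
```python
# "Simulating a r.v." via the inverse CDF: draw U ~ Uniform(0, 1) and map it
# through F^{-1}.  For Exponential(rate), F^{-1}(u) = -ln(1 - u) / rate.
import numpy as np

def sample_exponential(rate, size, rng):
    u = rng.random(size)             # U ~ Uniform(0, 1)
    return -np.log1p(-u) / rate      # X = F^{-1}(U) ~ Exponential(rate)

rng = np.random.default_rng(1)
x = sample_exponential(rate=2.0, size=100_000, rng=rng)
print(x.mean())                      # should be close to 1/rate = 0.5
```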
- Missing data model, censored data model.
- Population with distribution P = sample space X with P, i.e., the probability space (X, sigma-field, P).
- The sampling PDF characterizes the channel model as in communication theory; it is the same as the transition probability, i.e., the likelihood.
- Gibbs sampler:
- Used to simulate an N-dimensional random vector (with N-dimensional PDF f(x_1, x_2, ..., x_N)) by an iterative procedure.
- The basic idea is to reduce the simulation of an N-dimensional random vector to simulations of univariate full-conditional PDFs, i.e., reduce a high-dimensional problem to one-dimensional problems.
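- A minimal Gibbs-sampler sketch (my own example: a zero-mean bivariate normal with correlation rho, whose full conditionals are univariate normals):
```python
# Gibbs sampler for a zero-mean bivariate normal with correlation rho.
# Each full conditional x_i | x_j is univariate: N(rho * x_j, 1 - rho^2).
import numpy as np

def gibbs_bivariate_normal(rho, n_iter, rng):
    x1, x2 = 0.0, 0.0                      # arbitrary starting point
    std = np.sqrt(1.0 - rho**2)
    samples = np.empty((n_iter, 2))
    for t in range(n_iter):
        x1 = rng.normal(rho * x2, std)     # draw x1 | x2
        x2 = rng.normal(rho * x1, std)     # draw x2 | x1
        samples[t] = (x1, x2)
    return samples

rng = np.random.default_rng(2)
s = gibbs_bivariate_normal(rho=0.8, n_iter=50_000, rng=rng)
print(np.corrcoef(s[5000:].T)[0, 1])       # ~0.8 after discarding burn-in
```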
Markov chain Monte Carlo (MCMC) techniques:
- In generating a random variate of high dimension (the pdf of which is assumed to be P(x)), MCMC methods are more computationally efficient than the inverse-CDF method, the acceptance-rejection method, and importance sampling. Intuitively, MCMC methods are like stirring an urn that contains many numbered lottery balls; each stir results in a configuration of the balls in the urn; given the n-th stir, the (n-1)-th stir and the (n+1)-th stir are conditionally independent; hence, the configurations, with marginal pdf $P_n(x)$, form a Markov chain. As n approaches infinity, $P_n(x)$ converges to the target distribution P(x); the stirring is a way to explore the distribution P(x).
- The key in MCMC is to choose the transition pdf (called the transition kernel) and to design efficient algorithms.
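- A minimal random-walk Metropolis-Hastings sketch (my own example; the Gaussian proposal step size plays the role of the transition-kernel design choice, and the two-component Gaussian-mixture target is arbitrary):
```python
# Random-walk Metropolis-Hastings: the Gaussian proposal N(x, step^2) plus
# the accept/reject rule together define the transition kernel of the chain.
import numpy as np

def target_pdf(x):
    # Example target P(x): mixture of two Gaussians (unnormalized is fine).
    return 0.5 * np.exp(-0.5 * (x + 2)**2) + 0.5 * np.exp(-0.5 * (x - 2)**2)

def metropolis_hastings(n_iter, step, rng):
    x = 0.0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        proposal = x + step * rng.standard_normal()
        # Symmetric proposal, so the acceptance ratio is just P(x') / P(x).
        if rng.random() < target_pdf(proposal) / target_pdf(x):
            x = proposal
        chain[t] = x
    return chain

rng = np.random.default_rng(3)
chain = metropolis_hastings(n_iter=100_000, step=1.0, rng=rng)
print(chain.mean(), chain.var())   # mean ~0, variance ~5 for this mixture
```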
- Particle filters (PF)
- Particle filter methods are a kind of sequential/recursive Bayesian estimation/filtering; they provide computational algorithms to approximate the posterior PDF; they are not optimal but sub-optimal (approximation algorithms).
- The basic idea is sequential importance sampling (SIS) + resampling (to reduce degeneracy, where after a few iterations all but one particle have negligible weight); a minimal sketch appears below, after the list of PF variants.
- The merit of particle filters is that they
are general enough to handle any nonlinear system model (state equation) and
non-Gaussian noise model.
- The drawback is that, being so general, they do not exploit the specific structure of the nonlinear system model, which could potentially improve performance. For example, if linearization is a good approximation of the nonlinearity of the system dynamics and/or measurement equations, it may be better to use an extended Kalman filter.
- Formal Bayesian inference treats any variable being inferred from the data as random, while the maximum likelihood (ML) approach may take it as a fixed but unknown parameter. This has little to do with whether mother nature actually produces such a variable as deterministic or random; it is simply a logical framework for statistical inference. A limitation of the Bayesian philosophy is that the result can rely too heavily on the prior. For state estimation, it is natural to think of the problem in a Bayesian fashion (MMSE), and in this sense PF follows the Bayesian methodology. However, the interpretation of the estimate produced by a PF often deviates from the Bayesian justification, since the sample mean and sample variance of the PF have little to do with Bayesian inference. It often suffices to report a highest posterior density (HPD) region or something similar, since one only cares about the mode, mean, and/or median, rather than the full PDF. Note that the MMSE, MMAE, and MAP estimates are the mean, median, and mode of the posterior PDF, respectively (for a Gaussian posterior these coincide).
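- A small numeric illustration of the mean/median/mode point (my own example, on an arbitrary discretized posterior):
```python
# Mean / median / mode of a (here discretized) posterior give the
# MMSE / MMAE / MAP estimates, respectively.
import numpy as np

grid = np.linspace(-5, 5, 2001)
# Example posterior: a skewed two-component mixture, normalized on the grid.
post = 0.7 * np.exp(-0.5 * (grid - 1.0)**2) + 0.3 * np.exp(-0.5 * ((grid + 2.0) / 0.5)**2)
post /= post.sum()

mmse = np.sum(grid * post)                 # posterior mean
cdf = np.cumsum(post)
mmae = grid[np.searchsorted(cdf, 0.5)]     # posterior median
map_est = grid[np.argmax(post)]            # posterior mode

print(mmse, mmae, map_est)
```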
- Recursive importance sampling propagates HPD.
- Other kinds of PF
- Sampling importance resampling (SIR) filter
- Auxiliary sampling importance resampling (ASIR)
filter
- Regularized particle filter (RPF)
- Mixture Kalman filter
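- A minimal bootstrap particle filter sketch, as referenced above (my own example: SIS with the state transition as the importance density plus resampling, on an arbitrary scalar linear-Gaussian model):
```python
# Bootstrap particle filter: sequential importance sampling with the state
# transition as proposal, plus resampling to fight weight degeneracy.
import numpy as np

rng = np.random.default_rng(4)

# Toy state-space model (chosen only for illustration):
#   x_t = 0.9 * x_{t-1} + process noise,   y_t = x_t + measurement noise
T, N = 100, 1000                 # time steps, number of particles
q, r = 1.0, 1.0                  # process / measurement noise std
x_true = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + q * rng.standard_normal()
    y[t] = x_true[t] + r * rng.standard_normal()

particles = rng.standard_normal(N)       # initial particle cloud
weights = np.full(N, 1.0 / N)
estimates = np.zeros(T)
for t in range(1, T):
    # 1) Propagate particles through the state equation (importance sampling
    #    with the transition prior as the proposal).
    particles = 0.9 * particles + q * rng.standard_normal(N)
    # 2) Reweight by the likelihood of the new measurement.
    weights *= np.exp(-0.5 * ((y[t] - particles) / r)**2)
    weights /= weights.sum()
    # 3) MMSE-style estimate: weighted mean of the particles.
    estimates[t] = np.sum(weights * particles)
    # 4) Resample when the effective sample size gets small (degeneracy).
    if 1.0 / np.sum(weights**2) < N / 2:
        idx = rng.choice(N, size=N, p=weights)
        particles = particles[idx]
        weights = np.full(N, 1.0 / N)

print(np.sqrt(np.mean((estimates - x_true)**2)))   # RMSE of the filter
```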
- Extended Kalman filter: 1) use a linear model (linearization about the current estimate) to approximate the nonlinear dynamic system; 2) apply the Kalman filter to the approximated linear model.
- Kalman filters propagate the first- and second-order statistics (mean & variance). Particle filters propagate particles (a PMF or histogram approximation of the posterior), and thus have higher computation/storage complexity.
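- A minimal scalar Kalman filter sketch (my own example with arbitrary model parameters), showing that only a mean and a variance are propagated:
```python
# Scalar Kalman filter: only a mean and a variance are propagated,
# in contrast to the particle cloud a particle filter carries.
import numpy as np

rng = np.random.default_rng(5)
a, q, r = 0.9, 1.0, 1.0          # state coefficient, process / meas. noise variances
T = 100
x = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + np.sqrt(q) * rng.standard_normal()
    y[t] = x[t] + np.sqrt(r) * rng.standard_normal()

m, P = 0.0, 1.0                  # posterior mean and variance at t = 0
est = np.zeros(T)
for t in range(1, T):
    # Predict: propagate mean and variance through the state equation.
    m_pred = a * m
    P_pred = a * a * P + q
    # Update: fold in the new measurement via the Kalman gain.
    K = P_pred / (P_pred + r)
    m = m_pred + K * (y[t] - m_pred)
    P = (1 - K) * P_pred
    est[t] = m

print(np.sqrt(np.mean((est - x)**2)))   # RMSE; P is the filter's own variance
```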
- Bayesian belief: when we do not know the prior PMF of a discrete random variable/vector, we assume a certain prior PMF, called a belief, so that we can proceed with Bayesian inference.
- For linear estimation, we typically use either the Kalman filter or the Wiener filter (the Wiener filter is rarely used in practice). The Wiener filter is restricted to stationary processes, but the Kalman filter can deal with non-stationary processes (e.g., with time-varying mean and auto-correlation). The Kalman filter can also deal with nonlinear systems via the extended Kalman filter. For nonlinear systems, a general estimation method is the particle filter.
EM vs. BCJR
- The BCJR algorithm computes posterior probabilities by a forward-backward recursion; it plays the role of the E-step in an EM (Baum-Welch) algorithm.
Belief propagation algorithm.
- Sum-product algorithm, factor graph.
- All these algorithms propagate posterior probabilities; the output of one processing module (posterior probabilities) is used as prior probabilities in another processing module.
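- A minimal sum-product / forward-backward sketch on a toy 2-state chain (my own numbers), where forward and backward messages combine into posterior marginals:
```python
# Sum-product on a chain (forward-backward, BCJR-style): messages are
# combined into posterior marginals P(state_t | all observations).
import numpy as np

# Toy 2-state HMM (numbers are arbitrary, for illustration only).
A = np.array([[0.9, 0.1],        # transition matrix A[i, j] = P(j | i)
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],        # emission matrix B[i, k] = P(obs k | state i)
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])        # prior on the initial state
obs = [0, 0, 1, 1, 1, 0]

T, S = len(obs), 2
alpha = np.zeros((T, S))         # forward messages
beta = np.ones((T, S))           # backward messages
alpha[0] = pi * B[:, obs[0]]
alpha[0] /= alpha[0].sum()       # normalize for numerical stability
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    alpha[t] /= alpha[t].sum()
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    beta[t] /= beta[t].sum()

posterior = alpha * beta
posterior /= posterior.sum(axis=1, keepdims=True)
print(posterior)                 # P(state_t | obs_1..T) for each t
```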