Optimal Control Theory & Dynamic Programming




Dynamic programming


Classification of optimal control problems:

Deterministic optimal control (4 cases):
- Discrete state, discrete time
- Discrete state, continuous time
- Continuous state, discrete time
- Continuous state, continuous time

Stochastic optimal control (8 cases):
- Discrete state, discrete time, perfect state information
- Discrete state, discrete time, imperfect state information
- Discrete state, continuous time, perfect state information
- Discrete state, continuous time, imperfect state information
- Continuous state, discrete time, perfect state information
- Continuous state, discrete time, imperfect state information
- Continuous state, continuous time, perfect state information
- Continuous state, continuous time, imperfect state information

In a deterministic optimal control problem, the state equation of the dynamical system is given as x_{k+1} = f_k(x_k, u_k) for a discrete-time system, where x_k is the state variable and u_k is the control variable, and as dx(t)/dt = f(x(t), u(t)) for a continuous-time system. There is no observation/measurement equation.
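As a minimal sketch (the scalar dynamics f and the control sequence below are made-up illustrations), a deterministic discrete-time system can simply be rolled forward from x_0 once the controls are fixed, since nothing is random:

```python
# Hypothetical scalar dynamics x_{k+1} = f(x_k, u_k); chosen only for illustration.
def f(x, u):
    return 0.9 * x + u

x = 1.0                          # initial state x_0
u_seq = [0.5, -0.2, 0.0, 0.1]    # a fixed (open-loop) control sequence
traj = [x]
for u in u_seq:
    x = f(x, u)                  # deterministic update: no noise, fully predictable
    traj.append(x)
print(traj)
```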

In a stochastic optimal control problem, the state equation of the dynamical system is given as x_{k+1} = f_k(x_k, u_k, w_k) for a discrete-time system, where w_k is a random system disturbance/noise, and as dx(t)/dt = f(x(t), u(t), w(t)) for a continuous-time system. The observation/measurement equation is given as y_k = h_k(x_k) if there is no measurement noise; otherwise y_k = h_k(x_k, v_k), where v_k is a random measurement error/noise.

In a stochastic optimal control problem, perfect state information means that the states are perfectly observable, e.g., y_k = h_k(x_k), where h_k is a deterministic one-to-one (invertible) map; imperfect state information means that the states are only partially observable, e.g., y_k = h_k(x_k, v_k), where v_k is random (like a measurement error).
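A minimal simulation sketch of the stochastic case, assuming illustrative linear dynamics and Gaussian noises w_k and v_k (all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, u, w):                          # hypothetical dynamics x_{k+1} = f(x_k, u_k, w_k)
    return 0.9 * x + u + w

x = 1.0
for k in range(5):
    u = 0.0                              # some control action
    w = rng.normal(scale=0.1)            # random system disturbance w_k
    x = f(x, u, w)
    y_perfect = x                        # perfect information: y_k = h(x_k), h invertible
    v = rng.normal(scale=0.2)            # random measurement noise v_k
    y_imperfect = x + v                  # imperfect information: y_k = h(x_k, v_k)
    print(k, x, y_perfect, y_imperfect)
```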

For a deterministic optimal control problem, feedback control does not help (there is no need for it): optimal feed-forward (open-loop) control and optimal feedback (closed-loop) control yield the same solution. The optimal control trajectory and the optimal state trajectory can therefore be computed a priori.
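A toy backward dynamic-programming sketch of this point; the states, controls, costs, and horizon below are made up. Because the problem is deterministic, the optimal trajectory can be unrolled a priori from x_0:

```python
# A tiny deterministic DP: states 0..4, controls -1/0/+1, horizon N = 3.
# Stage cost penalizes distance from state 2; dynamics are a clipped shift.
states, controls, N = range(5), [-1, 0, 1], 3
step = lambda x, u: min(max(x + u, 0), 4)
cost = lambda x, u: (x - 2) ** 2 + 0.1 * u * u

J = {x: (x - 2) ** 2 for x in states}          # terminal cost J_N
policy = []
for k in reversed(range(N)):                   # backward recursion
    Jk, muk = {}, {}
    for x in states:
        u_best = min(controls, key=lambda u: cost(x, u) + J[step(x, u)])
        muk[x] = u_best
        Jk[x] = cost(x, u_best) + J[step(x, u_best)]
    J, policy = Jk, [muk] + policy

# Deterministic problem: the whole optimal trajectory follows a priori from x_0,
# so the feedback tables and the open-loop sequence give the same result.
x = 0
for k in range(N):
    u = policy[k][x]
    print(k, x, u)
    x = step(x, u)
```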

For a stochastic optimal control problem, there are two common optimization criteria: expected discounted cost and expected average cost. The expectation is taken over the random disturbances, i.e., over the resulting distribution of state trajectories. (A value-iteration sketch for the infinite-horizon discounted case follows the list below.)

- Finite horizon: expected discounted cost; expected average cost
- Infinite horizon: expected discounted cost; expected average cost
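For the infinite-horizon expected discounted cost criterion, here is a value-iteration sketch on a made-up 3-state, 2-action MDP (all transition probabilities, costs, and the discount factor are illustrative):

```python
import numpy as np

# P[a] is the transition matrix under action a; c[a] the per-stage cost vector.
P = [np.array([[0.8, 0.2, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.2, 0.8]]),
     np.array([[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]])]
c = [np.array([1.0, 2.0, 0.0]), np.array([0.5, 3.0, 1.0])]
gamma = 0.9                                   # discount factor

J = np.zeros(3)
for _ in range(500):                          # iterate the Bellman operator to a fixed point
    Q = np.stack([c[a] + gamma * P[a] @ J for a in range(2)])
    J_new = Q.min(axis=0)
    if np.max(np.abs(J_new - J)) < 1e-10:
        break
    J = J_new
print("J* =", J, "policy =", Q.argmin(axis=0))
```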

For a stochastic optimal control problem, feedback control is used, since it can achieve better performance than feed-forward control. The optimal control law is a sequence of functions {\mu_k(x_k)}, where each function \mu_k is indexed by time k and takes the state x_k at time k as its argument. We do not know a priori what control action will be taken at time k: assuming a causal system, x_k is not known or estimated until time k, and only once x_k is known or estimated can a control action be taken.
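A sketch of finite-horizon stochastic dynamic programming on a made-up 2-state, 2-action MDP; note that the output is a sequence of lookup tables mu_k, each evaluated on line only once x_k is observed:

```python
import numpy as np

P = [np.array([[0.9, 0.1], [0.4, 0.6]]),     # transitions under action 0 (illustrative)
     np.array([[0.2, 0.8], [0.7, 0.3]])]     # transitions under action 1 (illustrative)
c = [np.array([1.0, 0.0]), np.array([0.3, 0.5])]
N = 4                                         # horizon

J = np.zeros(2)                               # terminal cost J_N = 0
mu = [None] * N
for k in reversed(range(N)):
    Q = np.stack([c[a] + P[a] @ J for a in range(2)])  # expected cost-to-go per action
    mu[k] = Q.argmin(axis=0)                  # mu_k(x): best action for each state x
    J = Q.min(axis=0)

# mu[k][x_k] is looked up on line at time k, once x_k is known.
print([m.tolist() for m in mu])
```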


Solutions to the above optimal control problems (the 12 cases in the classification above):


Dynamic programming, Markov Decision Process (MDP), Partially Observable MDP (POMDP), Markov Control Process (MCP), Controlled Markov Process (CMP), Markov Control Model (MCM)

Deterministic dynamic programming

Stochastic dynamic programming


Risk-sensitive (entropy-minimizing) stochastic control: the cost is of an exponential form, with a risk-sensitivity parameter characterizing the degree of risk aversion. Risk-neutral stochastic control: the cost is of an additive form.
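A Monte Carlo sketch contrasting the two criteria on a made-up cost distribution; theta below is the (assumed) risk-sensitivity parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(10.0, 3.0, size=100_000)       # made-up random total costs

theta = 0.2                                   # risk-sensitivity parameter (assumed)
risk_neutral = J.mean()                       # additive criterion: E[J]
risk_sensitive = np.log(np.mean(np.exp(theta * J))) / theta  # (1/theta) log E[e^{theta J}]
print(risk_neutral, risk_sensitive)           # risk-sensitive > risk-neutral for theta > 0
```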

MDP, semi-Markov decision process (SMDP), POMDP

LQR problem: discrete-time algebraic Riccati equation (DARE), continuous-time algebraic Riccati equation (CARE)
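A minimal LQR sketch using scipy's solve_discrete_are to solve the DARE; the double-integrator A, B and the weights Q, R are illustrative choices:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 1.0], [0.0, 1.0]])        # double integrator (illustrative)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                                 # state weight (assumed)
R = np.array([[1.0]])                         # input weight (assumed)

P = solve_discrete_are(A, B, Q, R)            # solve the DARE
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal feedback gain: u_k = -K x_k
print(K)
```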

Kalman filter: there is no control term in the state equation: x_{k+1} = A x_k + B w_k and y_k = C x_k + v_k. Use the Riccati recursion to update the covariance matrix of the state-estimation error.
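A sketch of the covariance Riccati recursion for this model; the matrices and noise covariances below are assumed for illustration:

```python
import numpy as np

# x_{k+1} = A x_k + B w_k,  y_k = C x_k + v_k,  w_k ~ N(0, Qw), v_k ~ N(0, Rv).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Qw = np.array([[0.1]])                        # process-noise covariance (assumed)
Rv = np.array([[1.0]])                        # measurement-noise covariance (assumed)

P = np.eye(2)                                 # prior error covariance P_0
for k in range(50):
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + Rv)   # Kalman gain
    P_post = (np.eye(2) - K @ C) @ P                # measurement update
    P = A @ P_post @ A.T + B @ Qw @ B.T             # time update (Riccati step)
print(P)                                      # converges to the steady-state error covariance
```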

LQR with imperfect state information (LQG): by the separation principle, the optimal controller applies the LQR gain to the state estimate produced by a Kalman filter.

Other optimization criteria:


For a constrained minimization problem, an upper bound on the optimal value is obtained by tightening the constraints (working with sufficient conditions: any point feasible for the tightened problem is feasible for the original one), and a lower bound is obtained by loosening/relaxing the constraints (working with necessary conditions). For a maximization problem, the directions of the two bounds are reversed.
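A toy numeric illustration of these bounds for a minimization problem (the objective and constraints are made up):

```python
from scipy.optimize import minimize_scalar

# Toy problem: min x^2 subject to x >= 1 (true optimum: value 1 at x = 1).
f = lambda x: x * x
solve = lambda lo: minimize_scalar(f, bounds=(lo, 10.0), method="bounded").fun

exact   = solve(1.0)   # original constraint x >= 1         -> ~1.0
relaxed = solve(0.0)   # loosened to x >= 0 (necessary)     -> ~0.0, a lower bound
tight   = solve(2.0)   # tightened to x >= 2 (sufficient)   -> ~4.0, an upper bound
print(relaxed, exact, tight)
```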


Model Predictive Control (MPC): 

Model Predictive Control (MPC) has been widely adopted in industry as an effective means to deal with large multivariable constrained control problems. In MPC, the control action is chosen by solving an optimal control problem on line. The optimization minimizes a performance criterion over a (short) future horizon, possibly subject to constraints on the manipulated inputs and outputs.

MPC differs from conventional optimal control mainly in its receding-horizon mechanism: at every sampling instant the finite-horizon problem is re-solved from the current measured state, and only the first control move of the optimal sequence is applied (see the sketch below).
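A receding-horizon MPC sketch, assuming cvxpy is available as the convex-optimization modeling layer; the double-integrator model, weights, horizon, and input bound are all illustrative:

```python
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 1.0], [0.0, 1.0]])       # double integrator (illustrative)
B = np.array([[0.0], [1.0]])
N = 10                                        # prediction horizon

def mpc_step(x0):
    x = cp.Variable((2, N + 1))
    u = cp.Variable((1, N))
    cost, cons = 0, [x[:, 0] == x0]
    for k in range(N):
        cost += cp.sum_squares(x[:, k]) + cp.sum_squares(u[:, k])
        cons += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                 cp.abs(u[:, k]) <= 0.5]      # input constraint
    cp.Problem(cp.Minimize(cost), cons).solve()
    return u.value[:, 0]                      # apply only the first control move

x = np.array([5.0, 0.0])
for t in range(20):                           # re-solve at every sampling instant
    u0 = mpc_step(x)
    x = A @ x + B @ u0
print(x)
```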

Although MPC has long been recognized as the preferred alternative for constrained systems, its applicability has been limited to slow systems such as chemical processes, where large sampling times make it possible to solve large optimization problems each time new measurements are collected from the plant.

Alternatively, the optimization problem can be solved off line for all expected measurement values through multiparametric solvers (explicit MPC). The resulting feedback controller inherits all the stability and performance properties of MPC and turns out to be piecewise affine, so the on-line computation reduces to a simple function evaluation. This technique is therefore expected to enlarge the scope of applicability of MPC to applications with fast dynamics and high sampling rates.