Nonlinear programming
- Stationary point: x_0 is a stationary point if d f(x)/dx evaluated at x_0 equals 0. For simplicity, denote d f(x)/dx evaluated at x_0 as d f(x_0)/dx.
- local extremal/extreme point (extremum): x_0 is an extremum if d^k f(x_0)/dx^k = 0 for all k (1 <= k < n), and d^n f(x_0)/dx^n is nonzero with n even (n >= 2). Further,
- x_0 is a local/relative minimum (point) if d^n f(x_0)/dx^n > 0. A local minimum is the smallest value in some local neighborhood.
- x_0 is a local/relative maximum (point) if d^n f(x_0)/dx^n < 0. A local maximum is the largest value in some local neighborhood.
- Minimum (resp., maximum) can mean
either minimum (resp., maximum) point or minimum (resp., maximum)
value.
- saddle point: x_0 is a saddle point if d^k f(x_0)/dx^k = 0 for all k (1 <= k < n), and d^n f(x_0)/dx^n is nonzero with n odd (n >= 3). (A code sketch classifying stationary points follows this list.)
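A minimal sketch of this classification, assuming SymPy is available (the function name classify and the cutoff max_order are illustrative, not from the source):

    # Classify a stationary point x0 of f by the order n of its first
    # nonvanishing derivative, following the rules above.
    import sympy as sp

    x = sp.symbols('x')

    def classify(f, x0, max_order=8):
        for n in range(1, max_order + 1):
            d = sp.diff(f, x, n).subs(x, x0)
            if d != 0:
                if n == 1:
                    return 'not a stationary point'
                if n % 2 == 0:                   # n even: extremum
                    return 'local min' if d > 0 else 'local max'
                return 'saddle point'            # n odd (n >= 3)
        return 'inconclusive'

    print(classify(x**4, 0))  # local min (n = 4, d^4 f(0)/dx^4 = 24 > 0)
    print(classify(x**3, 0))  # saddle point (n = 3)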
- Convex programming
- Optimize a convex objective function
under convex constraints.
- Unique features
- The problem can be solved numerically
with great efficiency
- Any local optimal solution is also a global optimal solution (and it is unique when the objective is strictly convex).
- No heuristic stopping criteria are needed; provable lower bounds give a rigorous stopping criterion for any desired accuracy.
- Convex programming can handle non-differentiable/non-smooth problems as well as smooth problems.
- Its duality can provide certificates that prove infeasibility or lower bounds on the objective.
- The duality theory can be used for sensitivity analysis w.r.t. changes in the objective and the constraints (see the sketch after this list).
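A minimal sketch of these features, assuming CVXPY is installed (the specific problem is illustrative):

    # A tiny convex program: the solver returns the global optimum, and the
    # constraint's dual variable supports the sensitivity analysis noted above.
    import cvxpy as cp

    x = cp.Variable(2)
    constraints = [cp.sum(x) >= 1]     # affine, hence convex, constraint
    prob = cp.Problem(cp.Minimize(cp.sum_squares(x)), constraints)
    prob.solve()

    print(prob.value)                  # 0.5, the global optimum
    print(constraints[0].dual_value)   # sensitivity of the optimum to this constraint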
- Geometric programming
- monomial function: f(x) = c * x_1^{alpha_1} * x_2^{alpha_2} * ... * x_n^{alpha_n}, where
- c > 0 (strictly positive, so that beta = log c below is defined)
- domain of f: x_1 > 0, x_2 > 0, ..., x_n > 0
- alpha_i \in R, for all i.
- posynomial function: a sum of monomial
functions.
- geometric programming minimizes a posynomial objective subject to posynomial constraints.
- transformation of variables: y_i = log x_i, i.e., x_i = exp(y_i).
- monomial function: log f(x) = alpha_1 *
y_1 + ... + alpha_n * y_n + beta is affine in y (where beta = log c).
- posynomial function: log f(x) = log \sum_{k=1}^r
exp(alpha_{1k} * y_1 + ... + alpha_{nk} * y_n + beta_k) is convex in y.
- geometric programming in convex form:
- take the log and replace x_i by exp(y_i), e.g., from f(x) we get log f(exp(y_1), ..., exp(y_n)) (see the sketch below).
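A minimal sketch, assuming CVXPY is installed; its gp=True mode applies this log/exp change of variables internally (the particular objective and constraints are illustrative):

    # A tiny geometric program: monomial objective with a posynomial
    # constraint, solved in convex form via the log transformation.
    import cvxpy as cp

    x = cp.Variable(pos=True)
    y = cp.Variable(pos=True)
    objective = cp.Minimize(1 / (x * y))   # monomial objective
    constraints = [x + y <= 2]             # posynomial constraint
    prob = cp.Problem(objective, constraints)
    prob.solve(gp=True)
    print(x.value, y.value, prob.value)    # x = y = 1, optimal value 1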
- Semi-definite Programming (SDP)
- linear objective function with linear
matrix inequality (LMI) constraints
- LMI: \sum_{i=1}^n F_i * x_i <= 0 (negative semidefinite), where the F_i are symmetric matrices; a constant term F_0 is often included as well. The left-hand side can be regarded as a linear combination of the matrices F_i with scalar weights x_i.
- SDP is a convex program.
- LMI is a generalization of the linear constraints Ax <= b in linear programming (LP) (see the sketch below).
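A minimal sketch, assuming CVXPY with an SDP-capable solver such as SCS (the matrices are illustrative); the LMI is written in the equivalent positive-semidefinite direction:

    # Minimize a linear objective subject to an LMI: a linear combination of
    # symmetric matrices constrained in the semidefinite order.
    import cvxpy as cp
    import numpy as np

    F1 = np.array([[1.0, 0.0], [0.0, 0.0]])    # symmetric matrix coefficients
    F2 = np.array([[0.0, 0.0], [0.0, 1.0]])
    x = cp.Variable(2)

    lmi = x[0] * F1 + x[1] * F2 >> np.eye(2)   # i.e., I - x1*F1 - x2*F2 <= 0
    prob = cp.Problem(cp.Minimize(x[0] + x[1]), [lmi])
    prob.solve()
    print(x.value, prob.value)                 # x = (1, 1), optimal value 2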
- What does regression mean?
- Regression means
- given an observation y, estimating the value of the underlying parameter x that produced y. It is like moving backward (in a logical sense, not necessarily a temporal sense) to find the cause, given the observation.
- For example, estimating engine efficiency
based on oil pressure.
- Regression generalizes classification, since x can be any real-valued quantity, including a class index. Many classification algorithms can be understood as quantizing the output of a regression. As in classification, the conditional model of x can be obtained from a joint model (which includes a model of y), or it can be learned directly. Curve fitting is the common special case where y is assumed to be a deterministic function of x plus additive noise (usually Gaussian), i.e., y = f(x) + n (see the sketch below). Methods for curve fitting include radial basis functions, feed-forward neural networks, and mixtures of experts.
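A minimal sketch of curve fitting as regression, assuming NumPy (the linear model and noise level are illustrative); the unknown coefficients play the role of the parameter x, and the noisy measurements play the role of the observation y:

    # Fit y = f(t) + n by least squares, where f is linear in the unknown
    # coefficients and n is Gaussian noise.
    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 50)
    y = 2.0 * t + 1.0 + 0.1 * rng.standard_normal(t.size)   # y = f(t) + n

    A = np.column_stack([t, np.ones_like(t)])      # design matrix for a line
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares estimate
    print(coef)                                    # close to the true (2.0, 1.0)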
- Constrained vs. unconstrained optimization
- Typically, constrained optimization problems are more difficult to solve than unconstrained ones, since the boundary conditions of the feasible set must be handled. Usually, the more constraints, the more difficult the problem. However, if the feasible set becomes very small because of many constraints, the additional constraints can actually make the problem easier to solve (see the sketch below).
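A minimal sketch of this contrast, assuming SciPy (the objective and constraint are illustrative):

    # The same objective solved unconstrained and with one inequality
    # constraint; the constrained optimum moves onto the boundary.
    from scipy.optimize import minimize

    f = lambda v: (v[0] - 1.0) ** 2 + (v[1] - 2.0) ** 2

    free = minimize(f, x0=[0.0, 0.0])                            # unconstrained
    con = {'type': 'ineq', 'fun': lambda v: 1.0 - v[0] - v[1]}   # v1 + v2 <= 1
    bound = minimize(f, x0=[0.0, 0.0], constraints=[con])        # constrained

    print(free.x)   # about (1, 2): the unconstrained minimum
    print(bound.x)  # about (0, 1): on the boundary v1 + v2 = 1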
Key Techniques: