Unconstrained Minimization

Def Unconstrained Minimization
Suppose is convex and twice continuously differentiable (hence is open), the unconstrained minimization is as follows:

Decent Methods

Def Decent Method
We define the descent methods as an iterative algorithm such that

where is the step or decent direction, is the step size or step length.

Def Line Search
The line search approach(algorithm) output a step size that determines how far should move along the given descent direction.

Algorithm Exact Line Search
The exact line search algorithm find will output step size such that

Algorithm Backtracking Line Search
With parameters , starting at , repeat until .
backtracking-image.png|330
Backtracking line search vs Exact line search|500

Algorithm Gradient Descent Method
The gradient decent method is a family of decent method with . And stopping criterion usually of the form .

Def Normalized Steepest Descent Direction
The normalized steepest descent direction at given is

Def Unnormalized Steepest Descent Direction
The unnormalized steepest descent direction at given is

where

Algorithm Steepest Descent Method
The steepest decent method is a family of decent method that find the steepest descent direction at point with

Newton’s Method

Def Newton Step
The newton step is defined as:

actually minimizes the second-order approximation of at :
Prop is the steepest descent direction at in local Hessian norm

Def Newton Decrement
Define the Newton decrement as follows as a measure of the proximity of to :

\nabla f(x)^{\mathrm{T}}\nabla^2f(x)^{-1}\nabla f(x) } $$ **Algorithm** <i><u>Newton’s Method</u></i> Given a starting point $x \in \text{dom}{f}$, tolerance $\varepsilon > 0$. Repeat the following: 1. Compute the Newton step $\Delta x_{\text{nt}}$ and decrement $\lambda$. 2. Stopping criterion. quit if $\frac{1}{2}λ^2 ≤ ε$. 3. Line search: Choose step size $t$ by [backtracking line search](Descent%20Methods%20in%20Unconstrained%20Optimization.md#^bls) 4. Update: $x:=x+t \Delta x_{\text{nt}}$ **Fact** Cholesky factorization is often used to solve $H=\nabla^2 f(x)$. Let $g=-\nabla f(x)$, we have: $$ H=LL^{\mathrm{T}}, \quad \Delta x_{\text{nt}}=L^{-\mathrm{T}}L^{-1}g, \quad \lambda = \| L^{-1}g \|_2 $$ **Prop** The Newton’s method is affine invariant, independent of linear changes of coordinates **Prop** The number of iterations until $f(x) − p^\star ≤ ε$ is bounded above by $$ \frac{f(x^{(0)})-p^\star}{\gamma}+\log_2 \log_2(\frac{\varepsilon_0}{\varepsilon}) $$ where $γ, ε_0$ are constants that depend on $m, L, x^{(0)}$. The second term is small and almost constant for practical purposes.