# Unconstrained Minimization

**Def** <i><u>Unconstrained Minimization</u></i> Suppose $f: \mathbb{R}^n \to \mathbb{R}$ is convex and twice continuously differentiable (hence $\operatorname{dom} f$ is open); the unconstrained minimization problem is as follows:
$$
\min_{x}\ f(x)
$$
We assume the problem is solvable, with optimal value $p^\star = \inf_x f(x)$ attained at a minimizer $x^\star$.

## Descent Methods

**Def** <i><u>Descent Method</u></i> We define a descent method as an iterative algorithm
$$
x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)}, \quad f(x^{(k+1)}) < f(x^{(k)})
$$
where $\Delta x^{(k)}$ is the step or descent direction and $t^{(k)} > 0$ is the step size or step length. A descent direction must satisfy $\nabla f(x^{(k)})^{\mathrm{T}} \Delta x^{(k)} < 0$, i.e., it makes an acute angle with the negative gradient.
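As a rough illustration, here is a minimal Python sketch of the generic descent iteration; `direction` and `line_search` are placeholder callables (illustrative names, not from these notes) that supply $\Delta x$ and $t$:

```python
import numpy as np

def descent_method(f, grad, x0, direction, line_search, tol=1e-8, max_iter=100):
    """Generic descent iteration x := x + t * dx (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:  # a simple stopping criterion
            break
        dx = direction(x)             # must satisfy grad(x) @ dx < 0
        t = line_search(f, grad, x, dx)
        x = x + t * dx
    return x
```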

**Def** <i><u>Line Search</u></i> A line search algorithm outputs a step size $t > 0$ that determines how far $x$ should move along a given descent direction $\Delta x$.

**Algorithm** <i><u>Exact Line Search</u></i> The exact line search algorithm outputs the step size
$$
t = \operatorname*{argmin}_{s > 0} f(x + s\,\Delta x)
$$
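A minimal sketch of exact line search, solving the one-dimensional problem numerically with SciPy's `minimize_scalar`; the search bracket $(0, 10)$ is an arbitrary assumption, not part of the definition:

```python
from scipy.optimize import minimize_scalar

def exact_line_search(f, grad, x, dx):
    """Exact line search: t = argmin_{s>0} f(x + s * dx), solved numerically.
    grad is unused; kept only for a uniform line-search interface."""
    res = minimize_scalar(lambda s: f(x + s * dx), bounds=(0.0, 10.0), method="bounded")
    return res.x
```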

**Algorithm** <i><u>Backtracking Line Search</u></i> With parameters $\alpha \in (0, 1/2)$, $\beta \in (0, 1)$, starting at $t = 1$, repeat $t := \beta t$ until $f(x + t\,\Delta x) < f(x) + \alpha t \nabla f(x)^{\mathrm{T}} \Delta x$. ^bls

![[backtracking-image.png|330]]
*Figure: Backtracking line search vs. exact line search.*
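A direct Python transcription of the backtracking rule above; the defaults $\alpha = 0.3$, $\beta = 0.8$ are illustrative choices within the stated parameter ranges:

```python
def backtracking_line_search(f, grad, x, dx, alpha=0.3, beta=0.8):
    """Shrink t until f(x + t*dx) < f(x) + alpha * t * grad(x)^T dx."""
    t = 1.0
    fx, g_dx = f(x), grad(x) @ dx  # g_dx < 0 for a descent direction
    while f(x + t * dx) >= fx + alpha * t * g_dx:
        t *= beta
    return t
```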

**Algorithm** <i><u>Gradient Descent Method</u></i> The gradient descent method is the family of descent methods with $\Delta x = -\nabla f(x)$, with a stopping criterion usually of the form $\|\nabla f(x)\|_2 \le \varepsilon$.
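A sketch of gradient descent under these definitions; it assumes the `backtracking_line_search` sketch above is in scope:

```python
import numpy as np

def gradient_descent(f, grad, x0, eps=1e-6, max_iter=10_000):
    """Gradient descent: dx = -grad(x); stop once ||grad(x)||_2 <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:  # stopping criterion
            break
        dx = -g
        t = backtracking_line_search(f, grad, x, dx)
        x = x + t * dx
    return x
```

For example, `gradient_descent(lambda x: x @ x, lambda x: 2 * x, np.ones(3))` converges to the origin.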

**Def** <i><u>Normalized Steepest Descent Direction</u></i> The normalized steepest descent direction at $x$, given a norm $\|\cdot\|$, is
$$
\Delta x_{\text{nsd}} = \operatorname*{argmin}_{v} \left\{ \nabla f(x)^{\mathrm{T}} v : \|v\| = 1 \right\}
$$

**Def** <i><u>Unnormalized Steepest Descent Direction</u></i> The unnormalized steepest descent direction at $x$, given a norm $\|\cdot\|$, is
$$
\Delta x_{\text{sd}} = \|\nabla f(x)\|_{*} \, \Delta x_{\text{nsd}}
$$
where $\|\cdot\|_{*}$ denotes the dual norm.

**Algorithm** <i><u>Steepest Descent Method</u></i> The steepest descent method is the family of descent methods that use the steepest descent direction at each point $x$, with $\Delta x = \Delta x_{\text{sd}}$.
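As a concrete instance (an illustration, not from the notes): for the $\ell_1$ norm the dual norm is $\ell_\infty$, so the unnormalized steepest descent direction acts only on the largest-magnitude gradient component:

```python
import numpy as np

def steepest_descent_dir_l1(g):
    """Unnormalized steepest descent direction for the l1 norm:
    dx_sd = ||g||_inf * dx_nsd = -g[i] * e_i, where i = argmax_j |g[j]|."""
    i = np.argmax(np.abs(g))
    dx = np.zeros_like(g)
    dx[i] = -g[i]
    return dx
```

This yields a coordinate-descent-like update that moves only along the steepest coordinate.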

## Newton’s Method

**Def** <i><u>Newton Step</u></i> The Newton step is defined as:
$$
\Delta x_{\text{nt}} = -\nabla^2 f(x)^{-1} \nabla f(x)
$$
$x + \Delta x_{\text{nt}}$ actually minimizes the second-order approximation of $f$ at $x$:
$$
\widehat{f}(x+v) = f(x) + \nabla f(x)^{\mathrm{T}} v + \frac{1}{2} v^{\mathrm{T}} \nabla^2 f(x)\, v
$$
**Prop** $\Delta x_{\text{nt}}$ is the steepest descent direction at $x$ in the local Hessian norm $\|u\|_{\nabla^2 f(x)} = \left( u^{\mathrm{T}} \nabla^2 f(x)\, u \right)^{1/2}$.
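A minimal sketch of computing the Newton step by solving the linear system $\nabla^2 f(x)\,\Delta x_{\text{nt}} = -\nabla f(x)$ rather than forming the inverse explicitly:

```python
import numpy as np

def newton_step(grad, hess, x):
    """Newton step: dx_nt = -H^{-1} g, via a linear solve instead of inversion."""
    return np.linalg.solve(hess(x), -grad(x))
```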

**Def** <i><u>Newton Decrement</u></i> Define the Newton decrement $\lambda(x)$ as follows, as a measure of the proximity of $x$ to $x^\star$:
$$
\lambda(x) = \sqrt{ \nabla f(x)^{\mathrm{T}}\nabla^2f(x)^{-1}\nabla f(x) }
$$

**Algorithm** <i><u>Newton’s Method</u></i> Given a starting point $x \in \text{dom}{f}$, tolerance $\varepsilon > 0$. Repeat the following:
1. Compute the Newton step $\Delta x_{\text{nt}}$ and decrement $\lambda$.
2. Stopping criterion: quit if $\frac{1}{2}\lambda^2 \le \varepsilon$.
3. Line search: choose step size $t$ by [backtracking line search](Descent%20Methods%20in%20Unconstrained%20Optimization.md#^bls).
4. Update: $x := x + t\,\Delta x_{\text{nt}}$.

**Fact** Cholesky factorization is often used to solve the Newton system with $H=\nabla^2 f(x)$. Let $g=-\nabla f(x)$; we have:
$$
H=LL^{\mathrm{T}}, \quad \Delta x_{\text{nt}}=L^{-\mathrm{T}}L^{-1}g, \quad \lambda = \| L^{-1}g \|_2
$$

**Prop** Newton’s method is affine invariant: it is independent of linear changes of coordinates.

**Prop** The number of iterations until $f(x) - p^\star \le \varepsilon$ is bounded above by
$$
\frac{f(x^{(0)})-p^\star}{\gamma}+\log_2 \log_2\left(\frac{\varepsilon_0}{\varepsilon}\right)
$$
where $\gamma, \varepsilon_0$ are constants that depend on $m$, $L$, and $x^{(0)}$. The second term is small and almost constant for practical purposes.
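Putting the pieces together, a sketch of the damped Newton iteration using the Cholesky-based solve from the fact above; it reuses the `backtracking_line_search` sketch from earlier and assumes $\nabla^2 f(x)$ is positive definite:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newtons_method(f, grad, hess, x0, eps=1e-10, max_iter=50):
    """Newton's method with Cholesky factorization H = L L^T.
    Stops when lambda^2 / 2 <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        c, low = cho_factor(hess(x))     # H = L L^T; requires H positive definite
        dx_nt = cho_solve((c, low), -g)  # dx_nt = -H^{-1} g
        lam2 = -g @ dx_nt                # lambda^2 = g^T H^{-1} g
        if lam2 / 2.0 <= eps:            # stopping criterion
            break
        t = backtracking_line_search(f, grad, x, dx_nt)
        x = x + t * dx_nt
    return x
```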