DefBi-level Optimization Problem
Bi-level optimization problems refer to two related optimization tasks, each one is assigned to a decision level (i.e., upper and lower levels). The evaluation of an upper level solution requires the evaluation of the lower level. minimize subject to J(x,y)y∈uargminf(x,u)
DefParameterized Optimization Problem
Parameterized optimization problem considers optimization problems depending on a parameter and describes how the feasible set, the value function, and the local or global minimizers of the program depend on changes in the parameter. That is we define a solution mapping x↦y as y(x)=u⋆: y(x)∈argminusubject touf(x,u)hi(x,u)=0,i=1,…,pgi(x,u)≤0,i=1,…,q.
Differentiable Optimization
ThrmDini’s Implicit Function Theorem
Let f:Rn×Rm→Rm be differentiable in a neighbourhood of (x,u) and such that f(x,u)=0, and let ∂u∂f(x,u) be nonsingular. Then the solution mapping Y has a single-valued localization y around x for u which is differentiable in a neighbourhood X of x with Jacobian satisfying dxdy(x)=−(∂y∂f(x,y(x)))−1∂x∂f(x,y(x))
e.g. Circle example: Consider f(x,y)=x2+y2−1, we have dxdy=−(∂y∂f)−1(∂x∂f)=−(2y1)(2x)=−yx
ThrmDifferentiating Unconstrained Optimization Problems
Let f:R×R→R be twice differentiable and let y(x)∈argminuf(x,u). Then for non-zero Hessian dxdy(x)=−(∂y2∂2f)−1∂x∂y∂2f
ThrmDifferentiating Equality Constrained Optimization Problems
Consider functions f:Rn×Rm→R and h:Rn×Rm→Rq. Let y(x)∈argminu∈Rmsubject tof(x,u)h(x,u)=0
Assume that y(x) exists, that f and h are twice differentiable in the neighbourhood of (x,y(x)), and that rank(∂y∂h(x,y))=q. Then for H non-singular:dxdy(x)=H−1AT(AH−1AT)−1(AH−1B−C)−H−1Bwhere
&&B=\frac{\partial^2f(x,y)}{\partial x\partial y}-\sum_{i=1}^q\nu_i\frac{\partial^2h_i(x,y)}{\partial x\partial y}\in\R^{m\times n}\\
C=\frac{\partial h(x,y)}{\partial x}\in\R^{q\times n}&&
H=\frac{\partial^2f(x,y)}{\partial y^2}-\sum_{i=1}^q\nu_i\frac{\partial^2h_i(x,y)}{\partial y^2}\in\R^{m\times m}\end{array} $$
and $\nu \in \R^q$ satisfies $\nu^{\mathrm{T}}A = \frac{\partial f(x,y)}{\partial y}$
**Fact** <i><u>Dealing with Inequality Constraints</u></i>
Replace inequality constraints with log-barrier approximation, treat as equality constraints if active and ignore otherwise.
**Fact** At one extreme we can try back propagate through the optimization algorithm, while at the other extreme we can use the implicit differentiation result to hand-craft efficient backward pass code. However, there are also options in between, for example:
- use automatic differentiation to obtain quantities $A$, $B$, $C$ and $H$ from software implementations of the objective and (active) constraint functions.
- implement the optimality condition $∇\mathcal{L} = 0$ in software and automatically differentiate that.