Augmented Lagrangian Methods
In this blog post, we introduce the augmented Lagrangian method and its variants. Let us start with an optimization problem with equality constraints:
\[\begin{equation} \label{eq:QP_eq} \begin{aligned} \min_x\; &\frac{1}{2} x^\top Q x + c^\top x \\ \text{s.t.}\; & Ax = b, \end{aligned} \end{equation}\]where $x \in \mathbb{R}^{n}$, $Q \in \mathbb{R}^{n\times n}$, $c \in \mathbb{R}^{n}$, $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^{m}$. Equivalently, we write its min-max formulation as
\[\begin{equation} \label{eq:min_max} \min_x \max_{\lambda}\; \frac{1}{2} x^\top Q x + c^\top x + \lambda^\top (Ax - b), \end{equation}\]where $\lambda \in \mathbb{R}^{m}$ is the Lagrange multiplier vector associated with the equality constraints. For readers who are not familiar with the min-max formulation, please refer to this tutorial video. The corresponding KKT conditions are then given by
\[\begin{bmatrix} Q & A^\top \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} -c \\ b \end{bmatrix}.\]Note that the KKT matrix is nonsingular if the following assumptions hold.
- The constraint Jacobian $A$ has full row rank;
- The matrix $Q$ is positive definite on the tangent space of the constraints, that is, $d^\top Q d > 0$ for all $d \neq 0$ such that $Ad = 0$.
The first assumption implies that there are no redundant constraints. The second assumption indicates that the objective function is strictly convex along all feasible directions. To better understand the second assumption, let's clarify what we mean by a feasible direction. Given a point $x$ satisfying $Ax = b$, a feasible direction is a vector $d$ such that $x + d$ still satisfies the constraints, i.e., $A(x + d) = b$. Since $Ax = b$, this holds if and only if $Ad = 0$. Therefore, the set of all feasible directions forms the null space of the constraint matrix $A$. This null space is also called the tangent space of the constraints.
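To make the KKT system concrete, here is a minimal NumPy sketch that assembles and solves it directly; the data `Q`, `A`, `c`, `b` below are made up for illustration and satisfy both assumptions.

```python
# A minimal sketch: assemble and solve the KKT system of the
# equality-constrained QP. The problem data are illustrative only.
import numpy as np

n, m = 3, 2
Q = np.diag([2.0, 3.0, 4.0])            # positive definite
c = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])         # full row rank
b = np.array([1.0, 2.0])

# KKT matrix [[Q, A^T], [A, 0]] and right-hand side [-c, b].
K = np.block([[Q, A.T],
              [A, np.zeros((m, m))]])
z = np.linalg.solve(K, np.concatenate([-c, b]))
x, lam = z[:n], z[n:]
print("x* =", x, "  lambda* =", lam, "  Ax - b =", A @ x - b)
```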
Here, we introduce one method for addressing situations in which the first assumption is violated. When $A$ is row-rank deficient, its rows are linearly dependent, meaning some constraints are redundant. Consequently, the dual variable $\lambda$ may not be uniquely determined, and the min-max problem \eqref{eq:min_max} and the KKT system above may not be well-posed. To address this, we add a regularization term involving $\lambda$, yielding
\[\min_x \max_{\lambda}\; \frac{1}{2} x^\top Q x + c^\top x + \lambda^\top (Ax - b) - \frac{\rho}{2}\Vert \lambda - \bar{\lambda}\Vert_2^2.\]A typical choice of $\bar{\lambda}$ is the multiplier vector obtained from the last iteration. The modified KKT system is then
\[\begin{equation} \label{eq:regu_KKT} \begin{bmatrix} Q & A^\top \\ A & -\rho I \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} -c \\ b - \rho\bar{\lambda} \end{bmatrix}. \end{equation}\]To obtain the solution of the original problem \eqref{eq:QP_eq}, we solve a sequence of KKT systems of the form \eqref{eq:regu_KKT}, e.g., via an $LDL^\top$ factorization, updating $\bar{\lambda}$ after each solve.
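As a concrete sketch (continuing with the data from the NumPy snippet above), SciPy's symmetric indefinite solver can stand in for the $LDL^\top$ factorization; the values of `rho` and `lam_bar` here are placeholders.

```python
# One regularized KKT solve, reusing Q, A, c, b, n, m from the snippet above.
# assume_a='sym' makes SciPy dispatch to LAPACK's symmetric indefinite
# routines, which are based on an LDL^T-type (Bunch-Kaufman) factorization.
from scipy.linalg import solve

rho = 1e-3                        # regularization weight (placeholder value)
lam_bar = np.zeros(m)             # multiplier from the previous iteration

K = np.block([[Q, A.T],
              [A, -rho * np.eye(m)]])
z = solve(K, np.concatenate([-c, b - rho * lam_bar]), assume_a='sym')
x, lam = z[:n], z[n:]
```

Alternatively, in the min-max formulation, by solving the inner maximization problem, we can express $\lambda$ as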
\[\lambda = \frac{1}{\rho}(Ax - b) + \bar{\lambda}.\]Substituting $\lambda$ into the outer minimization problem, we obtain
\[\begin{aligned} & \min_ x\; \frac{1}{2} x^\top Q x + c^\top x + \left( \frac{1}{\rho}(Ax - b) + \bar{\lambda} \right)^\top (Ax - b) - \frac{\rho}{2} \left\Vert \frac{1}{\rho} (Ax - b) \right\Vert_2^2 \\ = & \min_ x\; \underbrace{\frac{1}{2} x^\top Q x + c^\top x + \bar{\lambda}^\top (Ax - b) + \frac{1}{2\rho} \Vert Ax-b \Vert_2^2,}_{\mathcal{L}_{a}(x, \bar{\lambda}; \rho)} \end{aligned}\]where $\mathcal{L}_a$ is known as the augmented Lagrangian function. Equivalently, by completing the square, we can rewrite $\mathcal{L}_a$ as
\[\mathcal{L}_{a}(x, \bar{\lambda}; \rho) := \frac{1}{2} x^\top Q x + c^\top x + \frac{1}{2\rho}\left\Vert Ax -b + \rho\bar{\lambda} \right\Vert_2^2 - \frac{\rho}{2} \Vert \bar{\lambda} \Vert^2_2.\]The augmented Lagrangian method (ALM) alternates between the minimization of $\mathcal{L}_{a}$ with respect to the primal variables $x$ and a simple update rule for the dual variables $\lambda$:
\[\begin{aligned} x^{k+1} &= \text{arg}\min_x \mathcal{L}_{a}(x, \lambda^k; \rho^k) \\ \lambda^{k+1} &= \lambda^k + \frac{1}{\rho^k}(Ax^{k+1} - b), \end{aligned}\]where $k$ represents the iteration index.
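Putting the pieces together, here is a minimal sketch of the ALM iteration for the equality-constrained QP, again reusing the data defined earlier. Since $\mathcal{L}_a$ is an unconstrained quadratic in $x$ here, the $x$-update reduces to a linear solve: setting $\nabla_x \mathcal{L}_a = 0$ gives $(Q + A^\top A/\rho)\, x = -c - A^\top \lambda^k + A^\top b/\rho$.

```python
# A minimal ALM sketch for the equality-constrained QP, reusing Q, A, c, b, m.
rho = 1e-2
lam = np.zeros(m)
for k in range(50):
    # x-update: minimize L_a exactly by solving grad_x L_a = 0.
    x = np.linalg.solve(Q + A.T @ A / rho, -c - A.T @ lam + A.T @ b / rho)
    # dual update from the text.
    lam = lam + (A @ x - b) / rho
print("||Ax - b|| =", np.linalg.norm(A @ x - b))
```

In exact arithmetic, a fixed point of this iteration satisfies the original KKT conditions, since the dual update stalls exactly when $Ax = b$.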
Next, we take inequality constraints into account. A quadratic program with inequality constraints is formulated as
\[\begin{aligned} \min_x\; &\frac{1}{2} x^\top Q x + c^\top x \\ \text{s.t.}\; & Cx \leq d, \end{aligned}\]where $C \in \mathbb{R}^{p\times n}$ and $d \in \mathbb{R}^{p}$. By introducing a slack vector $s \in \mathbb{R}^{p}$, we rewrite the optimization problem as
\[\begin{aligned} \min_{x, s}\; &\frac{1}{2} x^\top Q x + c^\top x \\ \text{s.t.}\; & Cx + s = d \\ & s \ge 0. \end{aligned}\]As above, we consider the regularized min-max formulation:
\[\min_{x,\, s \ge 0} \max_{y}\; \frac{1}{2} x^\top Q x + c^\top x + y^\top (Cx - d + s) - \frac{\mu}{2}\Vert y - \bar{y}\Vert_2^2,\]where $y \in \mathbb{R}^{p}$ is the vector of Lagrange multipliers. Similarly, solving the inner maximization problem over $y$, we obtain
\[\begin{equation} \label{eq:min_x_s} \min_{x,\, s \ge 0}\; \frac{1}{2} x^\top Q x + c^\top x + \frac{1}{2\mu}\left\Vert Cx - d + s + \mu\bar{y} \right\Vert_2^2 - \frac{\mu}{2} \Vert \bar{y} \Vert^2_2. \end{equation}\]Next, we minimize the objective above over the slack vector $s \ge 0$. Since the objective is separable across the components of $s$, the minimizer is given componentwise by
\[s^* = \begin{cases} - Cx + d - \mu\bar{y}\quad &\text{if}\; Cx - d \leq -\mu\bar{y}, \\ 0 & \text{otherwise}, \end{cases}\]which can be written compactly as $s^* = \max(-(Cx - d) - \mu\bar{y},\, 0)$.
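A quick numerical check of this step (with made-up data): the piecewise formula matches the max form, and the identity $Cx - d + s^* + \mu\bar{y} = \max(Cx - d + \mu\bar{y}, 0)$ used in the substitution below holds componentwise.

```python
# Verify the closed-form slack update on random data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
mu = 0.5
C = rng.standard_normal((4, 3)); d = rng.standard_normal(4)
x = rng.standard_normal(3);      y_bar = rng.standard_normal(4)

r = C @ x - d
s_piecewise = np.where(r <= -mu * y_bar, -r - mu * y_bar, 0.0)
assert np.allclose(s_piecewise, np.maximum(-r - mu * y_bar, 0.0))
# The identity that produces the max term in the augmented Lagrangian:
assert np.allclose(r + s_piecewise + mu * y_bar, np.maximum(r + mu * y_bar, 0.0))
```

Substituting $s^*$ into \eqref{eq:min_x_s}, we obtain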
\[\min_{x}\; \underbrace{\frac{1}{2} x^\top Q x + c^\top x + \frac{1}{2\mu}\left\Vert \max(Cx - d + \mu\bar{y}, 0) \right\Vert_2^2 - \frac{\mu}{2} \Vert \bar{y} \Vert^2_2}_{\mathcal{L}_{a}(x, \bar{y};\, \mu)}.\]The update rule is then given by
\[\begin{aligned} x^{k+1} &= \text{arg}\min_x \mathcal{L}_{a}(x, y^k; \mu^k) \\ y^{k+1} &= y^k + \frac{1}{\mu^k}(Cx^{k+1} - d + s^*) \\ &= \max \left( y^k + \frac{1}{\mu^k}\left(Cx^{k+1} - d\right), 0 \right). \end{aligned}\]
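To close, here is a minimal sketch of this ALM iteration on a small inequality-constrained QP. Because of the max term, the $x$-update no longer reduces to a single linear solve, so the sketch calls `scipy.optimize.minimize` with the gradient $\nabla_x \mathcal{L}_a = Qx + c + \frac{1}{\mu} C^\top \max(Cx - d + \mu y, 0)$; the problem data are made up for illustration.

```python
# A minimal ALM sketch for the inequality-constrained QP (illustrative data).
import numpy as np
from scipy.optimize import minimize

Q = np.diag([2.0, 3.0]); c = np.array([-4.0, -6.0])
C = np.array([[1.0, 1.0], [-1.0, 0.0]]); d = np.array([1.0, 0.0])

mu, y, x = 1e-2, np.zeros(2), np.zeros(2)
for k in range(30):
    def L_a(x, y=y):
        # Augmented Lagrangian with the max term from the text.
        r = np.maximum(C @ x - d + mu * y, 0.0)
        return 0.5 * x @ Q @ x + c @ x + (r @ r) / (2 * mu) - mu / 2 * y @ y
    def grad(x, y=y):
        r = np.maximum(C @ x - d + mu * y, 0.0)
        return Q @ x + c + C.T @ r / mu
    x = minimize(L_a, x, jac=grad, method='L-BFGS-B').x   # x-update
    y = np.maximum(y + (C @ x - d) / mu, 0.0)             # dual update
print("x =", x, "  max violation:", np.maximum(C @ x - d, 0.0).max())
```

Note that $\mathcal{L}_a$ remains continuously differentiable despite the max, since the max term enters squared, which is why a smooth solver suffices for the $x$-update.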