Optimization Revisit: From Linear Programming to Lagrange

By Z.H. Fu
切问录 www.fuzihao.org

Optimization is the foundation of many other popular technologies and is widely used in our modern society. In this post, we go through the basics underlying optimization. We start from the simplest case, linear programming, extend it to the more general conic version, and finally arrive at Lagrangian optimization. To concentrate on the main framework of optimization, we leave out some long proofs that can easily be found in reference materials. This post is based on the course ENGG5501 taught by Prof. Anthony Man-Cho So at CUHK [1].

(I) Linear Programming

In linear programming (LP), the objective function and constraints are all linear w.r.t. the variables. The standard form of linear programming is given as follows: \[\begin{aligned} (P)\ \ \ \ \ \ v_p^* = \min_\mathbf{x}\ \ \ &\mathbf{c}^T \mathbf{x}\\ \text{s.t. }\ \ &A\mathbf{x}=\mathbf{b}\\ & \mathbf{x} \ge \mathbf{0} \end{aligned}\] where \(\mathbf{x},\mathbf{c}\in \mathbb{R}^n\), \(\mathbf{b}\in \mathbb{R}^m\), and \(A\in \mathbb{R}^{m \times n }\). Since \(\mathbf{x}\ge \mathbf{0}\), the feasible region contains no line, which implies that it has at least one vertex. A classical theorem states that if (P) has at least one vertex, then either the optimal value is \(-\infty\) or there exists a vertex that is optimal.
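As a sanity check, here is a minimal numerical sketch of (P) using SciPy's `linprog` (an assumed tool choice; any LP solver works), on made-up data \(c\), \(A\), \(b\):

```python
# Toy instance of the standard form (P); the data below are illustrative.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])   # m = 1 equality constraint, n = 3 variables
b = np.array([1.0])

# bounds=(0, None) encodes x >= 0, matching the standard form.
res = linprog(c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")
print(res.x, res.fun)             # x* = [0, 0, 1], v_p* = 0: a vertex is optimal
```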

Weak Duality

Observe that in problem (P) we are minimizing some function. A natural question is whether we can construct a lower bound such that, no matter how we change the variables, the objective function is always at least the lower bound. Let us now construct such a lower bound.

If we can find a vector \(\mathbf{y} \in \mathbb{R}^m\) s.t. \(A^T \mathbf{y}\le \mathbf{c}\), then we have \(\mathbf{b}^T\mathbf{y}=\mathbf{x}^TA^T\mathbf{y}\le \mathbf{c}^T\mathbf{x}\) for every feasible \(\mathbf{x}\). Finding the best such bound is in fact another optimization problem: the inequality above holds for every such \(\mathbf{y}\), so we naturally want the one giving the maximal lower bound. We write this problem as follows:

\[\begin{aligned} (D)\ \ \ \ \ \ v_d^* = \max_\mathbf{y}\ \ \ &\mathbf{b}^T \mathbf{y}\\ \text{s.t. }\ \ &A^T \mathbf{y}\le \mathbf{c} \end{aligned}\]

Here, we find that for any LP primal problem (P), there exists a dual problem (D). The optimal value of the dual problem (D) is a lower bound on that of the primal problem (P). This is called weak duality. By “weak”, we mean that only an inequality holds between the two problems (\(v_d^* \le v_p^*\)). A natural question is when the primal optimal value equals the dual optimal value (\(v_p^* = v_d^*\)). The equality is powerful: if \(v_p^* = v_d^*\), we can solve the primal problem by solving the dual problem instead when the primal is too complicated. When the equality holds is studied in strong duality theory.
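Continuing the toy instance above, the sketch below solves both (P) and (D) and compares the two optimal values; since `linprog` only minimizes, the dual objective is negated:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")
# Dual: max b^T y  s.t.  A^T y <= c, y free. Negate to fit a minimizer.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None), method="highs")

print(primal.fun, -dual.fun)      # v_p* and v_d* coincide here (strong duality)
```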

Strong Duality

To obtain strong duality, we first introduce Farkas’ Lemma without proof.

(Farkas’ Lemma) Let \(A \in \mathbb{R}^{m\times n}\) and \(b \in \mathbb{R}^m\) be given. Then, exactly one of the following systems has a solution: \[ \begin{aligned} &Ax=b,\ x\ge0 \\ &A^Ty\le 0,\ b^Ty>0 \end{aligned}\] Farkas’ Lemma says that one and only one of the two systems has a solution. This can be used to prove the strong duality of the LP problem.
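A small numerical illustration of this dichotomy, again with `linprog` on made-up data: when the system \(Ax=b,\ x\ge0\) is infeasible, we can search for a certificate \(y\) of the second system (a box on \(y\) keeps the search bounded):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0]])
b = np.array([-1.0])                 # Ax = b, x >= 0 is clearly infeasible

feas = linprog(np.zeros(2), A_eq=A, b_eq=b, bounds=(0, None), method="highs")
print(feas.status)                   # 2 = infeasible: the first system fails

# Farkas certificate: maximize b^T y subject to A^T y <= 0, -1 <= y <= 1.
cert = linprog(-b, A_ub=A.T, b_ub=np.zeros(2), bounds=(-1, 1), method="highs")
print(cert.x, b @ cert.x)            # y = [-1], b^T y = 1 > 0: second system holds
```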

(LP Strong Duality) Suppose that (P) has an optimal solution \(x^* \in \mathbb{R}^n\). Then, (D) also has an optimal solution \(y^* \in \mathbb{R}^m\), and \(c^T x^* = b^T y^*\).

Proof We claim that this system is unsolvable: \(Ax = b\), \(x \ge 0\), \(c^Tx < c^Tx^*\) (since \(x^*\) attains the minimal value). We homogenize this system and get another unsolvable system: \[\begin{aligned} \begin{bmatrix}A&-b\\c^T&-c^T x^* \end{bmatrix}\begin{bmatrix}x\\t\end{bmatrix}=\begin{bmatrix}0\\-1\end{bmatrix};\begin{bmatrix}x\\t\end{bmatrix}\ge 0 \end{aligned}\] By Farkas’ Lemma, the following system is solvable: \[\begin{aligned} \begin{bmatrix}A^T&c\\-b^T&-c^T x^* \end{bmatrix}\begin{bmatrix}y_1\\y_2\end{bmatrix}\le 0;\ -y_2>0 \end{aligned}\] Denote \(y^* = -\frac{y_1}{y_2}\) (note that \(y_2<0\)). Dividing the two blocks of inequalities by \(-y_2>0\) gives \[\begin{aligned} &A^T y^* - c \le 0 \ \ \ (1)\\ &c^Tx^* - b^T y^* \le 0 \ \ \ (2) \end{aligned}\] From (1), we have \(A^T y^* \le c\): this is dual feasibility, so \(y^*\) is feasible for (D). From (2), we have \(c^Tx^* \le b^T y^*\); combined with weak duality (\(b^Ty^* \le c^Tx^*\)), we get \(c^Tx^* = b^T y^*\), so \(y^*\) is optimal for (D). \(\square\)

From the above, we can see that the LP problem has a nice property: if the primal problem has an optimal solution, the dual problem automatically has an optimal solution, and the duality gap is 0 (\(c^Tx^* = b^Ty^*\)).

Complementarity

Since the primal and dual problems have a zero duality gap, we have \(0 = c^Tx-b^Ty=x^Tc-(Ax)^Ty=x^Tc-x^TA^Ty=x^T(c-A^Ty)\). Since \(x\ge0\) and \(c-A^Ty\ge0\), if \(x_ i\ne0\) then \((c-A^Ty)_ i\) must be 0, and vice versa. As a result, the optimization problem can be converted into solving a constrained linear system: \[\begin{aligned} &Ax=b,x\ge0\ \ \ &\text{(primal feasibility)}\\ &A^T y \le c\ \ \ &\text{(dual feasibility)}\\ &x_i(c-A^Ty)_ i=0,i=1,...,n\ \ \ &\text{(complementarity)} \end{aligned}\] We will see later that almost all optimization problems can be solved through primal feasibility, dual feasibility, and complementarity, even when they are not linear.
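A sketch of these three conditions on the toy LP from before; it assumes SciPy's HiGHS backend, which reports the equality-constraint dual values via `res.eqlin.marginals`:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

res = linprog(c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")
x = res.x                      # primal solution (primal feasibility)
y = res.eqlin.marginals        # dual solution for Ax = b
slack = c - A.T @ y            # dual slack, nonnegative (dual feasibility)
print(x * slack)               # elementwise products ~ 0 (complementarity)
```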

(II) Conic Linear Programming

From the previous chapter, we see that the constraint of the LP problem is \(x\ge0\). However, a lot of practical problems are not constrained to this region. A natural question is whether we can constrain the variable to a more flexible region. In this chapter, we first extend \(x\ge 0\) to a more general representation, and with this representation we can describe more constrained optimization problems.

Extending \(x\ge 0\) to a Pointed Cone

We can see that by saying \(u \ge v\), we actually define a “partial order”. That is, the relation “\(\ge\)” satisfies:

  1. (Reflexivity) \(u \ge u\) for all \(u \in \mathbb{R}^n\);
  2. (Anti–Symmetry) \(u \ge v\) and \(v \ge u\) imply \(u=v\) for all \(u,v\in \mathbb{R}^n\);
  3. (Transitivity) \(u \ge v\) and \(v \ge w\) imply \(u\ge w\) for all \(u,v,w\in \mathbb{R}^n\);
  4. (Homogeneity) for any \(u,v\in \mathbb{R}^n\) and \(\alpha \ge 0\), if \(u \ge v\),then \(\alpha u\ge \alpha v\);
  5. (Additivity) for any \(u,v,w,z\in \mathbb{R}^n\), if \(u\ge v\) and \(w\ge z\), then \(u+w \ge v+z\).

Now, we are going to give a more general representation of these five conditions. We first introduce the pointed cone. We say \(K\) is a pointed cone if it has the following properties:

  1. \(K\) is non–empty and closed under addition; i.e., \(u + v \in K\) whenever \(u, v \in K\).
  2. \(K\) is a cone; i.e., for any \(u\in K\) and \(\alpha > 0\), we have \(\alpha u\in K\).
  3. \(K\) is pointed; i.e., if \(u\in K\) and \(-u\in K\), then \(u = 0\).

Then, we show how a pointed cone can represent the five order properties. Firstly, we define \[u \ge_K v \Leftrightarrow u-v\in K\] Using the cone properties 1–3, we can prove that “\(\ge_K\)” fulfills the order properties 1–5 one by one. Therefore, the order “\(\ge\)” can be represented by a pointed cone. If the pointed cone is \(K=\{x:x\ge 0\}\), it is exactly the constraint \(x\ge 0\). There are other pointed cones, such as the Lorentz cone (\(Q^{n+1} = \{(t, x) \in \mathbb{R} \times \mathbb{R}^n : t \ge \|x\|_ 2\}\)) and the positive semidefinite cone (\(S_+^n = \{X \in S^n : u^T Xu \ge 0, \forall u \in \mathbb{R}^n \}\)), where \(S^n\) is the space of all symmetric \(n \times n\) matrices. We can see that our variable can be not only a vector, but also a matrix.
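A minimal sketch of membership tests for these cones with NumPy (the tolerance is an arbitrary choice):

```python
import numpy as np

def in_lorentz(t, x, tol=1e-9):
    """(t, x) in Q^{n+1}  <=>  t >= ||x||_2."""
    return t >= np.linalg.norm(x) - tol

def in_psd(X, tol=1e-9):
    """X in S_+^n  <=>  X is symmetric with nonnegative eigenvalues."""
    return np.allclose(X, X.T) and np.linalg.eigvalsh(X).min() >= -tol

print(in_lorentz(5.0, np.array([3.0, 4.0])))         # True: 5 >= ||(3, 4)|| = 5
print(in_psd(np.array([[2.0, -1.0], [-1.0, 2.0]])))  # True: eigenvalues 1 and 3
```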

Conic Linear Programming

By conic linear programming (CLP), we mean optimizing a linear objective function subject to conic constraints. The primal problem is defined as:

\[\begin{aligned} (P)\ \ \ \ \ \ v_p^* = \inf_\mathbf{x}\ \ \ &\mathbf{c} \bullet \mathbf{x}\\ \text{s.t. }\ \ &a_i\bullet \mathbf{x}=b_i, i\in[1,m]\\ & \mathbf{x} \ge_K \mathbf{0} \end{aligned}\]

Since the variable may be a matrix (e.g., \(x\) in a PSD cone), we use a more general inner product \(\bullet\) to express the inner product of vectors or matrices. Here we use \(\inf\) instead of \(\min\) because, unlike LP, a CLP problem may not attain its optimal value. The last constraint \(\mathbf{x} \ge_K \mathbf{0}\) can also be written as \(\mathbf{x}\in K\).
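For matrix variables, \(\bullet\) is the trace inner product, \(X \bullet Y = \mathrm{tr}(X^TY) = \sum_{i,j}X_{ij}Y_{ij}\); a short NumPy check on arbitrary matrices:

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 3.0]])
Y = np.array([[0.0, 1.0], [1.0, 4.0]])

# The two standard forms of the trace inner product agree.
print(np.trace(X.T @ Y))   # 16.0
print(np.sum(X * Y))       # 16.0
```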

Weak Duality

Similar to the LP problem, we can derive weak duality for the CLP problem: \[b^Ty=\sum_{i=1}^m(a_i\bullet x)y_i=\left(\sum_{i=1}^my_ia_i\right)\bullet x \le c\bullet x\] We find that if we require \(x\bullet\left(c-\sum_{i=1}^my_ia_i\right)\ge 0\), then \(b^Ty\) is a lower bound of \(c\bullet x\). Since we already have \(x\in K\), we define the dual cone of \(K\) as \(K^* =\{w\in E:x\bullet w \ge0, \forall x\in K\}\). As a result, for any \(y\) with \(c-\sum_{i=1}^my_ia_i \in K^*\), \(b^Ty\) is always a lower bound of \(c\bullet x\). Therefore, the dual form of CLP can be expressed as:

\[\begin{aligned} (D)\ \ \ \ \ \ v_d^* = \sup_\mathbf{y}\ \ \ &\mathbf{b}^T \mathbf{y}\\ \text{s.t. }\ \ &c-\sum_{i=1}^my_ia_i \in K^* \end{aligned}\] Sometimes it is not easy to find the dual cone of a pointed cone. Luckily, many commonly used pointed cones are self-dual, i.e., \(K=K^*\). For example, \((\mathbb{R}^n_+)^* = \mathbb{R}^n_+\), \((Q^{n+1})^* =Q^{n+1}\), and \((S_+^n)^* = S_+^n\).
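A randomized sanity check (not a proof) of one inclusion for the Lorentz cone: for any two points of \(Q^{n+1}\), the inner product is \(ts + x^Ty \ge \|x\|_2\|y\|_2 + x^Ty \ge 0\) by Cauchy–Schwarz:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample boundary points (||x||, x) of Q^{n+1} and check pairwise inner
# products are nonnegative, consistent with Q^{n+1} lying inside (Q^{n+1})*.
for _ in range(10_000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    u = np.concatenate([[np.linalg.norm(x)], x])
    v = np.concatenate([[np.linalg.norm(y)], y])
    assert u @ v >= -1e-12
print("no violations found")
```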

Strong Duality

As with LP, after deriving weak duality we are interested in strong duality, i.e., when \(b^Ty=c\bullet x\). But since we extended the constraints of the LP problem to a more general form, we must pay a price: the Farkas’ Lemma we used to prove LP strong duality fails here. We give a weaker Farkas-type lemma, with a stronger requirement, without proof.

(Conic Farkas’ Lemma) Let \(E\) be a finite–dimensional Euclidean space equipped with an inner product \(\bullet\), \(K \subset E\) be a closed pointed cone with non–empty interior, and \(a_1,\cdots , a_m \in E\) and \(b \in \mathbb{R}^m\) be given. Suppose that the Slater condition holds; i.e., there exists a \(\bar{y} \in \mathbb{R}^m\) such that \(-\sum_{i=1}^m \bar{y}_ia_i \in int(K^* )\). Then, exactly one of the systems (I) and (II) has a solution: \[\begin{aligned} &(I)\ a_i\bullet x=b_i, i\in[1,m],\ x\in K\\ &(II)\ -\sum_{i=1}^m y_ia_i\in K^* ,\ b^Ty>0 \end{aligned}\]

Observe that the Slater condition is a sufficient condition for the Conic Farkas’ Lemma to apply; compared with the LP Farkas’ Lemma, the conic version is a little weaker. Using the Conic Farkas’ Lemma, we can prove the strong duality of CLP, which we state without proof.

(CLP Strong Duality)

(a) Suppose that (P) is bounded below and strictly feasible; i.e., there exists a feasible solution \(\bar{x}\) to (P) such that \(\bar{x}\in int(K)\). Then, we have \(v_p^* = v_d^*\). Moreover, there exists a feasible solution \((\bar{y},\bar{s})\) to (D) such that \(b^T\bar{y}=v_p^* = v_d^*\); i.e., the common optimal value is attained by some dual feasible solution.

(b) Suppose that (D) is bounded above and strictly feasible; i.e., there exists a feasible solution \((\bar{y},\bar{s})\) to (D) such that \(\bar{s}\in int(K^* )\). Then, we have \(v_p^* = v_d^*\). Moreover, there exists a feasible solution \(\bar{x}\) to (P) such that \(c\bullet\bar{x}=v_p^* = v_d^*\); i.e., the common optimal value is attained by some primal feasible solution.

(c) Suppose that either (P) or (D) is bounded and strictly feasible. Then, given a feasible solution \(\bar{x}\) to (P) and a feasible solution \((\bar{y},\bar{s})\) to (D), the following are equivalent:

  • \(\bar{x}\) and \((\bar{y},\bar{s})\) are optimal for (P) and (D), respectively.
  • The duality gap is zero; i.e., \(c\bullet \bar{x}=b^T\bar{y}\).
  • We have complementary slackness; i.e., \(\bar{x}\bullet\bar{s}= 0\).

From the above, we observe that if (P) or (D) has the nice property of being strictly feasible (having an interior feasible solution), then its counterpart (not itself!), i.e. (D) or (P), attains the optimal value. If both have interior feasible solutions, both attain the optimal value.
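As an illustration of part (a), here is a small SDP sketch assuming the `cvxpy` package (any conic modeling tool would do): minimize \(C\bullet X\) subject to \(\mathrm{tr}(X)=1\) over the PSD cone. Since \(X = I/2\) is strictly feasible (it lies in \(int(S_+^2)\)), strong duality guarantees a zero duality gap:

```python
import numpy as np
import cvxpy as cp

C = np.array([[2.0, 1.0], [1.0, 2.0]])    # made-up cost matrix

X = cp.Variable((2, 2), symmetric=True)
constraints = [X >> 0, cp.trace(X) == 1]  # X in S_+^2, tr(X) = 1
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()

# The optimal value is the smallest eigenvalue of C (here 1.0); the dual
# variable of the trace constraint matches it, up to solver sign convention.
print(prob.value, constraints[1].dual_value)
```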

(III) Lagrangian Duality

From the previous chapters, we see that for both the LP and CLP problems, we constructed a lower bound on the minimal value, and the lower bound is exactly the maximal value of another problem (the dual problem). Under some conditions, the lower bound equals the minimal value and thus the duality gap is 0. A natural question is: can we use the same trick to construct a lower bound for an arbitrary objective function? The answer is yes, as we now show. \[\begin{aligned} (P)\ \ \ \ \ \ v_p^* = \inf\ \ \ &f(x)\\ \text{s.t. }\ \ &g_i(x)\le0,i=1,...,m_1,\\ & h_j(x)=0,j=1,...,m_2,\\ &x\in X \end{aligned}\]

Weak Duality

We will construct the weak duality via a penalty function. Firstly, we denote \(G(x)=(g_1(x), g_2(x),\cdots,g_{m_1}(x))\), \(H(x)=(h_1(x), h_2(x),\cdots,h_{m_2}(x))\). We define the Lagrangian function as \[L(x,v,w)=f(x)+v^TG(x)+w^TH(x)\] where \(v\in\mathbb{R}^{m_1},w\in\mathbb{R}^{m_2}\). Using the Lagrangian function, we can rewrite the primal problem (P) as follows: \[\inf_{x\in X}\sup_{v\ge0,w}L(x,v,w)\] To see this, observe that the inner \(\sup\) is actually a penalty function, because \[\sup_{v\ge0,w}L(x,v,w)=\left\{\begin{aligned}f(x),\ \ \ &if\ G(x)\le 0\ and\ H(x)=0\\+\infty,\ \ \ \ &otherwise\end{aligned}\right.\] If \(x\) violates the constraints, we can find a proper \(v\) or \(w\) to make the value \(+\infty\); thus, such an \(x\) cannot attain the minimum. It can be checked that for any \(\bar{x},\bar{v},\bar{w}\), we have: \[\inf_{x\in X}L(x,\bar{v},\bar{w})\le L(\bar{x},\bar{v},\bar{w})\le \sup_{v\ge0, w}L(\bar{x},v,w)\] We denote \(\inf_{x\in X}L(x,\bar{v},\bar{w})\) by \(\theta(\bar{v},\bar{w})\). If \(\bar{x}\) is feasible for (P), then \(\sup_{v\ge0, w}L(\bar{x},v,w)=f(\bar{x})\), and we get: \[\theta(\bar{v},\bar{w})\le f(\bar{x})\] We construct the dual problem as \[\begin{aligned} (D)\ \ \ \ \ \ v_d^* = \sup_{v\ge0,w}\ \ \ &\theta(v,w) \end{aligned}\] Its value is a lower bound of \(f(\bar{x})\) whenever \(\bar{x}\) is feasible for (P); in particular, \(v_d^* \le v_p^*\).
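To make \(\theta\) concrete, consider the toy problem \(\min x^2\) s.t. \(1-x\le0\), whose optimal value is \(v_p^*=f(1)=1\). Here \(L(x,v)=x^2+v(1-x)\) is minimized over \(x\) at \(x=v/2\), so \(\theta(v)=v-v^2/4\); the sketch below checks weak duality on a grid and finds a zero duality gap at \(v=2\):

```python
import numpy as np

# Toy problem: min x^2  s.t.  1 - x <= 0, so v_p* = f(1) = 1.
def theta(v):
    # L(x, v) = x^2 + v(1 - x) is minimized at x = v/2, giving v - v^2/4.
    return v - v**2 / 4

vs = np.linspace(0, 4, 401)
print(theta(vs).max())                  # 1.0, attained at v = 2: zero duality gap
print(bool((theta(vs) <= 1.0).all()))   # True: weak duality, theta(v) <= v_p*
```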

Strong Duality

Now, we study the strong duality of Lagrangian duality. We will introduce the saddle point first, and then show that strong duality is attained at saddle points.

(Saddle Point) We say \((\bar{x},\bar{v},\bar{w})\) is a saddle point if

  1. \(\bar{x}\in X\);
  2. \(\bar{v}\ge0\);
  3. \(L(\bar{x},v,w)\le L(\bar{x},\bar{v},\bar{w})\le L(x,\bar{v},\bar{w})\) for all \(x\in X\), \(v\ge 0\), and \(w\).

The theorem of strong duality is as follows:

(Strong Duality) \((\bar{x},\bar{v},\bar{w})\) is a saddle point of (P) iff \(v_p^* =v_d^*\), \(\bar{x}\) is optimal for (P), and \((\bar{v},\bar{w})\) is optimal for (D).

Proof

\(\Rightarrow\):

By (1), \(\bar{x}\in X\); moreover, (3) forces \(\sup_{v\ge0,w}L(\bar{x},v,w)\) to be finite, so by the penalty characterization \(\bar{x}\) is feasible for (P). By (2), \((\bar{v},\bar{w})\) is feasible for (D). By (3), \[\begin{aligned}\theta(\bar{v},\bar{w})&=\inf_{x\in X}L(x,\bar{v},\bar{w})\\ &=L(\bar{x},\bar{v},\bar{w})\\ &=\sup_{v\ge0,w}L(\bar{x},v,w)\\ &=f(\bar{x}) \end{aligned}\] Combined with weak duality (\(\theta(\bar{v},\bar{w})\le v_d^* \le v_p^* \le f(\bar{x})\)), all these values are equal, so \(\bar{x}\) is optimal for (P) and \((\bar{v},\bar{w})\) is optimal for (D).

\(\Leftarrow\):

For the converse, (1) and (2) hold obviously, since \(\bar{x}\) is feasible for (P) and \((\bar{v},\bar{w})\) is feasible for (D); it suffices to prove (3). Consider \(\theta(\bar{v},\bar{w})=\inf_{x\in X}L(x,\bar{v},\bar{w})\le L(\bar{x},\bar{v},\bar{w})\le\sup_{v\ge0,w}L(\bar{x},v,w)=f(\bar{x})\). Since we already have \(\theta(\bar{v},\bar{w})=v_d^* =v_p^* = f(\bar{x})\), all the inequalities above hold with equality, which is exactly (3). \(\square\)

As in the previous chapters, we can also write down primal feasibility, dual feasibility, and complementarity.

(Saddle Point Optimality Conditions) \((\bar{x},\bar{v},\bar{w})\) is a saddle point \(\Leftrightarrow\) the following three conditions hold (a small numerical check follows the list):

  1. (Primal Feasibility) \(\bar{x}\in X,G(\bar{x}) \le 0,\ and\ H(\bar{x}) = 0.\)
  2. (Dual Feasibility/Lagrangian Optimality) \(\bar{v}\ge 0\ and\ \bar{x} = \arg \min_{x\in X} L(x, \bar{v}, \bar{w})\).
  3. (Complementarity) \(\bar{v}^T G(\bar{x}) = 0\).
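A quick numerical check of these three conditions on the toy problem from the weak-duality section (\(f(x)=x^2\), \(g(x)=1-x\), no equality constraints), whose saddle point is \((\bar{x},\bar{v})=(1,2)\):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x_bar, v_bar = 1.0, 2.0
g = lambda x: 1.0 - x
L = lambda x: x**2 + v_bar * g(x)           # Lagrangian at v = v_bar

print(g(x_bar) <= 0)                        # primal feasibility
res = minimize_scalar(L)                    # Lagrangian optimality
print(np.isclose(res.x, x_bar))             # argmin_x L(x, v_bar) = x_bar
print(np.isclose(v_bar * g(x_bar), 0.0))    # complementarity
```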

Karush–Kuhn–Tucker (KKT) Necessary Conditions

The KKT conditions are necessary conditions for optimality; i.e., any optimal solution (satisfying a regularity condition) must satisfy the KKT conditions. They are given as follows. (The Karush–Kuhn–Tucker Necessary Conditions) Let \(\bar{x}\in S\) be a local minimum of problem (P), and let \(I = \{i \in \{1,...,m_1\} : g_i(\bar{x}) = 0\}\) be the index set of the active constraints. Suppose that \(\bar{x}\) is regular; i.e., the family \(\{\nabla g_i(\bar{x})\}_{i\in I} \bigcup \{\nabla h_j(\bar{x})\}^{m_2}_{j=1}\) of vectors is linearly independent. Then, there exist \(v_1,...,v_{m_1} \in \mathbb{R}\) and \(w_1,...,w_{m_2} \in \mathbb{R}\) such that (a numerical check of these conditions follows the list):

  1. \(\nabla f(\bar{x})+\sum_i v_i\nabla g_i(\bar{x})+\sum_jw_j\nabla h_j(\bar{x})=0\);
  2. \(v\ge0\);
  3. \(v_ig_i(\bar{x})=0,\forall i\) (complementarity).
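A sketch verifying these conditions at a hand-computed optimum of a made-up instance, \(\min\ (x_1-2)^2+(x_2-1)^2\) s.t. \(x_1+x_2-1\le0\); solving the KKT system by hand gives \(\bar{x}=(1,0)\) with multiplier \(v=2\):

```python
import numpy as np

x_bar = np.array([1.0, 0.0])                  # candidate local minimum
v = 2.0                                       # candidate multiplier

grad_f = 2 * (x_bar - np.array([2.0, 1.0]))   # gradient of the objective
grad_g = np.array([1.0, 1.0])                 # gradient of the constraint

print(grad_f + v * grad_g)                    # stationarity: [0. 0.]
print(v >= 0)                                 # dual feasibility
print(v * (x_bar.sum() - 1.0) == 0.0)         # complementarity (g is active)
```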

It should be noted that this requires \(\bar{x}\) to fulfill the regularity condition. There are also many other regularity conditions (constraint qualifications); if any one of them holds, the KKT conditions hold at a local minimum.

(Regularity Condition 2) Consider problem (P), where \(g_1, . . . , g_{m_1}\) are convex and \(h_1, . . . , h_{m_2}\) are affine. Let \(\bar{x}\in S\) be a local minimum and \(I = \{i \in \{1,...,m_1\} : g_i(\bar{x}) = 0\}\). Suppose that the Slater condition is satisfied; i.e., there exists an \(x' \in S\) such that \(g_i(x') < 0\) for \(i \in I\). Then, \(\bar{x}\) satisfies the KKT conditions.

(Regularity Condition 3) Consider problem (P), where \(g_1, . . . , g_{m_1}\) are concave and \(h_1, . . . , h_{m_2}\) are affine. Let \(\bar{x}\in S\) be a local minimum. Then, \(\bar{x}\) satisfies the KKT conditions.

(Regularity Condition 3) is important: since affine constraints are both convex and concave, it implies that if the constraints are all affine, every local minimum satisfies the KKT conditions.

Reference

[1] Anthony Man-Cho So. ENGG5501 course notes, CUHK.