Projection Methods for Linear Equality. Constrained Problems. Robert M. Freund. March, 2004. 1. 2004 Massachusetts Institute of Technology...

0 downloads 38 Views 123KB Size

Projection Methods for Linear Equality Constrained Problems Robert M. Freund March, 2004

2004 Massachusetts Institute of Technology.

1

1

Review of Steepest Descent

Suppose we want to solve P : minimize f (x) x ∈ n ,

s.t.

where f (x) is diﬀerentiable. At the point x = x ¯, f (x) can be approximated by its linear expansion f (¯ x + d) ≈ f (¯ x) + ∇f (¯ x)T d

for d “small.” This leads to the choice of d dictated by the direction-ﬁnding problem: minimize ∇f (¯ x)T d d ≤ 1,

s.t.

which is equivalent to: minimize ∇f (¯ x)T d dT Id ≤ 1.

s.t.

The solution to this direction ﬁnding problem is:

2

−∇f (¯ x) . d¯ = ∇f (¯ x) Because we choose our next step as ¯ + αd¯ x = x

for some choice of step-length α, then we can re-scale the direction d¯ simply as: d¯ = −∇f (¯ x).

That is, the steepest descent direction is simply the negative of the gradient of f (x) at x = x ¯.

2

Equality Constrained Problems

Now consider the slightly more complicated problem P : minimize f (x) s.t.

Ax = b x ∈ n ,

where f (x) is diﬀerentiable. The KKT conditions for this problem are as follows: A¯ x = b T x) +A π ∇f (¯ ¯ = 0. 3

We wish to ﬁnd such a KKT point. Suppose that we are at the point x = x, ¯ where A¯ x = b, i.e., x ¯ is a feasible point. Again we have f (¯ x + d) ≈ f (¯ x) + ∇f (¯ x)T d

for d “small.” In order to choose the direction d¯ and compute the next point

¯ + αd¯ x = x

for some stepsize α, we will solve the following direction-ﬁnding problem: ¯ x) + ∇f (¯ minimize f (¯ x)T (x − x) s.t.

Ax = b x − x ¯ ≤ 1,

or equivalently (by setting d = x − x ¯)

x)T d minimize ∇f (¯

s.t.

Ad = 0 dT Id ≤ 1.

4

Note that Ad = 0 ensures that A(¯ x + αd) = A¯ x = b for any α. Also note that the constraint “dT Id ≤ 1” says that d must lie in the Euclidean unit ball B, deﬁned as: B = {d ∈ n | dT Id ≤ 1}.

However, the Euclidean ball is but one metric, and we might instead be more general, and choose to restrict d to lie in an ellipsoid EQ = {d ∈ n | dT Qd ≤ 1},

where Q is a given symmetric and positive-deﬁnite matrix. This leads to the more general direction-ﬁnding problem: minimize ∇f (¯ x)T d s.t.

Ad = 0 dT Qd ≤ 1.

The projected steepest descent algorithm is: Step 1. x ¯ satisﬁes A¯ x = b. Compute ∇f (¯ x).

Step 2. Solve the direction-ﬁnding problem (DFP): DFP:

d¯ = arg minimum ∇f (¯ x)T d s.t.

Ad = 0 dT Qd ≤ 1, 5

If ∇f (¯ x)T d¯ = 0, stop. In this case, x ¯ is a Karush-Kuhn-Tucker point.

x + αd¯) for the stepsize α, ¯ perhaps chosen by an exact Step 3. Solve minα f (¯ or inexact linesearch.

Step 4. Set x ¯←x ¯ + αd. ¯ ¯ Go to Step 1.

Note that if Q = I and the equality constraints Ax = b are absent, this algorithm is just the steepest descent algorithm.

3

Properties of the Projected Steepest Descent Direction d¯

Note that DFP is a convex program, and d = 0 is a Slater point. Therefore, the Karush-Kuhn-Tucker conditions are necessary and suﬃcient for optimality in DFP. These conditions are: Ad¯ = 0

(1)

d¯T Qd¯ ≤ 1

(2)

¯ ¯= 0 ¯ + 2βQd ∇f (¯ x) + AT π

(3)

β¯ ≥ 0

(4)

6

¯ ¯ = 0. β¯(1 − dQd)

(5)

As it turns out, it is extremely easy to solve these equations. (We will see this shortly.) Let d¯ solve the equations (1)-(5) with multipliers β¯ and π ¯. Proposition 3.1 x ¯ is a Karush-Kuhn-Tucker point of P if and only if T ¯ ∇f (¯ x) d = 0. Proposition 3.2 x ¯ is a Karush-Kuhn-Tucker point of P if and only if β¯ = 0. Proposition 3.3 If x ¯ is not a Karush-Kuhn-Tucker point of P , then d¯ is a descent direction. Proposition 3.4 The projected steepest descent algorithm has the same convergence properties and the same linear convergence as the steepest de scent algorithm. Under the same conditions as in the steepest descent algo rithm, the iterates converge to a Karush-Kuhn-Tucker point of P , and the convergence rate is linear, with a convergence constant that is bounded in terms of eigenvalues identically as in the steepest descent algorithm.

4

Solving the Direction-Finding Problem (DFP)

Approach 1 to solving DFP: solving linear equations

Create the system of linear equations: ˜ = −∇f (¯ x) Qd˜ + AT π

(6)

Ad˜ = 0

(7)

7

˜ π) and solve this linear system for (d, ˜ by any method at your disposal.

If Qd˜ = 0, then

˜=0 ∇f (¯ x) + AT π

and so x ¯ is a Karush-Kuhn-Tucker point of P .

If Qd˜ = 0, then rescale the solution as follows: d˜ , d¯ = � d˜T Qd˜ π ¯ = π, ˜ 1 . β¯ = � 2 d˜T Qd˜

¯ π, ¯ deﬁned above satisfy (1), (2), (3), (4), and Proposition 4.1 (d, ¯ β) (5). 8

Note that the rescaling step is not necessary in practice, since we use a line-search.

Approach 2 to solving DFP: Formulas

Let P = [Q−1 − Q−1 AT (AQ−1 AT )−1 AQ−1 ] �

β¯ =

(∇f (¯ x))T P (∇f (¯ x)) 2

x))

π ¯ = −(AQ−1 AT )−1 AQ−1 (∇f (¯ If β¯ > 0, let

−P (∇f (¯

x)) . d¯ = � x))T P (∇f (¯

(∇f (¯ x)) If β¯ = 0, let d¯ = 0.

Proposition 4.2 P is symmetric and positive semi-deﬁnite. Hence β¯ ≥ 0. ¯ π, ¯ deﬁned above satisfy (1), (2), (3), (4), and ¯ β) Proposition 4.3 (d, (5).

9

5 Modiﬁcation of Newton’s Method with Linear Equality Constraints Here we consider the following problem: (P:)

minimizex

s.t.

f (x)

Ax = b.

Just as in the regular version of Newton’s method, we approximate the objective with the quadratic expansion of f (x) at x = x ¯:

˜ :) (P

x) + ∇f (¯ x)T (x − x) ¯ + 12 (x − x) ¯ t H(¯ x)(x − x) ¯ minimizex h(x) := f (¯ s.t.

Ax = b.

Now we solve this problem by applying the KKT conditions, and so we solve the following system for (x, u): Ax

= b

∇h(x) +AT u = 0 .

Now let us substitute the fact that:

x) + H(¯ ∇h(x) = ∇f (¯ x)(x − x) ¯ and A¯

x = b.

10

Substituting this and replacing d = x − x ¯, we have the system: Ad

= 0

x) . H (¯ x)d +AT u = −∇f (¯

The solution (d, u) to this system yields the Newton direction d at x ¯. Notice that there is actually a closed form solution to this system, if we want to pursue this route. It is:

�

d = −H(¯ x)−1 ∇f (¯ x)−1 AT AH (¯ x) + H (¯ x)−1 AT �

x)−1 AT u = − AH (¯

�−1

�−1

AH(¯ x)−1 ∇f (¯ x)

AH(¯ x) . x)−1 ∇f (¯

This leads to the following version of Newton’s method for linearly constrained problems: Newton’s Method for Linearly Constrained Problems: Step 0 Given x0 for which Ax0 = b, set k ← 0 ¯ u): ¯ ¯ ← xk . Solve for (d, Step 1 x Ad¯

= 0

¯ = −∇f (¯ x) . H (¯ x)d¯ +AT u If H(¯ x)d¯ = 0, then stop. Step 2 Choose step-size αk = 1. ¯ k ← k + 1. Go to Step 1. Step 3 Set xk+1 ← x ¯ + αk d, 11

(8)

Note the following: • If H(¯ x)d¯ = 0, then x ¯ is a KKT point. To see this, note from Step 1 T ¯ = 0, which are precisely the KKT conditions for that ∇f (¯ x) + A u this problem. • Equations (8) are called the “normal equations”. They were derived presuming that H(¯ x) is positive-deﬁnite, but can be used even when H (¯ x) is not positive-deﬁnite. • If H (¯ x) is positive deﬁnite, then d¯ is a descent direction. To see this, x)T d¯ = −d¯T H (¯ note that ∇f (¯ x)d¯ < 0 from (8) since H(¯ x) is positive deﬁnite.

6

The Variable Metric Method

In the projected steepest descent algorithm, the direction d must lie in the ellipsoid EQ = {d ∈ n | dT Qd ≤ 1},

where Q is ﬁxed for all iterations. In a variable metric method, Q can vary at every iteration. The variable metric algorithm is:

Step 1. x ¯ satisﬁes A¯ x = b. Compute ∇f (¯ x).

Step 2. Choose a postive-deﬁnite symmetric matrix Q. (Perhaps Q = Q(¯ x), i.e., the choice of Q may depend on the current point.) Solve the directionﬁnding problem (DFP):

12

DFP:

d¯ = arg minimum ∇f (¯ x)T d s.t.

Ad = 0 dT Qd ≤ 1,

¯ is a KKT point. x)T d¯ = 0, stop. In this case, x If ∇f (¯

Step 3. Solve minα f (¯ x + αd¯) for the stepsize α, ¯ perhaps chosen by an exact or inexact linesearch.

Step 4. Set x ¯←x ¯ + αd. ¯ ¯ Go to Step 1.

All properties of the projected steepest descent algorithm still hold here.

Some strategies for choosing Q at each iteration are: • Q=I • Q is a given matrix held contstant over all iterations • Q = H(¯ x) where H (x) is the Hessian of f (x). It is easy to show that that in this case, the variable metric algorithm is equivalent to Newton’s method with a line-search, see the proposition below. • Q = H(¯ x) + δI, where δ is chosent to be large for early iterations, but δ is chosen to be small for later iterations. One can think of this strategy as approximating the projected steepest descent algorithm in early iterations, followed by approximating Newton’s method in later iterations.

13

Proposition 6.1 Suppose that Q = H(¯ x) in the variable metric algorithm. Then the direction d¯ in the variable metric method is a positive scalar times the Newton direction. Proof: If Q = H(¯ x), then the vector d¯ of the variable metric method is the optimal solution of DFP: DFP:

d¯ = arg minimum s.t.

∇f (¯ x)T d Ad = 0 ≤ 1.

dT H(¯ x)d

The Newton direction d˜ for P at the point x ¯ is the solution of the following problem: NDP:

dˆ = arg minimum ∇f (¯ x)T d + 12 dT H (¯ x)d s.t.

Ad = 0.

If you write down the Karush-Kuhn-Tucker conditions for each of these two problems, you then can easily verify that d¯ = γdˆ for some scalar γ > 0. q.e.d.

14