IE 534 Linear Programming Lecture Notes Fall 2011lzwang/LectureNotesOnline.pdfIE 534 Linear...

IE 534 Linear Programming Lecture Notes Fall 2011∗

Lizhi WangIowa State University

1 Introduction

Problem-

Model-

Algorithm-

Solver��

An example of a linear program:

max ζ = 2x1 + 3x2 (1)subject to x1 + 4x2 ≤ 8 (2)

x1 − x2 ≤ 1 (3)x1, x2 ≥ 0. (4)

Here x1 and x2 are variables whose values need to be determined to maximize the linear functionζ = 2x1 + 3x2 subject to the linear constraints (2)-(4).

We can get the optimal solution to LP (1)-(4)graphically as illustrated in Figure 1. There arethree steps:

1. Find the region that satisfies all constraints(shaded region in Figure 1),

2. Find the direction that the objective functionis maximizing towards, and

3. Eyeball the optimal solution (for the examplein Figure 1, x∗1 = 2.4, x∗2 = 1.4, and ζ∗ = 9).

These three steps can only be used to manually findthe optimal solution to linear programs with twovariables. Figure 1: Graphical solution to LP (1)-(4)

∗Copyright ©Lizhi Wang, 2011. All rights reserved. If you have to print this document, please consider double-sided printing to save some trees.

1

In general, a linear program is given as follows:

maxx

ζ = c1x1 + c2x2 + · · ·+ cnxn (5)

s. t. a1,1x1 + a1,2x2 + · · ·+ a1,nxn ≤ b1 (6)...

...ai,1x1 + ai,2x2 + · · ·+ ai,nxn = bi (7)

......

aj,1x1 + aj,2x2 + · · ·+ aj,nxn ≥ bj (8)...

...am,1x1 + am,2x2 + · · ·+ am,nxn ≥ bm (9)

l1 ≤ x1 ≤ u1 (10)l2 ≤ x2 ≤ u2 (11)

...ln ≤ xn ≤ un. (12)

� The variables x1, x2, ..., xn are called decision variables, and a, b, c, l, and u are given param-eters.

� A solution x = [x1, x2, ..., xn]> is a feasible solution if it satisfies all constraints (6)-(12).Similarly, a solution is an infeasible solution if it violates any constraint.

� The set of all feasible solutions is called the feasible region or feasible set.

� If li = −∞ and ui = +∞, then xi is a free or unrestricted variable.

� The linear function (5) is the objective function.

� ζ represents the value of the objective function, called objective value. For a given solutionx, we use ζ(x) to denote the objective value of x: ζ(x) = c1x1 + c2x2 + · · ·+ cnxn.

� The symbol ‘max’ indicates that we want to maximize the objective value ζ. We may alsouse ‘min’ to indicate the opposite.

� The symbol ‘s.t.’ is short for ‘subject to’ or ‘such that’, which starts the list of constraintsthat a feasible solution must satisfy.

� A solution x∗ is optimal if it is feasible and ζ(x∗) ≥ ζ(x′) for any other feasible solution x′.

� The solution of an LP has three possibilities

1. There exists an optimal solution, either uniquely or among infinitely many others.

2. LP is infeasible, which means that a feasible solution does not exist.

3. LP is unbounded, which means that for any real number K, there always exists a feasiblesolution x such that ζ(x) > K. In that case, the optimal objective value is said to be +∞.

2

In many situations, it is convenient to study a linear program in the following standard form:

maxx

ζ = c1x1 + c2x2 + · · ·+ cnxn (13)

s. t. a1,1x1 + a1,2x2 + · · ·+ a1,nxn ≤ b1 (14)...

...am,1x1 + am,2x2 + · · ·+ am,nxn ≤ bm (15)

x1, x2, · · · , xn ≥ 0. (16)

The definition of a standard form is not unique, but may depend on context or personal pref-erence.

The following tricks can be used to transform any linear program to the standard form:

� For a minimization objective

min ζ = c1x1 + c2x2 + · · ·+ cnxn,

we can replace it with

max −ζ = −c1x1 − c2x2 − · · · − cnxn.

� For a greater-than-or-equal-to constraint

aj,1x1 + aj,2x2 + · · ·+ aj,nxn ≥ bj ,

we can replace it with−aj,1x1 − aj,2x2 − · · · − aj,nxn ≤ −bj .

� For an equality constraint

ai,1x1 + ai,2x2 + · · ·+ ai,nxn = bi,

we can replace it with two less-than-or-equal-to constraints

ai,1x1 + ai,2x2 + · · ·+ ai,nxn ≤ bi

−ai,1x1 − ai,2x2 − · · · − ai,nxn ≤ −bi.

� For a variable with both lower and upper bounds

lk ≤ xk ≤ uk, (17)

there are four cases:

1. If lk = −∞ and uk is finite, we can rewrite (17) as xk − uk ≤ 0, and then definex′k = uk − xk. Now, if we substitute xk with uk − x′k, (17) can be replaced by

x′k ≥ 0.

2. If lk is finite and uk = +∞, we can rewrite (17) as 0 ≤ xk − lk, and then definex′k = xk − lk. Now, if we substitute xk with x′k + lk, (17) can be replaced by

x′k ≥ 0.

3

3. If both lk and uk are finite, we can rewrite (17) as 0 ≤ xk− lk ≤ uk− lk, and then definex′k = xk − lk. Now, if we substitute xk with x′k + lk, (17) can be replaced by

x′k ≤ uk − lk

x′k ≥ 0.

4. If lk = −∞ and uk = +∞, we can introduce two variables x+k and x−k . Now, if we

substitute xk with x+k − x

−k , (17) can be replaced by

x+k ≥ 0

x−k ≥ 0.

We often write the standard form LP (13)-(16) in matrix notation:

max ζ = c>x

s. t. Ax ≤ bx ≥ 0,

where

cn×1 =

c1c2...cn

, xn×1 =

x1

x2...xn

, Am×n =

a1,1 a1,2 · · · a1,n

a2,1 a2,2 · · · a2,n...

.... . .

...am,1 am,2 · · · am,n

, bm×1 =

b1b2...bm

.We will use n and m to denote the number of variables and the number of constraints, respectively.

The idea of making decisions to maximize an objective function subject to certain constraintsapplies to more general forms of mathematical programs:

max{f(x) : G(x) ≤ 0m×1, x ∈ X},

where f(·) : Rn 7→ R, G(·) : Rn 7→ Rm, and X is the feasible region.

Linear Programming has a linear objective function f(x) and linear constraints G(x) ≤ 0. Forthe linear program (1)-(4),

f(x) = 2x1 + 3x2,

G(x) =[

1 41 −1

] [x1

x2

]−[

81

],

X = {(x1, x2) : x1 ≥ 0, x2 ≥ 0}.

Nonlinear Programming has a nonlinear objective function f(x) and/or nonlinear constraintsG(x) ≤ 0. For example,

max 2x21 + 3x2 (18)

s. t. x21 + 4x2 ≤ 8 (19)x1 − x2 ≤ 1 (20)x1, x2 ≥ 0. (21)

4

Here,f(x) = 2x2

1 + 3x2,

G(x) =[

1 00 0

] [x2

1

x22

]+[

0 41 −1

] [x1

x2

]−[

81

],

X = {(x1, x2) : x1 ≥ 0, x2 ≥ 0}.

This problem is illustrated in Figure 2.

Figure 2: Graphical solution to nonlinear program (18)-(21)

Integer Programming requires that some or all of the variables be integers. For example,

max 2x1 + 3x2 (22)s. t. x1 + 4x2 ≤ 8 (23)

x1 − x2 ≤ 1 (24)x1, x2 ∈ {0, 1, 2, ...}. (25)

Here,f(x) = 2x1 + 3x2,

G(x) =[

1 41 −1

] [x1

x2

]−[

81

],

X = {(x1, x2) : x1, x2 ∈ {0, 1, 2, ...}}.

This problem is illustrated in Figure 3.

5

Figure 3: Graphical solution to integer program (22)-(25)

2 Computer Solvers

There are numerous LP computer solvers, among which are GLPK (GNU Linear ProgrammingKit) and MATLAB. Below are examples of solving the LP (1)-(4) using GLPK and MATLAB.

2.1 GLPK for linear program

Step 0: Download a compiled version of GLPK for Windows from the course web. A good locationto extract the files to is C:\\User Files. Do NOT extract them to the U: drive.

Step 1: Create a text document named ‘example1.txt’ in the same folder with GLPK, type thefollowing codes, and save the file.var x1 >= 0;var x2 >= 0;maximizezeta: 2 * x1 + 3 * x2;subject toc1: x1 + 4 * x2 <= 8;c2: x1 - x2 <= 1;end;

Step 2: Open the Command Prompt in the folder and type the following command:glpsol --math example1.txt --output example1 output.txtand then press “Enter”. A new file named ‘example1 output.txt’ will be created. This file containsthe optimal solution x∗1 = 2.4, x∗2 = 1.4, and the optimal objective value ζ∗ = 9.

For more information about GLPK, read its manual and/or visit its website:http://www.gnu.org/software/glpk.

2.2 MATLAB for linear program

Step 1: Open MATLAB, create a new file, and type the following codes in the ‘Editor Window’:c = [2, 3]’;

6

A = [1, 4;1, -1];

b = [8, 1]’;Aeq = [];beq = [];lx = [0, 0]’;[x, zeta] = linprog(-c, A, b, Aeq, beq, lx)The default LP formulation that MATLAB assumes is min{c>x : Ax ≤ b, Aeqx = beq, lx ≤ x ≤ ux},so we need to give it the negative c. Save this file somewhere, and give it a name, e.g., ‘example1’.Do NOT use ‘linprog’ or ‘quadprog’ as the name, because they are reserved for MATLAB systemfunctions.

Step 2: Press ‘F5’ or hit the run button . The result will be given in the ‘Command Window’:Optimization terminated.x =2.40001.4000zeta =-9.0000

The optimal solution here is the same as given by GLPK, but the optimal objective value has anopposite sign since MATLAB solves a minimization problem.

For more information about MATLAB, read its help document and/or visit its website:http://www.mathworks.com.

2.3 MATLAB for quadratic program

MATLAB has a function quadprog that solves quadratic programs with linear constraints, forexample

min 0.5x21 + x2

2 − x1x2 − 2x1 − 6x2

s. t. x1 + x2 ≤ 2−x1 + 2x2 ≤ 22x1 + x2 ≤ 3x1, x2 ≥ 0.

This QP can be equivalently rewritten in the following matrix form:

min{c>x+

12x>Qx : Ax ≤ b, x ≥ 0

},

where

c =[−2−6

], Q =

[1 −1−1 2

], A =

1 1−1 22 1

, b =

223

.Here the matrix Q can be obtained as Qi,j = ∂2f(x)

∂xi∂xj, in which f(x) denotes the objective function.

The default QP formulation that MATLAB assumes is min{c>x + 0.5x>Qx : Ax ≤ b, Aeqx =beq, lx ≤ x ≤ ux}. So, we can use the following MATLAB codes to solve the problem:

7

A = [1, 1; -1, 2; 2, 1];b = [2, 2, 3]’;c = [-2, -6]’;Aeq = [];beq = [];Q = [1, -1; -1, 2];lx = zeros(2,1);[x, zeta] = quadprog(Q, c, A, b, Aeq, beq, lx)

The optimal solution is x∗1 = 0.67, x∗2 = 1.33, and the optimal objective value is ζ∗ = −8.22.

2.4 GLPK for mixed integer linear program

Integer programs that contain both continuous and integer variables are called mixed integer pro-grams. GLPK is able to solve mixed integer linear programs. For example, to solve

min x1 − 2x2 − 3x3 (26)s. t. x1 + 7x2 − 8x3 ≤ 12 (27)

−x1 + x2 + 3x3 ≥ 1 (28)5x2 + 9x3 = 13 (29)

x1 ∈ {0, 1}, x2 ∈ {0, 1, 2, 3}, x3 ≥ 0, (30)

we can use the following GLPK codes:var x1 binary;var x2 integer, >=0, <=3;var x3 >=0;minimizezeta: x1 - 2 * x2 - 3 * x3;subject toc1: x1 + 7 * x2 - 8 * x3 <= 12;c2: -x1 + x2 + 3 * x3 >= 1;c3: 5 * x2 + 9 * x3 = 13;end;

The optimal solution is x∗1 = 0, x∗2 = 2, x∗3 = 0.33, and the optimal objective value is ζ∗ = −5.

2.5 GLPKMEX for mixed integer linear program

GLPKMEX is a MATLAB interface to GLPK, which enables one to use GLPK through MATLABcodes. A copy of GLPKMEX can be downloaded from the course web. With MATLAB’s currentdirectory being where GLPKMEX is saved, the following MATLAB codes can be used to solve theinteger program (26)-(30):c = [1, -2, -3]’;A = [1, 7, -8;-1, 1, 3;0, 5, 9];b = [12, 1, 13]’;lx = [0, 0, 0]’;ux = [1, 3, inf]’;ctype = [’U’, ’L’, ’S’];

8

vartype = [’B’, ’I’, ’C’];s = 1;[x, zeta] = glpk(c, A, b, lx, ux, ctype, vartype, s)

For more information about GLPKMEX, read the file glpk.m and/or visit its website:http://glpkmex.sourceforge.net.

3 Modeling

Mathematical programming roots from real life problems where people always have some objectivesto maximize or minimize, but these objectives are kept from going to infinity by various constraints.An important goal of this course is to develop the skills of using mathematical programming (es-pecially linear programming) to formulate, solve, and analyze real life problems. Three examplesare given below.

3.1 Arbitrage in currency exchange

In currency markets, arbitrage is trading among different currencies in order to profit from a ratediscrepancy. For example, if the exchange rates between dollar and euro are 1 dollar ⇒ 0.7 euroand 1 euro ⇒ 1.5 dollar, then a profit can be made by trading from dollar to euro and then backto dollar. Obviously, arbitrage opportunities are not supposed to exist, at least in theory. Thefollowing are actual trades made on February 14 of 2002 with minor modification. How to use anLP model to identify if an arbitrage opportunity exists in this example? Notice that an arbitragemay involve more than two currencies, e.g., Dollar ⇒ Yen ⇒ Pound ⇒ Euro ⇒ Dollar.

from \ to Dollar Euro Pound YenDollar 1 1.1468 0.7003 133.30Euro 0.8716 1 0.6097 116.12

Pound 1.4279 1.6401 1 190.45Yen 0.00750 0.00861 0.00525 1

9

3.2 Turning junk into treasure

A company’s large machine broke down. Instead ofbuying a new one, which is very expensive, they findthat there are some old and broken machines of thesame type in the junk yard, so they decide to reassem-ble a working machine out of the broken ones. Thistype of machine has ten different components, con-nected one after another like a train as illustrated inFigure 4. Components at the same position from dif-ferent machines are interchangeable but componentsat different positions are not. A ‘0’ means that thecomponent is broken and a ‘1’ means that the com-ponent is still good. They plan to reassemble thegood components to make a complete working ma-chine. The numbers between components representthe disassembling and reassembling costs. We assumethat these costs are the same for all old machines. Forexample, they can make a working machine from (a)and (b), which requires three assembling points at(#2,#3), (#4,#5), and (#8,#9), with a total cost of5 + 11 + 13 = 29.

ttttttttt

ttttttttt

ttttttttt

ttttttttt

(a)

13

15

04

011

19

17

110

113

06

0

(b)

0

1

1

1

0

1

0

0

1

1

(c)

0

0

0

0

1

1

1

1

1

1

(d)

1

1

1

1

0

0

0

0

0

0

Figure 4: Four examples of old machines

Suppose that there are 20 old machines in their junk yard, and the functioning status of theircomponents is given in matrix A, in which each column represents an old machine. The assemblingcosts are given in vector C, with Ci representing the disassembling and reassembling cost betweenAi,j and Ai+1,j for all j. Build a linear programming or integer programming model to find theleast costly way to make a complete working machine using these old ones. Which old machinesshould be used to make it? What is the least possible cost?

A =

0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 10 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 01 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 1 0 0 00 0 0 0 0 0 0 1 0 0 1 1 1 1 0 0 0 0 0 10 1 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 0 11 0 0 1 1 0 0 0 0 1 1 1 0 0 0 0 1 0 0 10 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 01 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 1 00 0 1 1 1 0 0 0 0 0 1 0 0 1 0 0 0 1 0 10 0 0 0 1 1 0 1 1 0 1 1 0 0 0 0 0 0 0 0

, C =

354119710136

.

3.3 Old McDonald had a farm

A farmer can grow wheat, corn, and beans on his 500-acre farm. The planting costs are $150/acre,$230/acre, and $260/acre, and expected yields are 2.5 tons/acre, 3 tons/acre, and 20 tons/acre,respectively. He needs 200 tons of wheat and 240 tons of corn to feed his cattle. The farmer cansell wheat and corn to a wholesaler at $170/ton and $150/ton. He can also buy wheat and corn

10

from the wholesaler at $238/ton and $210/ton. Beans sell at $36/ton for the first 6000 tons. Dueto economic quotas on bean production, beans in excess of 6000 tons can only be sold at $10/ton.How shall the farmer allocate his 500 acres to maximize his profit? Define

xW , xC , xB: acres of wheat, corn, and beans plantedwW , wC , wB: tons of wheat, corn, and beans sold at favorable price

eB: tons of beans sold at lower priceyW , yC : tons of wheat and corn purchased,

then the profit maximization problem can be formulated as:

max ζ = −150xW − 230xC − 260xB − 238yW + 170wW − 210yC + 150wC + 36wB + 10eBs. t. xW + xC + xB ≤ 500

2.5xW + yW − wW = 2003xC + yC − wC = 24020xB − wB − eB = 0

wB ≤ 6000xW , xC , xB, wW , wC , wB, eB, yW , yC ≥ 0.

The optimal solution of this problem is x∗W = 120, x∗C = 80, x∗B = 300, w∗W = 100, w∗C = 0, w∗B =6000, e∗B = 0, y∗W = 0, y∗C = 0, ζ∗ = $118, 600.

While this solution makes sense, it assumes that the yields (2.5 tons/acre, 3 tons/acre, and 20tons/acre) are known. However, yields are greatly dependent on the weather and could increase by20% in a good year and decrease by 20% in a bad one.

If we know in advance that next year will be a good year and the yields become 3 tons/acre,3.6 tons/acre, and 24 tons/acre, then the optimal solution becomes x∗W = 183.33, x∗C = 66.67, x∗B =250, w∗W = 350, w∗C = 0, w∗B = 6000, e∗B = 0, y∗W = 0, y∗C = 0, ζ∗ = $167, 667.

If we know in advance that next year will be a bad year and the yields become 2 tons/acre,2.4 tons/acre, and 16 tons/acre, then the optimal solution becomes x∗W = 100, x∗C = 25, x∗B =375, w∗W = 0, w∗C = 0, w∗B = 6000, e∗B = 0, y∗W = 0, y∗C = 180, ζ∗ = $59, 950.

However, without knowing the weather of next year in advance, how can we formulate an LPmodel to find the optimal solution that maximizes the expected profit? Suppose the probabilitiesthat next year will be a good, average, or bad year are 0.35, 0.4, and 0.25, respectively. Noticethat the farm allocation decisions (xW , xC , xB) must be made at the beginning of the year withoutknowing the weather, and (yW , wW , yC , wC , wB, eB) can be made after observing the weather.Therefore we can have three recourse solutions of the latter group: (yG

W , wGW , yG

C , wGC , w

GB , e

GB) for a

good year, (yAW , wA

W , yAC , w

AC , w

AB, e

AB) for an average year, and (yB

W , wBW , yB

C , wBC , w

BB , e

BB) for a bad

year. The expected profit can be then defined as:

E[ζ] = −150xW − 230xC − 260xB

+0.35× (−238yGW + 170wG

W − 210yGC + 150wG

C + 36wGB + 10eGB)

+0.4× (−238yAW + 170wA

W − 210yAC + 150wA

C + 36wAB + 10eAB)

+0.25× (−238yBW + 170wB

W − 210yBC + 150wB

C + 36wBB + 10eBB).

11

3.4 Minimizing a convex piecewise linear objective function

A function f : Rn 7→ R is called convex if for all x, y ∈ Rn and all λ ∈ [0, 1], we have

f [λx+ (1− λ)y] ≤ λf(x) + (1− λ)f(y).

A function f : Rn 7→ R is called concave if for all x, y ∈ Rn and all λ ∈ [0, 1], we have

f [λx+ (1− λ)y] ≥ λf(x) + (1− λ)f(y).

Some properties about convex and concave functions:

1. If f is a convex function, then −f is a concave function

2. If f1, f2, ..., fn are all convex functions, then max{f1, f2, ..., fn} is a convex function

3. If f1, f2, ..., fn are all concave functions, then min{f1, f2, ..., fn} is a concave function

It is much easier to find the minimum of a convex function than a concave one. Similarly, it ismuch easier to find the maximum of a concave function than a convex one. Consider the followingproblem:

min f(x1, x2) (31)s. t. x1 + 4x2 ≤ 8 (32)

x1 − x2 ≤ 1 (33)x1, x2 ≥ 0, (34)

where the objective function f(x1, x2) is a convex piecewise linear function:

f(x1, x2) = max{x1 − 3x2,−4x1 − x2,−2x1 + 5x2}.

To reformulate problem (31)-(34) as a linear program, we use the fact that max{x1−3x2,−4x1−x2,−2x1 + 5x2} is equal to the smallest number z that satisfies z ≥ x1 − 3x2, z ≥ −4x1 − x2 andz ≥ −2x1 + 5x2. Therefore, problem (31)-(34) is equivalent to the following linear program:

min z

s. t. x1 + 4x2 ≤ 8x1 − x2 ≤ 1

z ≥ x1 − 3x2

z ≥ −4x1 − x2

z ≥ −2x1 + 5x2

x1, x2 ≥ 0; z free.

Challenge 1: Can you reformulate the problem if the objective function is redefined as

f(x1, x2) = min{x1 − 3x2,−4x1 − x2,−2x1 + 5x2}?

12

3.5 Dealing with binary decision variables

Consider the following nonlinear nonconvex binary program:

max x1 − 2x2 + 5x3 − x21 + 5x2

2 + 17x23 − 7x1x2 − 4x1x3 + 11x2x3 (35)

s. t. x1 + 4x2 − 2x3 ≤ 4 (36)4x1x2 + 3x2

3 − 2x1x3 − x2 ≤ 3 (37){x1, x2, x3} 6= {1, 0, 1} (38)x1, x2, x3 binary. (39)

This problem can be reformulated as an LP with binary decision variables by using the followingtricks.

� Replace squared terms x21, x2

2, and x23 with x1, x2, and x3, respectively.

� For other quadratic terms such as x1x3, replace it with a new binary variable x13 and addtwo new constraints 2x13 ≤ x1 + x3 ≤ 1 + x13.

� Replace constraint (38) with a new constraint (1− x1) + x2 + (1− x3) ≥ 1.

The resulting model is the following LP with binary decision variables:

max x1 − 2x2 + 5x3 − x1 + 5x2 + 17x3 − 7x12 − 4x13 + 11x23 (40)s. t. x1 + 4x2 − 2x3 ≤ 4 (41)

4x12 + 3x3 − 2x13 − x2 ≤ 3 (42)2x12 ≤ x1 + x2 ≤ 1 + x12 (43)2x13 ≤ x1 + x3 ≤ 1 + x13 (44)2x23 ≤ x2 + x3 ≤ 1 + x23 (45)

(1− x1) + x2 + (1− x3) ≥ 1 (46)x1, x2, x3, x12, x13, x23 binary. (47)

Challenge 2: The formulation (40)-(47) can be further simplified because some constraints areredundant. Without actually solving for the optimal solution, can you identify which constraintscan be removed and still guarantee the model will give the correct optimal solution?

Challenge 3: If we have an “x1x2x3” term in the objective function, how can you linearize thisnonlinear term by introducing new variables and constraints?

3.6 At least k out of m constraints must hold

The following problem does not require all constraints to hold, but at least k out of m must do:

max c>x

s. t.

a1x ≤ b1a2x ≤ b2

...amx ≤ bm

at least k out of m constraints must hold

x ≥ 0.

13

We can reformulate this problem as a mixed integer program by introducing some binaryvariables y’s and a constant M whose value is set to be sufficiently large so that the constraintaix ≤ bi +M can be ignored because it will never get violated:

max c>x

s. t. a1x ≤ b1 +M(1− y1)a2x ≤ b2 +M(1− y2)

...amx ≤ bm +M(1− ym)y1 + y2 + ...+ ym ≥ k

x ≥ 0, y binary.

Challenge 4: Can you reformulate the problem if the “at least k out of m constraints must hold”requirement is changed to “exactly k out of m constraints must hold”?

4 Optimality Conditions

Different algorithms may vary in details, but most of them contain the same basic steps:

Step 1: Find a feasible solution to start from.Step 2: Check to see if the current solution is optimal.Step 3: If the solution is optimal then stop. Otherwise find a better solution and go to Step 2.

In order for an algorithm to be thorough, it also needs to be able to identify infeasible or un-bounded problems.

We first introduce KKT (Karush-Kuhn-Tucker) conditions, which are widely used for checkingthe optimality of a solution to mathematical programs. We consider a general form nonlinearprogram

max{f(x) : G(x) ≥ 0m×1}, (48)

where f(·) : Rn 7→ R and G(·) : Rn 7→ Rm. The KKT conditions are: there exists a vector y ∈ Rm

such that the following constraints hold:

∇f(x) +m∑

i=1

yi∇Gi(x) = 0n×1 (49)

Gi(x) ≥ 0, ∀i = 1, ...,m (50)yi ≥ 0,∀i = 1, ...,m (51)yiGi(x) = 0, ∀i = 1, ...,m. (52)

To interpret KKT conditions from a more intuitive perspective, let’s look at the LP (1)-(4)again and use it as an example to see why the KKT conditions (49)-(52) make sense. If we rewrite

LP (1)-(4) in the form of (48), then f(x) = 2x1 + 3x2, G(x) =

G1(x)G2(x)G3(x)G4(x)

=

8− x1 − 4x2

1− x1 + x2

x1

x2

,

m = 4, and n = 2.

14

Figure 4: Figure 1 from a different angle

We look at Figure 1 from a different angle as if the constraints are walls and the objectivefunction is towards the direction of gravity, as is shown in Figure 4. Now, what will happen ifwe put a small pingpong ball inside the two-dimensional feasible region and let it go? Since thisenvironment is exactly analogous to LP (1)-(4), we can imagine that the ball will end up nowherebut the optimal solution to the LP.

If you still remember physics, in order for the ball to stop moving, all forces that apply to itmust cancel out. The possible forces are: gravity in the direction of ∇f(x), and a force from theconstraint wall in the direction of ∇Gi(x) for all i = 1, ...,m. The magnitude of gravity dependson the mass of the ball, and let’s assume it’s 1; for i = 1, ...,m, let the magnitude of the force fromwall i be denoted by yi. Now, from the physics perspective, the conditions for the ball to stop are:

All forces cancel out: ∇f(x) +m∑

i=1

yi∇Gi(x) = 0n×1 (53)

Ball must stay inside the walls: Gi(x) ≥ 0, ∀i = 1, ...,m (54)Force magnitudes are nonnegative: yi ≥ 0,∀i = 1, ...,m (55)

If the ball does not touch a wall,the force from that wall is zero: yiGi(x) = 0, ∀i = 1, ...,m. (56)

Notice that conditions (53)-(56) are exactly the same with the KKT conditions (49)-(52), which isan interpretation of the KKT conditions from the physics perspective.

KKT conditions (49)-(52) can be written as

∇f(x) +m∑

i=1

yi∇Gi(x) = 0n×1 (57)

0 ≤ G(x) ⊥ y ≥ 0. (58)

15

Here ‘⊥’ indicates the complementarity of two vectors. If vectors a ∈ Rn and b ∈ Rn are of thesame dimension, then 0 ≤ a ⊥ b ≥ 0 means that: (i) a ≥ 0, (ii) b ≥ 0, and (iii) a>b = 0. Twovectors that satisfy condition (iii) are said to be complementary or perpendicular to each other.

In general, for nonlinear programs, KKT conditions may not be necessary or sufficient optimalityconditions. For example, solution (x1 = 1, x2 = 0) does not satisfy KKT conditions, but it is optimalto

max x1

s. t. x2 + (x1 − 1)3 ≤ 0x2 ≥ 0.

Solution (x1 = 0.5, x2 = 0.25) satisfies KKT conditions, but it is not optimal to

min x2

s. t. x21 − x1 + x2 ≥ 0

0 ≤ x1 ≤ 2.

For linear programs, KKT are both necessary and sufficient conditions of optimality, whichmeans that a solution to an LP is optimal if and only if it satisfies KKT conditions. We now derivethe KKT conditions for the standard form linear program

max{c>x : Ax ≤ b, x ≥ 0}, (59)

which can be written in the form of (48) as

max{f(x) = c>x : G(x) =

[−Am×n

In×n

]x+

[bm×1

0n×1

]≥ 0(m+n)×1

},

then the KKT conditions for a standard linear program becomes: there exist two vectors y ∈ Rm

and µ ∈ Rn, corresponding to Ax ≤ b and x ≥ 0, respectively, such that the following constraintsare satisfied:

c+[−A> I

] [ yµ

]= 0 (60)

0 ≤[b−Axx

]⊥[yµ

]≥ 0. (61)

Equation (60) is equivalent to c−A>y + µ = 0 or µ = A>y− c, with which we can substitute µ in(61). Now, (60)-(61) are simplified to:

0 ≤[b−Axx

]⊥[

yA>y − c

]≥ 0. (62)

Condition (62) is necessary and sufficient for the optimality of the standard form LP (59), thus itis also called the optimality condition.

As an exercise, let’s prove the optimality of the solution (x∗1 = 2.4, x∗2 = 1.4) to LP (1)-(4)using KKT conditions. We plug in the numbers of x∗, f(x∗) and G(x∗), then KKT conditions (62)

16

become: there exists a vector y such that the following constraints hold:0000

≤

8− (2.4 + 4× 1.4)1− (2.4− 1.4)

2.41.4

⊥

y1

y2

y1 + y2 − 24y1 − y2 − 3

≥

0000

.It is not hard to find that y∗ =

[1 1

]> satisfies the above condition, which confirms the opti-mality of (x∗1 = 2.4, x∗2 = 1.4) to LP (1)-(4).

As another exercise, let’s find the optimality condition of a non-standard form LP:

min{b>µ : A>µ ≥ c, µ ≥ 0}. (63)

LP (63) can be equivalently rewritten in the standard form as follows:

max{−b>µ : −A>µ ≤ −c, µ ≥ 0}. (64)

Let’s relate (µ,−A>,−c,−b) in (64) to (x,A, b, c) in (59), respectively. We also introduce a newvariable λ to relate to y in (62). Then the optimality condition for (63) and (64) is:

0 ≤[

λ−Aλ+ b

]⊥[−c+A>µ

µ

]≥ 0. (65)

It is interesting to notice that optimality conditions (62) and (65) are equivalent to each other inthe sense that if (x, y) is a feasible solution to (62) then (µ = y, λ = x) is also a feasible solutionto (65), and vice versa. For this reason, we can say that LPs (59) and (63) actually share a sameoptimality condition if we simply change the notation µ to y in (63):

min{b>y : A>y ≥ c, y ≥ 0}. (66)

Once we find a pair of feasible solutions (x, y) to the optimality condition (62), then x and y arethe optimal solutions to LPs (59) and (66), respectively.

In the optimality condition (62):

0 ≤[b−Axx

]⊥[

yA>y − c

]≥ 0,

the left hand side is simply saying that x must be feasible to (59), and the right hand side is sayingthat y must be feasible to (66). These two problems both have many feasible solutions, but whatmakes the optimal solutions (x∗, y∗) optimal is the fact that they also satisfy the complementarityconditions: 0 ≤ b−Ax∗ ⊥ y∗ ≥ 0 and 0 ≤ x∗ ⊥ A>y∗ − c ≥ 0.

5 Duality

We are ready to introduce perhaps the most important concept in linear programming: duality.Two linear programs, like (59) and (66), that share a same optimality condition in a complementarymanner are called dual to each other: the decision variable y in (66) is the “magnitude of force” ofthe constraint wall for (59), and vice versa. Either one of the LPs can be called the primal problem

17

and the other is the dual problem. Decision variables of the primal (dual) problem are called primal(dual) variables.

Every linear program has a dual problem, which can be found by reformulating the primalproblem in the same form as either (59) or (66), and then simplifying its corresponding dualproblem. For example, to find the dual problem to

min{c>x : Ax ≤ b}, (67)

we first introduce two non-negative variables x+ and x− to replace the free variable x with x =x+ − x−, and then rewrite (67) as the same form with (66):

min{[

c> −c>] [ x+

x−

]:[−A A

] [ x+

x−

]≥ −b,

[x+

x−

]≥ 0}, (68)

whose dual problem is clearly

max{−b>y :

[−A>A>

]y ≤

[c−c

], y ≥ 0

}, (69)

which can be simplified asmax

{b>y : A>y = c, y ≤ 0

}. (70)

Now, we have found that the dual to (67) is (70).

The following table summarizes the primal-dual relations for general form LPs:

max c1x1 + c2x2 + c3x3 ↔ b1y1 + b2y2 + b3y3 min

constraints

a1,1x1 + a1,2x2 + a1,3x3 ≤ b1a2,1x1 + a2,2x2 + a2,3x3 = b2a3,1x1 + a3,2x2 + a3,3x3 ≥ b3

↔↔↔

y1 ≥ 0y2 freey3 ≤ 0

variables

variables

x1 ≥ 0x2 ≤ 0x3 free

↔↔↔

a1,1y1 + a2,1y2 + a3,1y3 ≥ c1a1,2y1 + a2,2y2 + a3,2y3 ≤ c2a1,3y1 + a2,3y2 + a3,3y3 = c3

constraints

We can use this table to find the dual of a general form LP directly. For example,

(P21): min x1 + 2x2 + 3x3

s. t. x1 − 3x2 = 52x1 − x2 + 3x3 ≥ 6

x3 ≤ 4x1 ≥ 0x2 ≤ 0x3 free.

↔↔↔↔↔↔↔

(D21): max 5y1 + 6y2 + 4y3

s. t. y1 freey2 ≥ 0y3 ≤ 0y1 + 2y2 ≤ 1−3y1 − y2 ≥ 23y2 + y3 = 3.

This table can also be used to check the optimality of a solution to a non-standard form LP. If(i) x satisfies the constraints of the primal problem, (ii) there exists a y that satisfies the constraintsof the dual, and (iii) the corresponding constraints and variables are complementary to each other,then (x, y) is optimal. For example, we can check the optimality of (x∗1 = 5, x∗2 = 0, x∗3 = −4/3)

18

(y∗1 = −1, y∗2 = 1, y∗3 = 0) by observing that (i) x∗ is primal feasible, (ii) y∗ is dual feasible, and (iii)(x∗1 − 3x∗2 − 5)y∗1 = (2x∗1 − x∗2 + 3x∗3 − 6)y∗2 = (x∗3 − 4)y∗3 = x∗1(y∗1 + 2y∗2 − 1) = x∗2(−3y∗1 − y∗2 − 2) =x∗3(3y∗2 + y∗3 − 3) = 0.

The following are some important theorems on duality.

Complementary Slackness: Solutions x and y are optimal to max{c>x : Ax ≤ b, x ≥ 0} andmin{b>y : A>y ≥ c, y ≥ 0}, respectively, if and only if condition (62) is met:

0 ≤[b−Axx

]⊥[

yA>y − c

]≥ 0.

Weak Duality: If x and y are feasible solutions to max{c>x : Ax ≤ b, x ≥ 0} and min{b>y :A>y ≥ c, y ≥ 0}, respectively, then c>x ≤ b>y.

Proof. Since x and y are feasible, we have

0 ≤ (b−Ax)>y = b>y − (Ax)>y = b>y − y>Ax,

and0 ≤ (A>y − c)>x = y>Ax− c>x.

Therefore, c>x ≤ y>Ax ≤ b>y.

Strong Duality: If x and y are optimal solutions to max{c>x : Ax ≤ b, x ≥ 0} and min{b>y :A>y ≥ c, y ≥ 0}, respectively, then c>x = b>y.

Proof. Since x and y are optimal, by complementary slackness, we have

0 = (b−Ax)>y = b>y − (Ax)>y = b>y − y>Ax,

and0 = (A>y − c)>x = y>Ax− c>x.

Therefore, c>x = y>Ax = b>y.

Primal-Dual Possibility Table: Recall that a linear program has three possibilities: finitelyoptimal, infeasible, or unbounded. The primal-dual pair has nine combinations, but only four ofthem are possible.

Primal \ Dual Finitely optimal Unbounded InfeasibleFinitely optimal Possible Impossible Impossible

Unbounded Impossible Impossible PossibleInfeasible Impossible Possible Possible

Two examples of “primal unbounded and dual infeasible”, which are also “primal infeasible anddual unbounded”:

max x1 + x2

s. t. x1 ≤ 1x1, x2 ≥ 0.

↔

min ys. t. y ≥ 1

0y ≥ 1y ≥ 0.

19

max 3x1 + 4x2

s. t. x1 + x2 ≤ 12x1 − x2 ≤ −3x1, x2 ≥ 0.

↔

min y1 − 3y2

s. t. y1 + 2y2 ≥ 3y1 − y2 ≥ 4y1, y2 ≥ 0.

Two examples of “both primal and dual are infeasible”:

min x1 + 2x2

s. t. x1 + x2 = 1x1 + x2 = 2.

↔

max y1 + 2y2

s. t. y1 + y2 = 1y1 + y2 = 2.

max 2x1 − x2

s. t. x1 − x2 ≤ 1−x1 + x2 ≤ −2x1, x2 ≥ 0.

↔

min y1 − 2y2

s. t. y1 − y2 ≥ 2−y1 + y2 ≥ −1y1, y2 ≥ 0.

For an LP to be unbounded, there must exist two things: (i) a feasible solution x0, satisfyingAx0 ≤ b, x0 ≥ 0, and (ii) a direction ∆x, which leads the objective value c>(x0+λ∆x) towards infin-ity without violating any constraints as the step size λ approaches infinity. This direction is calledan extreme ray. If ∆x satisfies c>∆x > 0, A∆x ≤ 0,∆x ≥ 0, then ∆x is called an extreme ray tothe LP maxx{c>x : Ax ≤ b, x ≥ 0}. In the dual space, if ∆y satisfies b>∆y < 0, A>∆y ≥ 0,∆y ≥ 0,then ∆y is called an extreme ray to the LP miny{b>y : A>y ≥ c, y ≥ 0}.

Farkas’ Lemma: Let A ∈ Rm×n and b ∈ Rm×1 be a matrix and a vector, respectively. Thenexactly one of the following two alternatives holds:(a) There exists some x ≥ 0 such that Ax ≤ b.(b) There exists some y ≥ 0 such that A>y ≥ 0 and b>y < 0.

Proof. (a) true ⇒ (b) false: If (a) is true, then for any y ≥ 0 such that A>y ≥ 0, we haveb>y ≥ (Ax)>y = x>A>y ≥ 0, which means that (b) is false.

(a) false ⇒ (b) true: Consider max{0 : Ax ≤ b, x ≥ 0} and its dual min{b>y : A>y ≥ 0, y ≥ 0}.If (a) is false, then max{0 : Ax ≤ b, x ≥ 0} is infeasible. Its dual is either unbounded or infeasible.It is easy to see that y = 0 is a feasible solution to the dual, so it must be unbounded, which meansthat there must exist some y ≥ 0 such that A>y ≥ 0 and b>y < 0. Therefore, (b) is true.

A Variation of Farkas’ Lemma: Let A ∈ Rm×n and b ∈ Rm×1 be a matrix and a vector,respectively. Then exactly one of the following two alternatives holds:(a) There exists some x ≥ 0 such that Ax = b.(b) There exists some y such that A>y ≥ 0 and b>y < 0.

Proof. (a) true ⇒ (b) false: If (a) is true, then for any y such that A>y ≥ 0, we have b>y =(Ax)>y = x>A>y ≥ 0, which means that (b) is false.

(a) false ⇒ (b) true: Consider max{0 : Ax = b, x ≥ 0} and its dual min{b>y : A>y ≥ 0}. If (a)is false, then max{0 : Ax = b, x ≥ 0} is infeasible. According to the Primal-Dual Possibility Table,its dual is either unbounded or infeasible. It is easy to see that y = 0 is a feasible solution to thedual, so it must be unbounded, which means that there must exist some y such that A>y ≥ 0 andb>y < 0. Therefore, (b) is true.

20

Extended Primal-Dual Possibility Table: Farkas Lemma enables us to define the primal-dual possibility table in a more revealing manner.

We use x0 or y0 to indicate that a feasible solution exists, and ∆x or ∆y to indicate that anextreme ray does not exist. Farkas Lemma basically says that x0 ↔ ∆y, y0 ↔ ∆x, x0 ↔ ∆y, andy0 ↔ ∆x. Therefore, we have the following extended primal-dual possibility table:

primal dualoptimal x0 + ∆x ↔ ∆y + y0 optimal

unbounded x0 + ∆x ↔ ∆y + y0 infeasibleinfeasible x0 + ∆x ↔ ∆y + y0 infeasibleinfeasible x0 + ∆x ↔ ∆y + y0 unbounded

6 Simplex Algorithm

Now that we know what condition a solution needs to satisfy to be an optimal solution, but howdo we come up with such an optimal solution? In this section, we will learn about an algorithmcalled simplex that finds an optimal solution if the LP has one, or determines that the LP is infea-sible or unbounded if that is the case. Simplex is perhaps the most important linear programmingalgorithm, and we are going to learn about it in great detail.

The basic idea of simplex is based on the observation that the optimal solution to an LP, ifit exists, occurs at a corner of the feasible region. This can be verified with the LP examples wehave seen in the lecture notes or homework examples. Based on this observation, we can find theoptimal solution by (i) starting from a feasible corner point, and (ii) moving to a better cornerpoint until the current one is already optimal. If we cannot find a starting point, then the LP isinfeasible; if we can optimize the objective value to infinity, then the LP is unbounded.

While the idea may sound simple and intuitive, we need to rigorously establish its theoreticalcorrectness.

6.1 What exactly is a “corner point”?

“Corner point” is a nickname of a well-known concept called “basic solution”. So let’s use basicsolution instead of corner point from now on. A solution x0 is a basic solution if it is uniquelydetermined by its active constraints at equality. An inequality constraint (Ax)i ≤ bi is active at x0

if it holds at equality: (Ax0)i = bi. An equality constraint (Ax)i = bi is also considered active aslong as the equality holds.

To explain the definition of a basic solution, let us suppose the feasible region is defined byAx ≤ b, which already includes any non-negativity constraints x ≥ 0. For any solution x0, letI(x0) be the set of its active constraints: I(x0) = {i : (Ax0)i = bi}. By definition, x0 satisfiesthese constraints at equality: (Ax)I(x0) = bI(x0). If x0 is the only solution determined by thisequation, then it is a basic solution. However, if there exists another solution x1 that also satisfies(Ax1)I(x0) = bI(x0), then x0 is not a basic solution.

Recall from linear algebra that a necessary condition for a linear system of equations Ax = bto have a unique solution x ∈ Rn is that, there exist n linearly independent rows in matrix A.

21

Given a finite number of vectors V1, V2, ..., VK ∈ Rn, we say that they are linearly dependent ifthere exist real numbers a1, a2, ..., aK such that

∑Ki=1 |ai| > 0 and

∑Ki=1 aiVi = 0. Otherwise, they

are called linearly independent.

As an example, we look at LP (1)-(4) and Figure 1 again:

(1) : max 2x1 + 3x2

(2) : s. t. x1 + 4x2 ≤ 8(3) : x1 − x2 ≤ 1(4) : x1, x2 ≥ 0.

We know intuitively that (x1 = 1, x2 = 0) is a basic solution (because it is a corner point). Nowlet’s check with the definition. There are two active constraints at this point: x1 − x2 ≤ 1 andx2 ≥ 0, and (x1 = 1, x2 = 0) is uniquely determined by these two constraints at equality, therefore,it is a basic solution. On the other hand, (x1 = 2, x2 = 1.5) is not a basic solution, because thereis only one active constraint at that point: x1 + 4x2 ≤ 8, and (x1 = 2, x2 = 1.5) is obviously notthe only solution determined by that constraint at equality.

Also notice that a basic solution is not required by definition to be feasible. Point (x1 = 8, x2 =0) is not a feasible solution, but it is a basic solution because it is uniquely determined by twoactive constraints at equality: x1 + 4x2 ≤ 8 and x2 ≥ 0.

6.2 Is the optimal solution always a basic solution?

Unfortunately, the answer is no. First, some LPs may not even have a basic solution. For examplemax{0 : x1, x2 free}, where we are maximizing the constant 0 over the entire two-dimensional spacewith no constraints, and there is no basic solution. Secondly, even if an LP has basic solutions,there may also exist an optimal solution that is not a basic solution. For example (x1 = 1, x2 = 0)is an optimal solution to min{x2 : 0 ≤ x1 ≤ 2, x2 ≥ 0}, but it is not a basic solution.

With that being said, there is some good news that still validates the idea of simplex: (i) If wewrite an LP in the standard form max{c>x : Ax ≤ b, x ≥ 0}, there always exists a basic solution.As a matter of fact, the origin point (x = 0) is uniquely determined by the active constraints x ≥ 0at equality, thus it is a basic solution. (ii) Suppose an LP has at least one basic solution and oneoptimal solution. It can be proved that: if the optimal solution is unique, then it must be a basicsolution; if there are infinitely many optimal solutions, then there exists one that is a basic solution.

22

6.3 How to find basic solutions?

In the simplex context, it is oftentimes more convenient to consider a non-standard form LP

max{ζ = c>x : Ax+ w = b, x ≥ 0, w ≥ 0}, (71)

which is equivalently reformulated from the standard form LP max{c>x : Ax ≤ b, x ≥ 0} by simplyintroducing a new variable w ∈ Rm, called slack variable, to make the inequality constraint Ax ≤ ban equality one Ax+ w = b.

We now write (71) in the following matrix form:

max ζ = c>x (72)s. t. Ax = b (73)

x ≥ 0, (74)

where c =[

c0m×1

], A =

[A Im×m

], and x =

[xw

]∈ Rn+m.

Since the dimension of x is (n+m)× 1, we need n+m linearly independent active constraintsto uniquely determine a basic solution. We already have m rows in Ax = b, so we need at least nconstraints in x ≥ 0 to hold at equality.

Define N as the indices of constraints in x ≥ 0 that are set to hold at equality, and B as the in-dices of other constraints in x ≥ 0. Such a pair of (B,N ) is an exclusive and exhaustive partition ofthe set {1, 2, ..., n+m}. For example, if we set xi = 0,∀i = 1, ..., n and xi ≥ 0,∀i = n+ 1, ..., n+m,then the partition is (B = {n+ 1, ..., n+m},N = {1, ..., n}).

The above conditions are only necessary for a basic solution. To find the sufficient conditionfor a basic solution, we rewrite (72)-(74) using the definition of N and B:

max ζ = c>BxB + c>NxN (75)s. t. ABxB +ANxN = b (76)

xB ≥ 0, xN ≥ 0, (77)

where AB is the collection of columns in A whose indices are in the set B, and xN and cN are thecollections of elements in x and c, respectively, whose indices are in the set N . Here, the value for

23

xN = 0 is uniquely determined. To make sure that the value for xB is also uniquely determined,we should guarantee that the equation ABxB + ANxN = ABxB = b has a unique solution, whichrequires that the matrix AB ∈ Rm×m be invertible. This can be achieved by choosing m linearlyindependent columns in A as the set of B.

Now we have the necessary and sufficient conditions for a basic partition:

The m elements in the set B should be chosen such that AB is invertible, and (78)The n elements in the set N are then determined by N = {1, ..., n+m}\B. (79)

This partition uniquely determines a basic solution (xB = A−1B b, xN = 0).

We define: a partition (B,N ) that satisfies (78) and (79) as a basic partition; B and N in abasic partition are called basis and non-basis, respectively; and the variables xB and xN are calledbasic variable and non-basic variable, respectively.

The relationship between a basic partition and a basic solution is that, a basic partition (B,N )uniquely determines a basic solution; for any basic solution x, there exists (uniquely or not) abasic partition that determines this basic solution x. One example of different basic partitions alluniquely determining a same basic solution is the following. Suppose

A =[

1 2 1 03 6 0 1

]and b =

[13

],

then both (B = {1, 3},N = {2, 4}) and (B = {1, 4},N = {2, 3}) uniquely determine the same basicsolution x =

[1 0 0 0

]>.

Since a basic partition (B,N ) uniquely determines one basic solution, the number of basic so-lutions is bounded by the number of ways (B,N ) can be selected, which is no more than (n+m)!

n!m! .

The set of partitions of {1, 2, ..., n+m} can be divided into the following regions.

A+B+C+D: All possible partitionsB+C+D: Basic partitions, which satisfy (78) and (79)

C+D: Feasible basic partitions with xB = A−1B b ≥ 0

D: Optimal basic partitions

It is relatively easy to enter region B. The following points will discuss how to enter region Cand then D.

24

6.4 How to find an initial feasible basic solution to start from? What if the LPis infeasible?

The way simplex algorithm proceeds is to start from a feasible basic solution, move from one fea-sible basic solution to another better feasible basic solution, and finally stop at a feasible basicsolution which is optimal to the LP. (Let’s use ‘fbs’ to abbreviate ‘feasible basic solution’). We arenow ready to discuss how to find an initial fbs to start from.

It is not hard to observe that (B = {n + 1, ..., n + m},N = {1, ..., n}) is a basic partition to(71), which corresponds to

B = {n+ 1, ..., n+m}, N = {1, ..., n},cB = 0m×1, cN = c,

AB = Im×m, AN = A,xB = b, xN = 0n×1.

The partition (B,N ) is indeed a basic partition because AB = Im×m is invertible. If b ≥ 0, then(x = 0, w = b) is also a feasible basic solution.

However, if bi < 0 for some i, then x ≥ 0 is violated, and the basic solution is not feasible.To obtain an fbs in such a case, we need to use a little trick called the big-M method. DefineK = {i : bi < 0,∀i = 1, ...,m}, and let k = |K|. Now we consider a new problem

max

{ζ = c>x−M

k∑i=1

ti : Ax+ w +Ht = b;x,w, t ≥ 0

}. (80)

Here M is an extremely large finite constant, vector t ∈ Rk×1 is a new variable called artificialvariable, and matrix H ∈ Rm×k is defined as

Hi,j ={−1, if i = K(j);0, otherwise.

For example, consider the following LP as an instance of (71):

max ζ = 5x1 + 4x2 + 3x3

s. t. 2x1 + 3x2 + x3 + w4 = 54x1 + x2 + 2x3 + w5 = −113x1 + 4x2 + 2x3 + w6 = −8x1, x2, x3, w4, w5, w6 ≥ 0.

Then (80) corresponds to the following formulation:

max ζ = 5x1 + 4x2 + 3x3 − 1000t7 − 1000t8s. t. 2x1 + 3x2 + x3 + w4 = 5

4x1 + x2 + 2x3 + w5 − t7 = −113x1 + 4x2 + 2x3 + w6 − t8 = −8x1, x2, x3, w4, w5, w6, t7, t8 ≥ 0.

25

In LP (80), since there are k artificial variables but no additional constraints, the dimensionsof its partition should be |N | = n + k and |B| = m. The way we initialize this partition is to addall the artificial variables to B and then move those indices that correspond to bi < 0 from B to N .It is not hard to see that the basic partition (N = {1, ..., n} ∪ {n+K},B = {1, ..., n+m+ k}\N )uniquely determines an fbs (xN = 0, xB = A

−1B b = |b|). We can use this fbs as a starting point to

solve (80) by following the rest of the simplex steps. In the format of (72)-(74), (80) has

c =

c0m×1

−M × 1k×1

, A =[A Im×m Hm×k

], and x =

xwt

∈ Rn+m+k.

The relation between optimal solutions to (71) and (80) is given in the following propositions.

Proposition 1. Solution (x∗, w∗) is optimal to (71) if and only if there exists a finitely large Msuch that (x∗, w∗, t∗ = 0) is optimal to (80).

Proof. (⇒): Prove by construction. Let y∗ be the dual optimal solution to (71), which is min{ζ =b>y : A>y ≥ c, y ≥ 0}. Set M = maxi=1,...,m y∗i , and we know that y∗ is also feasible to the dual of(80), which is min

{ζ = b>y : A>y ≥ c,H>y ≥ −M1k×1; y ≥ 0

}. Since (x∗, w∗, t∗ = 0) and y∗ are

respectively feasible to (80) and its dual with the same objective value, they are also respectivelyoptimal.

(⇐): Prove by contradiction. Suppose (x∗, w∗, t∗ = 0) is optimal to (80) but (x∗, w∗) is notoptimal to (71), then there must exist a feasible solution (x′, w′) to (71) with c>x′ > c>x∗. However,this implies that (x′, w′, t′ = 0) is a better solution than (x∗, w∗, t∗ = 0) to (80), because (x′, w′, t′ =0) is feasible to (80) and c>x′ > c>x∗. This contradicts the assumption that (x∗, w∗, t∗ = 0) is anoptimal solution to (80).

In the simplex algorithm, there is a way to make sure that M is sufficiently large. So if we solve(80) with the simplex algorithm and get an optimal solution (x′, w′, t′) with t′i > 0 for some i, then(71) is infeasible.

Proposition 2. If for a sufficiently large M , (80) possesses an optimal solution (x′, w′, t′) witht′i > 0 for some i, then (71) is infeasible.

Proof. By Proposition 1, (71) does not have an optimal solution. Therefore, it suffices to show that(71) is not unbounded. Let y′ be the optimal dual solution to (80), then it is feasible to the dualof (71), which means that (71) is not unbounded.

If (80) is unbounded, then (71) could be either infeasible or unbounded, and we need to solvethe following LP to verify:

min

{ζ =

k∑i=1

ti : Ax+ w +Ht = b;x,w, t ≥ 0

}. (81)

If t∗ = 0 is an optimal solution to (81), then (71) is unbounded; otherwise (71) is infeasible.

The possibilities of (80) and their implications of (71) are summarized as the following:

26

(x∗, w∗, t∗) optimal to (80) (80) unbounded t′ optimal to (81)t∗ = 0 t∗ 6= 0 t′ = 0 t′ 6= 0

(x∗, w∗) optimal to (71)√

× × ×(71) infeasible ×

√×

√

(71) unbounded × ×√

×

� (80) is optimal with (x∗, w∗, t∗ = 0) ⇒ (71) is optimal with (x∗, w∗).

� (80) is optimal with (x∗, w∗, t∗) and t∗i > 0 for some i ⇒ (71) is infeasible.

� (80) is unbounded and t∗ = 0 is optimal to (81) ⇒ (71) is unbounded.

� (80) is unbounded and t∗ = 0 is not optimal to (81) ⇒ (71) is infeasible.

� LP (80) cannot be infeasible, because (xN = 0, xB = |b|) is a feasible solution to (80).

The following diagram provides an overview of the Simplex algorithm. Here we refer to thethree LPs (71), (80), and (81) as LP0, LP1, and LP2, respectively.

LP0

?HHHH

HHHH

��

��

b ≥ 0?N

Y?

?HHHH

HHHH

��

��

optimal?N

Y?

LP0optimal

-

HHHHHH

HH

��

��

unbounded?

N

Y?

LP0unbounded

6�� improve

- LP1

?

?HHHH

HHHH

��

��

optimal?N

Yt∗ = 0 t∗ 6= 0��

LP0optimal

@@R

LP0infeasible

-

HHHHHH

HH

��

��

unbounded?

N

Y

6�� improve

- LP2

?

?HHHH

HHHH

��

��

optimal?N

Yt′ = 0 t′ 6= 0

-

6�� improve

��

LP0unbounded

@@R

LP0infeasible

LP0 may be optimal, unbounded, or infeasibleLP1 may be optimal or unbounded, never infeasibleLP2 must be optimal, never unbounded or infeasible

6.5 How to tell if the current fbs is optimal or not?

One way to check the optimality of a solution is to check the optimality condition (62). In thesimplex algorithm, there is a more convenient way to check the optimality by reformulating (75)-

27

(77) as:

max ζ = c>BA−1B b+ (c>N − c>BA

−1B AN )xN (82)

s. t. xB = A−1B b−A−1

B ANxN (83)xB ≥ 0, xN ≥ 0. (84)

Equations (82) and (83) are called a dictionary :

ζ = c>BA−1B b+ (c>N − c>BA

−1B AN )xN

xB = A−1B b−A−1

B ANxN .

The term (c>N − c>BA−1B AN ) is called the reduced cost.

Proposition 3. In (82), for a given feasible basic partition (B,N ), if we have

(c>N − c>BA−1B AN ) ≤ 0,

then the fbs (xB = A−1B b, xN = 0) is optimal to (82)-(84).

Proof. Prove by contradiction. Suppose (xB = A−1B b, xN = 0) is not optimal to (82)-(84), and there

exists another fbs x′ with ζ(x′) > ζ(x). This implies that

ζ(x′)− ζ(x)

= (c>N − c>BA−1B AN )x′N − (c>N − c>BA

−1B AN )xN

= (c>N − c>BA−1B AN )x′N

> 0,

which is impossible since (c>N − c>BA−1B AN ) ≤ 0 and x′N ≥ 0.

Notice that reduced cost being non-positive is a sufficient not necessary condition for the opti-mality of an fbs. For example, consider the following LP:

max ζ = x1 + 2x2 + 3x3

s. t. x1 − 3x3 + w4 = 07x1 + 2x2 + 5x3 + w5 = 1x1, x2, x3, w4, w5 ≥ 0.

The feasible basic partition (B1 = {1, 2},N 1 = {3, 4, 5}) determines the following dictionary:

ζ = 1− 20x3 + 6w4 − w5

x1 = 3x3 − w4

x2 = 0.5− 13x3 + 3.5w4 − 0.5w5,

which gives the fbs (x1 = 0, x2 = 0.5, x3 = 0). Although the reduced cost does contain a positiveterm, this fbs is actually optimal. To see this, consider another feasible basic partition (B2 ={2, 4},N 2 = {1, 3, 5}), which determines the following dictionary:

ζ = 1− 6x1 − 2x3 − w5

x2 = 0.5− 3.5x1 − 2.5x3 − 0.5w5

w4 = −x1 + 3x3.

It gives the same fbs (x1 = 0, x2 = 0.5, x3 = 0). Since it has a non-positive reduced cost, we knowthis fbs is optimal.

28

6.6 How to find a better fbs if the current one is not optimal?

Because of the close relation between an fbs and a feasible basic partition (B,N ), the search fora better fbs (and eventually the optimal one) is equivalent to the search for a better feasiblebasic partition. The way simplex updates a feasible basic partition is by switching one pair ofnumbers between the current basis B and non-basis N at a time. If the current basic partitionis (B0,N 0), then we select an i∗ from B0 and a j∗ from N 0, and update the basic partition as(B1 = B0\{i∗} ∪ {j∗},N 1 = N 0\{j∗} ∪ {i∗}). The variable xj∗ is called the entering variable,because it will enter the basis. Similarly, xi∗ is called the leaving variable. Geometrically, such anupdate means a move from an fbs to an adjacent one. Of course, we need to make sure that thenew fbs is no worse than the current one in terms of the objective value.

Now let’s assume that we have obtained a feasible basic partition (B,N ) which is not optimal,then we will have to update the basic partition by selecting an entering variable and a leavingvariable. The rule for selecting the entering and leaving variables is called a pivoting rule. Thereare various pivoting rules, one of which called Bland’s rule is introduced below:

Entering variable xj∗ : Choose j∗ = min{j ∈ N : (c>N − c>BA−1B AN )j > 0}. (85)

Leaving variable xi∗ : Choose i∗ = min

{i′ ∈ argmaxi∈B

(A−1B AN )i,j∗

(A−1B b)i

}. (86)

After the entering and leaving variables are chosen, we get an updated partition. Bland’s ruleensures that the new partition is a better feasible basic partition. The calculation for determin-ing the leaving variable is called the ratio test, because we are trying to find the largest ratio of(A−1B AN )i,j∗

(A−1B b)i

. The i∗ that achieves the largest ratio is said to be the winner of the ratio test, and xi∗

will be the leaving variable. If there is a tie in the ratio test, the smallest winner i∗ will be selected.

The rationale for Bland’s rule is explained in the following example:

max 5x1 + 4x2 + 3x3 (87)s. t. 2x1 + 3x2 + x3 ≤ 5 (88)

4x1 + x2 + 2x3 ≤ 11 (89)3x1 + 4x2 + 2x3 ≤ 8 (90)

x1, x2, x3 ≥ 0. (91)

We start by introducing new variables w4, w5, w6, to reformulate (88)-(90) into equality constraints:

max 5x1 + 4x2 + 3x3 (92)s. t. 2x1 + 3x2 + x3 + w4 = 5 (93)

4x1 + x2 + 2x3 + w5 = 11 (94)3x1 + 4x2 + 2x3 + w6 = 8 (95)x1, x2, x3, w4, w5, w6 ≥ 0. (96)

Since the right-hand-side values are all positive, the first fbs is easy to find: x1 = x2 = x3 = 0, w4 =

29

5, w5 = 11, w6 = 8. In the context of (82)-(84),

B = {4, 5, 6} N = {1, 2, 3}

cB =

000

cN =

543

AB =

1 0 00 1 00 0 1

AN =

2 3 14 1 23 4 2

xB =

5118

xN =

000

.

The dictionary isζ = 5x1 + 4x2 + 3x3

w4 = 5− 2x1 − 3x2 − x3

w5 = 11− 4x1 − x2 − 2x3

w6 = 8− 3x1 − 4x2 − 2x3

,

and in matrix form, it is

ζ = 0 +[

5 4 3] [

x1 x2 x3

]> w4

w5

w6

=

5118

− 2 3 1

4 1 23 4 2

x1

x2

x3

.Since (c>N − c>BA

−1B AN ) =

[5 4 3

], the current fbs is not optimal, because we can improve the

objective value by increasing the values of x1, x2, x3 from zero to positive. According to Bland’srule (85), we choose j∗ = 1, thus x1 will enter the basis. But by how much could x1 increase fromzero? This is limited by the equation w4

w5

w6

=

5118

− 2 3 1

4 1 23 4 2

x1

x2

x3

in the dictionary, because changing x1 will affect the column

243

and thus the values of w4, w5, w6,

which should be non-negative. Therefore, the new value of x1 will be set to the largest numbersuch that the constraints w4, w5, w6 ≥ 0 are still satisfied. This is the reason for Bland’s rule (86).We choose the smallest i such that

i ∈ argmaxi∈B(A−1B AN )i,j∗

(A−1B b)i

= argmax{

25,

411,38

},

so i∗ = 4. The basic partition is updated as B = {1, 5, 6},N = {4, 2, 3}. If we repeat this procedure,

30

which is called an iteration in the simplex algorithm, we get the following dictionaries.

Second iteration: ζ = 12.5 +[−2.5 −3.5 0.5

] [w4 x2 x3

]> x1

w5

w6

=

2.51

0.5

− 0.5 1.5 0.5−2 −5 0−1.5 −0.5 0.5

w4

x2

x3

,Third iteration: ζ = 13 +

[−1 −3 −1

] [w4 x2 w6

]> x1

w5

x3

=

211

− 2 2 −1−2 −5 0−3 −1 2

w4

x2

w6

.After the third iteration, (c>N − c>BA

−1B AN ) =

[−1 −3 −1

]< 0, thus we know that the current

basic partition B′ = {1, 5, 3},N ′ = {4, 2, 6} is optimal, so is the current fbs x∗1 = 2, x∗2 = 0, x∗3 =1, w∗4 = 0, w∗5 = 1, w∗6 = 0.

6.7 How to identify an unbounded LP?

We can identify an unbounded LP when we use Bland’s rule (86) to determine the leaving variable

and the maximum ratio argmaxi∈B(A−1B AN )i,j∗

(A−1B b)i

is not positive.

The possibilities of the ratio (A−1B AN )i,j∗

(A−1B b)i

are summarized in the following table, where 1 and −1

represent finite positive and negative numbers, respectively.

case (A−1B b)i (A−1

B AN )i,j∗(A−1B AN )i,j∗

(A−1B b)i

updated xj∗ note

1 0 0 undefined +∞ unbounded2 0 1 +∞ 0 degenerate3 0 −1 −∞ +∞ unbounded4 1 0 0 +∞ unbounded5 1 1 1 1 new basic partition6 1 −1 −1 +∞ unbounded

There are six cases in this table. The second and third columns come from the dictionary


−1B AN )xN


B ANxN ;

the fourth column is the ratio test result in Bland’s rule (86); the fifth column calculates the largestvalue of xj∗ such that constraint xB ≥ 0 still holds; and the sixth column explains the implicationif the corresponding xj becomes the entering variable. If case 1, 3, 4, or 6 is the winner of the ratiotest, then we claim the LP is unbounded.

6.8 Will the algorithm ever terminate?

If appropriate pivoting rules are used, the algorithm will terminate finitely. First of all, as long asthe numbers of variables and constraints are finite, there is only a finite number of basic solutions,so the algorithm does not need to visit infinitely many fbs’s to reach the optimal one.

31

Proposition 4. If the dimensions of matrix A in (71) are finite, then the number of basic solutionsto (71) is finite.

Proof. Since any basic solution can be determined by a basic partition (B,N ), the number of basicsolutions is bounded by the number of ways (B,N ) can be selected, which is no more than (n+m)!

n!m! .Therefore, the number of basic solutions is finite.

Secondly, it can be proved (refer to the text for the proof) that if Bland’s rule is used, thealgorithm will terminate in finitely many iterations. Under some other pivoting rules (for exam-ple the largest coefficient rule: choose j∗ = min

{j ∈ argmaxj∈N (c>N − c>BA

−1B A)j > 0

}for the

entering variable, and choose i∗ = min{i′ ∈ argmaxi∈B

(A−1B AN )i,j∗

(A−1B b)i

}for the leaving variable),

however, the algorithm could get stuck in a so called “cycle” and never terminate. A cycle oc-curs when the algorithm moves to a basic partition which has been visited before. For example,(B0,N 0) → (B1,N 1) → (B2,N 2) → ... → (Bk,N k) → (B1,N 1) → (B2,N 2).... A cycling exampleusing the largest coefficient rule can be found in the text on page 31.

The reason a cycle occurs is because of degeneracy, which occurs when the updated basicpartition determines a same fbs as the previous basic partition does. For example, in this problem

max 10x1 − 57x2 − 9x3 − 24x4

s. t. 0.5x1 − 5.5x2 − 2.5x3 + 9x4 + w5 = 00.5x1 − 1.5x2 − 0.5x3 + x4 + w6 = 0

x1 + w7 = 1x1, x2, x3, x4, w5, w6, w7 ≥ 0,

A =

0.5 −5.5 −2.5 9 1 0 00.5 −1.5 −0.5 1 0 1 01 0 0 0 0 0 1

,the following basic partitions all determine a same fbs x1 = 0, x2 = 0, x3 = 0, x4 = 0, w5 = 0, w6 =0, w7 = 1:

(B1 = {5, 6, 7},N 1 = {1, 2, 3, 4})(B2 = {1, 6, 7},N 2 = {5, 2, 3, 4})(B3 = {1, 2, 7},N 3 = {5, 6, 3, 4})(B4 = {3, 2, 7},N 4 = {5, 6, 1, 4})(B5 = {3, 4, 7},N 5 = {5, 6, 1, 2})(B6 = {5, 4, 7},N 6 = {3, 6, 1, 2}).

From another perspective, a solution x to (72)-(74) is degenerate if there are more than n+mactive constraints at x. This implies that not only xN = 0, but also there exists some i ∈ B suchthat xi = 0. In the dictionary, we can tell that degeneracy occurs if (A−1

B b)i = 0 for some i ∈ B. Inthe above example, the dictionary is

ζ = 0 +[

10 −57 −9 −24] [

x1 x2 x3 x4

]> w5

w6

w7

=

001

− 0.5 −5.5 −2.5 9

0.5 −1.5 −0.5 11 0 0 0

x1

x2

x3

x4

.32

The entering variable is clearly x1, which should move from non-basis to basis in the updated basicpartition. However, there are zeros in the constant vector on the right hand side of the secondequation in the dictionary, which indicates that the current solution x1 = 0, x2 = 0, x3 = 0, x4 =0, w5 = 0, w6 = 0, w7 = 1 is a degenerate one. As a result, x1 has to stay zero in the basis, otherwisew5 will become negative due to the constraint w5 = 0− 0.5x1.

Using Bland’s rule, we may also move from one basic partition to another that determines asame fbs, but we will not come back to the same basic partition that has been visited before. Sincethere is only a finite number of basic partitions, we will eventually move out of the cycle towardsthe optimal basic partition.

6.9 How efficient is the simplex algorithm?

Simplex updates the fbs iteratively to find the optimal solution, thus its efficiency depends on

how many iterations are needed. The total number of basic solutions is bounded by(n+mm

)=

(n+m)!n!m! . If m = n, then this number becomes

(2nn

). It can be proved that 1

2n22n ≤(

2nn

)≤ 22n,

which means that the number of basic solutions increases exponentially with the “size” of the prob-lem (number of variables and constraints). Some pivoting rules could cause cycling and neverterminate; even for existing non-cycling rules, people have found instances (refer to Section 4.4 ofVanderbei for an example) where the simplex algorithm will have to visit every single basic solutionbefore it finds the optimal one. However, it is an open question whether there exists a pivoting rulethat will never require an exponential number of iterations.

There does exist an algorithm called the ellipsoid method that can solve linear programs poly-nomially (never requires exponentially many iterations). Theoretically, the ellipsoid method has abetter worst-case efficiency than the simplex algorithm, however, the practical performance of thesimplex algorithm is much better than that of the ellipsoid method, and that is one of the reasonsthe simplex algorithm is more widely used.

6.10 Summary

The simplex algorithm is summarized with the following steps, as illustrated in Figure 5.

Step 1 Reformulation: Write the LP in standard form max{ζ = c>x : Ax ≤ b, x ≥ 0}, reformu-late it as max{ζ = c>x : Ax+ w = b, x ≥ 0, w ≥ 0}, and then go to Step 2.

Step 2 Initialization:

(2a) If b ≥ 0, then reformulate the problem as

max{ζ = c>x : Ax = b, x ≥ 0},

where

c =[

c0m×1

], A =

[A Im×m

], and x =

[xw

]∈ Rn+m.

The initial basic partition is (B = {n + 1, ..., n + m},N = {1, ..., n}), and the initial fbs is(xB = A

−1B b = b, xN = 0). Go to Step 3.

33

(2b) If bi < 0 for some i, then use the big-M method to find the initial basic partition and theinitial fbs. Define K = {i : bi < 0,∀i = 1, ...,m}, and let k = |K|. Reformulate the problem as

max{ζ = c>x : Ax = b, x ≥ 0},

where

c =

c0m×1

−M × 1k×1

, A =[A Im×m H

], and x =

xwt

∈ R(n+m+k)×1.

When compared with other finite constants finitely many times, M is always assumed to bebigger. Matrix H ∈ Rm×k is defined as

Hi,j ={−1, if i = K(j);0, otherwise.

The initial basic partition is (N = {1, ..., n}∪{n+K(1), ..., n+K(k)},B = {1, ..., n+m+k}\N ),and the initial fbs is (xB = A

−1B b = |b|, xN = 0). Go to Step 3.

Step 3 Optimality check:

(3a) If (c>N − c>BA−1B AN )i > 0 for some i ∈ N , then go to Step 4.

(3b) If (c>N − c>BA−1B AN ) ≤ 0 and big-M method was not used in Step 2, then stop, and the

optimal solution is (x∗B = A−1B b, x∗N = 0), ζ∗ = c>BA

−1B b.

(3c) If (c>N − c>BA−1B AN ) ≤ 0, big-M method was used in Step 2, and xi = 0 for all i = n+m+

1, ..., n+m+ k, then stop, and the optimal solution is (x∗B = A−1B b, x∗N = 0), ζ∗ = c>BA

−1B b.

(3d) If (c>N − c>BA−1B AN ) ≤ 0, big-M method was used in Step 2, and xi > 0 for some i ∈

{n+m+ 1, ..., n+m+ k}, then stop, and the LP is infeasible.

Step 4 Improvement: Choose

j∗ = min{j ∈ N : (c>N − c>BA−1B AN )j > 0},

and

i∗ = min

{i ∈ argmaxi∈B

(A−1B AN )i,j∗

(A−1B b)i

}.

(4a) If maxi∈B(A−1B AN )i,j∗

(A−1B b)i

= +∞ or finite positive, then update the basic partition and the fbs

(B = B\{i∗} ∪ {j∗},N = N\{j∗} ∪ {i∗}),

(xB = A−1B b, xN = 0),

and go back to Step 3.

(4b) If maxi∈B(A−1B AN )i,j∗

(A−1B b)i

= −∞, 0, finite negative, or undefined(

00

), and big-M method was not

used in Step 2, then stop, and the LP is unbounded.

34

(4c) If maxi∈B(A−1B AN )i,j∗

(A−1B b)i


00

), big-M method was used in

Step 2, then solve the feasibility problem (81). If t = 0 is an optimal solution to (81) thenthe original LP is unbounded.

(4d) If maxi∈B(A−1B AN )i,j∗

(A−1B b)i


00

), big-M method was used in

Step 2, then solve the feasibility problem (81). If t = 0 is not an optimal solution to (81)then the original LP is infeasible.

Figure 5: The simplex algorithm diagram

Let’s look at an example of using the simplex algorithm to solve an LP.

max 10x1 + 12x2 + 12x3 (97)s. t. x1 + x2 + x3 ≥ 1

x1 + 2x2 + 2x3 ≤ 202x1 + x2 + 2x3 ≤ 202x1 + 2x2 + x3 ≤ 20x1, x2, x3 ≥ 0.

As is illustrated in Figure 6, the feasible region is within the corner points {A=(1,0,0), B=(0,1,0),E=(0,0,1), F=(10,0,0), C=(0,10,0), G=(0,0,10), D=(4,4,4)}.

Step 1

max ζ =[

10 12 12 0 0 0 0] [

x1 x2 x3 w4 w5 w6 w7

]>

s. t.

−1 −1 −1 1 0 0 01 2 2 0 1 0 02 1 2 0 0 1 02 2 1 0 0 0 1

x1

x2

x3

w4

w5

w6

w7

=

−1202020

[x1 x2 x3 w4 w5 w6 w7

]> ≥ 0.

35

Figure 6: Graphic representation of LP (84)

Step 2(b)

max{ζ = c>x : Ax = b, x ≥ 0},

where c =[

10 12 12 0 0 0 0 −M]>, A =

−1 −1 −1 1 0 0 0 −11 2 2 0 1 0 0 02 1 2 0 0 1 0 02 2 1 0 0 0 1 0

, x =

[x1 x2 x3 w4 w5 w6 w7 t8

]>, and b =

−1202020

. Initialize (B = {5, 6, 7, 8},N = {1, 2, 3, 4}).

The current basic solution (x∗5,6,7,8 = A−1B b =

[20 20 20 1

]>, x∗1,2,3,4 = 0) corresponds to

the yellow corner point O in Figure 6. Since big-M is being used and t8 = x8 = 1 > 0, this basicsolution is not feasible. But the big-M method will lead to a feasible basic solution.

Step 3(a)

c>N − c>BA−1B AN =

[10 +M 12 +M 12 +M −M

]. Go to Step 4.

Step 4(a)

36

The dictionary is

ζ = −M +[

10 +M 12 +M 12 +M −M]

x1

x2

x3

x4

x5

x6

x7

x8

=

2020201

−

1 2 2 02 1 2 02 2 1 01 1 1 −1

x1

x2

x3

x4

.The entering variable is x1, and the leaving variable is x8. Set (B = {5, 6, 7, 1},N = {8, 2, 3, 4}).

The current fbs (x∗5,6,7,1 = A−1B b =

[19 18 18 1

]>, x∗8,2,3,4 = 0) corresponds to the blue

corner point A in Figure 6. The big-M has found an fbs solution.

Step 3(a)


[−M − 10 2 2 10

]. Go to Step 4.

Step 4(a)

The dictionary is

ζ = 10 +[−M − 10 2 2 10

] x8

x2

x3

x4

x5

x6

x7

x1

=

1918181

−−1 1 1 1−2 −1 0 2−2 0 −1 21 1 1 −1

x8

x2

x3

x4



[18 19 18 1


corner point B in Figure 6.

Step 3(a)


[−M − 12, −2, 0, 12

]. Go to Step 4.

Step 4(a)

37

The dictionary is

ζ = 12 +[−M − 12, −2, 0, 12

] x8

x1

x3

x4

x5

x6

x7

x2

=

1819181

−−2 −1 0 2−1 1 1 1−2 0 −1 21 1 1 −1

x8

x1

x3

x4



[9 10 0 10


corner point C in Figure 6.

Step 3(a)


[−M 4 0 −6

]. Go to Step 4.

Step 4(a)

The dictionary is

ζ = 120 +[−M 4 0 −6

] x8

x1

x3

x5

x4

x6

x7

x2

=

910010

−−1 −0.5 0 0.50 1.5 1 −0.50 1 −1 −10 0.5 1 0.5

x8

x1

x3

x5



[9 10 0 10


corner point C in Figure 6. Because of degeneracy, the simplex algorithm has updated the basicpartition but it corresponds to the same fbs.

Step 3(a)


[−M −4 4 −2

]. Go to Step 4.

Step 4(a)

38

The dictionary is

ζ = 120 +[−M −4 4 −2

] x8

x7

x3

x5

x4

x6

x1

x2

=

910010

−−1 0.5 −0.5 00 −1.5 2.5 10 1 −1 −10 −0.5 1.5 1

x8

x7

x3

x5



[11 4 4 4

]>, x∗8,7,6,5 = 0) corresponds to the red corner

point D in Figure 6. It will be verified in the next step that this fbs is optimal.

Step 3(c)


[−M −1.6 −1.6 −3.6

]≤ 0, big-M method was used in Step 2, and

x8 = 0, so stop, and the optimal solution is (x∗1,2,3,4 = A−1B b =

[4 4 4 11

]>,

x∗5,6,7,8 = 0), ζ∗ = c>BA−1B b = 136.

7 Dual Simplex

Let us look at the simplex algorithm from the duality perspective. First, we transform the primalmax{c>x : Ax ≤ b;x ≥ 0} and dual min{b>y : A>y ≥ c; y ≥ 0} into max{c>x : Ax+ w = b;x,w ≥0} and min{b>y : A>y − z = c; y, z ≥ 0}, respectively. Then, the optimality condition becomes

0 ≤[xw

]⊥[zy

]≥ 0.

We can further transform these problems into max{ζ = c>x : Ax = b, x ≥ 0} and max{−ζ = b>z :Az = c, z ≥ 0}, where

c =[

c0m×1

], A = [A, Im×m], x =

[xw

], b =

[0n×1

−b

], A = [−In×n, A

>], z =[zy

].

For an optimal basic partition (N ,B), the optimality condition becomes[xN = 0xB ≥ 0

]⊥[zN ≥ 0zB = 0

].

Notice that non-basic dual variable zN ≥ 0 and basic dual variable zB = 0 may look counter-intuitive, but that is because the non-basis N and basis B are originally defined for the primalproblem, thus we need to pay special attention when use the same basic partition on the dual.

The dictionary can also be defined for the dual problem using a basic partition (N ,B):

− ζ = b>N A−1N c+ (b>B − b>N A−1

N AB)zB (98)

zN = A−1N c− A−1

N ABzB. (99)

39

Sometimes it is more convenient to apply the simplex algorithm to the dual LP instead of theprimal. For example, suppose we have a primal LP as follows

max ζ =[−1 −1

] [x1 x2

]>s. t.

−2 −1−2 4−1 3

[ x1

x2

]≤

4−8−7

[x1 x2

]> ≥ 0.

When we apply the simplex algorithm, we need to use the big-M method, because b =[

4 −8 −7]>

has negative elements. However, if we look at the dual

min ζ =[

4 −8 −7] [

y3 y4 y5

]>s. t.

[−2 −2 −1−1 4 3

] y3

y4

y5

≥ [ −1−1

][y3 y4 y5

]> ≥ 0,

whose standard form LP is

max −ζ =[−4 8 7

] [y3 y4 y5

]>s. t.

[2 2 11 −4 −3

] y3

y4

y5

≤ [ 11

][y3 y4 y5

]> ≥ 0,

we can easily find an initial fbs to start the simplex algorithm. Now let us apply the simplexalgorithm to the dual and at the same time keep track of what happens to the primal.

40

Apply simplex to the dual What happens to the primal

max −ζ =[

0 0 −4 8 7]z1z2y3

y4

y5

s. t.[−1 0 −2 −2 −10 −1 −1 4 3

]z1z2y3

y4

y5

=[−1−1

][z1 z2 y3 y4 y5

]> ≥ 0.

max ζ =[−1 −1 0 0 0

]x1

x2

w3

w4

w5

s. t.

−2 −1 1 0 0−2 4 0 1 0−1 3 0 0 1

x1

x2

w3

w4

w5

=

4−8−7

[x1 x2 w3 w4 w5

]> ≥ 0.max{−ζ = b>z : Az = c, z ≥ 0},where max{ζ = c>x : Ax = b, x ≥ 0},where

b =[

0 0 −4 8 7]>

A =[−1 0 −2 −2 −10 −1 −1 4 3

]z =

[z1 z2 y3 y4 y5

]>c =

[−1 −1 0 0 0

]>A =

−2 −1 1 0 0−2 4 0 1 0−1 3 0 0 1

x =

[x1 x2 w3 w4 w5

]>Dual dictionary:−ζ = b>N A

−1N c+ (b>B − b>N A

−1N AB)zB

zN = A−1N c− A−1

N ABzB

Primal dictionary:ζ = c>BA

−1B b+ (c>N − c>BA

−1B AN )xN


B ANxN

iteration 1: B = {3, 4, 5},N = {1, 2}

−ζ = 0 +[−4 8 7

] z3z4z5

[z1z2

]=[

11

]−[

2 2 11 −4 −3

] z3z4z5

iteration 1: B = {3, 4, 5},N = {1, 2}

ζ = 0 +[−1 −1

] [ x1

x2

] x3

x4

x5

=

4−8−7

− −2 −1−2 4−1 3

[ x1

x2

]iteration 2: B = {3, 1, 5},N = {4, 2}

−ζ = 4 +[−12 −4 3

] z3z1z5

[z4z2

]=[

0.53

]−[

1 0.5 0.55 2 −1

] z3z1z5

iteration 2: B = {3, 1, 5},N = {4, 2}

ζ = −4 +[−0.5 −3

] [ x4

x2

] x3

x1

x5

=

124−3

− −1 −5−0.5 −2−0.5 1

[ x4

x2

]iteration 3: B = {3, 1, 4},N = {5, 2}

−ζ = 7 +[−18 −7 −6

] z3z1z4

[z5z2

]=[

14

]−[

2 1 27 3 2

] z3z1z4

iteration 3: B = {3, 1, 4},N = {5, 2}

ζ = −7 +[−1 −4

] [ x5

x2

] x3

x1

x4

=

1876

− −2 −7−1 −3−2 −2

[ x5

x2

]

From the above table, we have some observations: (1) if b has a negative element but c ≤ 0, wecan avoid the big-M method by applying the simplex algorithm to the dual. (2) When we apply

41

the simplex algorithm to the dual, the reduced costs (c>N − c>BA−1B AN ) in the corresponding primal

dictionaries are always non-positive but the basic variables xB = A−1B b are always infeasible until

the last iteration. (3) The primal and dual dictionaries have a negative transpose relationship:

[b>N A

−1N c (b>B − b>N A

−1N AB)

A−1N c −A−1

N AB

]= −

[c>BA

−1B b (c>N − c>BA

−1B AN )

A−1B b −A−1

B AN

]>.

Based on the above observations, we could design a special rule of selecting the entering andleaving variables so that we could go through the iterations on the primal side (right-hand-sidecolumn of the above table) without writing down the dual iterations. An algorithm that appliesthis idea is called the dual simplex algorithm.

Although point (3) has been observed throughout all iterations of the example, its theoreticalcorrectness is not obvious. The following table defines the problem and then the observation isgiven as a proposition.

Primal:max ζ = c>xs. t. Ax ≤ bx ≥ 0

Dual:min ζ = b>ys. t. A>y ≥ c

y ≥ 0

max ζ = c>xs. t. Ax+ w = b

x, w ≥ 0

max−ζ = −b>ys. t. A>y − z = c

y, z ≥ 0

Define:

c =[

c0m×1

]:=[cNcB

]A = [A, Im×m] := [AN , AB]

x =[xw

]:=[xNxB

]Define:

b =[

0n×1

−b

]:=[bNbB

]A = [−In×n, A

>] := [AN , AB]

z =[zy

]:=[zNzB

]

max{ζ = c>x : Ax = b, x ≥ 0} max{−ζ = b>z : Az = c, z ≥ 0}

max ζ = c>NxN + c>BxBs. t. ANxN +ABxB = b

x ≥ 0

max −ζ = b>N zN + b>B zBs. t. AN zN + ABzB = c

z ≥ 0

Primal dictionaryζ = c>BA

−1B b+ (c>N − c>BA

−1B AN )xN


B ANxN

Dual dictionary−ζ = b>N A

−1N c+ (b>B − b>N A

−1N AB)zB

zN = A−1N c− A−1

N ABzB

42

Proposition 5. The primal and dual dictionaries derived above have a negative-transpose relation-ship: [

c>BA−1B b (c>N − c>BA

−1B AN )

A−1B b −A−1

B AN

]= −

[b>N A

−1N c (b>B − b>N A

−1N AB)

A−1N c −A−1

N AB

]>.

Proof. Bottom right: −A−1B AN = (A−1

N AB)>.First, we have

AA> = [A, I][−I, A>]> = 0 and AA> = [AN , AB][AN , AB]> = AN A>N +ABA

>B = 0.

Multiplying both sides of the last equation by A−1B and (A>N )−1, we have

A−1B [AN A>N +ABA

>B ](A>N )−1 = A

−1B AN + A>B (A>N )−1 = 0.

Bottom left: A−1B b = −(b>B − b>N A

−1N AB)>.

Top right: (c>N − c>BA−1B AN ) = −(A−1

N c)>.Top left: c>BA

−1B b = −(b>N A

−1N c)>.

From the above point, we can see that in the primal dictionary


−1B AN )xN


B ANxN ,

the optimality condition in the simplex algorithm (c>N − c>BA−1B AN ) ≤ 0 is actually the dual fea-

sibility zN = −A−1N c ≥ 0. When the optimal dictionary is obtained, we get not only the pri-

mal optimal solution (xB = A−1B b, xN = 0) but also the dual optimal solution (zN = A−1

N c =−(c>N − c>BA

−1B AN ), zB = 0). Complementary slackness is satisfied since[

xB ≥ 0xN = 0

]⊥[zB = 0zN ≥ 0

].

An alternative to the big-M method is the following. Consider

(P0): max{c>x : Ax ≤ b, x ≥ 0}

and its dual(D0): min{b>y : A>y ≥ c, y ≥ 0}.

Step 1 If b ≥ 0, then apply the simplex algorithm on (P0) with the initial feasible basic solutionx = 0. Otherwise go to Step 2.

Step 2 If c ≤ 0, then apply the simplex algorithm on (D0) with the initial feasible basic solutiony = 0. Otherwise go to Step 3.

Step 3 Construct the following LP:

(P1): max{0 : Ax ≤ b, x ≥ 0}

and its dual(D1): min{b>y : A>y ≥ 0, y ≥ 0}.

Apply the simplex algorithm on (D1) with the initial feasible basic solution y = 0. If (D1) isoptimal, go to Step 4; if (D1) is unbounded, go to Step 5.

43

Step 4 Let y1 and x1 be optimal solutions to (D1) and (P1), respectively. Apply the simplexalgorithm on (P0) with x1 as the initial feasible basic solution.

Step 5 If (D1) is unbounded, then (P1) is infeasible, thus (P0) is also infeasible.

For example, we have the following (P0) with b1 < 0 and c2 > 0.

max ζ =[−14 6

] [x1 x2

]>s. t.

7 16−7 913 −16

[ x1

x2

]≥

−478

[x1 x2

]> ≥ 0.

We construct (D1) as

min ζ =[−4 7 8

] [y3 y4 y5

]>s. t.

[7 −7 1316 9 −16

] y3

y4

y5

≤ [ 00

][y3 y4 y5

]> ≥ 0.

We solve (D1), and it turns out to be unbounded, which means that (P0) is infeasible.

8 Sensitivity Analysis

Suppose we have found an optimal solution x∗ to an LP max{c>x : Ax ≤ b, x ≥ 0} using thesimplex algorithm, and then we change some of the parameters (A, b, c) and have a new LP. Wehave the following questions: Is the optimal basic partition for the original LP still optimal to thenew LP? If the answer is yes, then the optimal solution for the new LP can be easily obtained. Ifthe answer is no, do we need to solve the new LP from the very beginning, or can we use the originalLP’s optimal solution to find the new LP’s optimal solution faster? This type of post-optimalityanalysis is called sensitivity analysis. We focus our discussion on the following four cases.

8.1 Changing the objective coefficients: c→ c+ λ∆c

Changing the objective coefficients will not affect the primal feasibility, but could possibly affectthe dual feasibility. The new dual dictionary is

−ζ = b>N A−1N (c+ λ∆c) + (b>B − b>N A

−1N AB)zB

zN = A−1N (c+ λ∆c)− A−1

N ABzB.

� The range of λ for the original LP’s optimal basic partition to remain optimal to the new LPcan be obtained by solving the following inequality:

A−1N (c+ λ∆c) ≥ 0.

� If A−1N (c + λ∆c) ≥ 0 holds, then the original LP’s optimal basic partition is still optimal to

the new LP.

44

� Otherwise, the new dictionary doesn’t have dual feasibility, but primal feasibility remains.We can find the optimal solution for the new LP by simply applying the simplex iterationsto the new primal dictionary. Since we don’t need to start over from step 1 of the simplexalgorithm, we could find the optimal solution to the new LP faster.

� For example, consider the following LP:

max ζ =[−1 −1

] [x1 x2

]>s. t.

−2 −1−2 4−1 3

[ x1

x2

]≤

4−8−7

[x1 x2

]> ≥ 0.

The optimal basic partition is (B = {3, 1, 4},N = {5, 2}). Suppose we change c =[−1 −1

]>to c+ λ∆c with ∆c =

[1 0

]>, and we want to find the range of λ that keeps the partitionoptimal. The dual dictionary is:

−ζ = 7− 7λ+[−18 −7 −6

] z3z1z4

[z5z2

]=

[1− λ4− 3λ

]−[

2 1 27 3 2

] z3z1z4

.For the partition to remain optimal, we need 1− λ ≥ 0 and 4− 3λ ≥ 0, thus λ ≤ 1.

� Suppose λ = 1.1, then the primal dictionary becomes:

ζ = 0.7 +[

0.1 −0.7] [ x5

x2

] x3

x1

x4

=

1876

− −2 −7−1 −3−2 −2

[ x5

x2

].

We can then conclude that changing c from[−1 −1

]to[

0.1 −1]

would make the LPunbounded.

8.2 Changing the right-hand-side: b→ b+ λ∆b

Changing the right-hand-side will not affect the dual feasibility, but could possibly affect the primalfeasibility. The new primal dictionary is

ζ = c>BA−1B (b+ λ∆b) + (c>N − c>BA

−1B AN )xN

xB = A−1B (b+ λ∆b)−A−1

B ANxN .

� The range of λ for the original LP’s optimal basic partition to remain optimal to the new LPcan be obtained by solving the following inequality:

A−1B (b+ λ∆b) ≥ 0.

45

� If A−1B (b + λ∆b) ≥ 0 holds, then the original LP’s optimal basic partition is still optimal to

the new LP. The new optimal primal solution becomes (xB = A−1B (b+λ∆b), xN = 0), but the

optimal dual solution stays the same. From strong duality, the new optimal objective valueζ ′ = (b+ λ∆b)>y∗, where y∗ is the optimal dual solution to the original problem. As long asthe basic partition stays optimal, y∗ gives the rate at which λ∆b changes the objective value.Therefore, y∗ is also known as the shadow price.

� Otherwise if A−1B (b+λ∆b) ≥ 0 doesn’t hold, the new dictionary doesn’t have primal feasibility,

but dual feasibility remains. We can find the optimal solution for the new LP by applyingthe simplex iterations to the dual dictionary. Since we don’t need to start over from step 1of the simplex algorithm, we could find the optimal solution to the new LP faster.

� Suppose we change b =[

4 −8 −7]> to b + λ∆b with ∆b =

[−6 −5 −2

]>, and wewant to find the range of λ that keeps the partition optimal. The primal dictionary is:

ζ = −7− 2λ+[−1 −4

] [ x5

x2

] x3

x1

x4

=

18− 2λ7 + 2λ6− λ

− −2 −7−1 −3−2 −2

[ x5

x2

].

For the partition to remain optimal, we need 18 − 2λ ≥ 0, 7 + 2λ ≥ 0, and 6 − λ ≥ 0, thus−3.5 ≤ λ ≤ 6.

8.3 Adding k new constraints:

A→ A′ =

A

a>m+1...

a>m+k

and b→ b′ =

b

bm+1...

bm+k

Corresponding to the k new constraints, there should be k new slack variables in the primal

problem and k new dual variables in the dual. If we add all the k new variables into the basisB′ = B ∪ {n+m+ 1, ..., n+m+ k} and look at the new dual dictionary

−ζ = b>N A−1N c+ (b′

>B′ − b>N A

−1N A′B′)z′B′

zN = A−1N c− A−1

N A′B′ z′B′ ,

the dual feasibility will not be affected, but primal feasibility could be affected.

� If the original optimal solution x∗ satisfies the new constraints:

a>m+1...

a>m+k

x∗ ≤ bm+1

...bm+k

,

then x∗ is still optimal.

� Otherwise, we can find the optimal solution for the new LP by applying the simplex iterationsto the new dual dictionary. Since we don’t need to start over from step 1 of the simplexalgorithm, we could find the optimal solution to the new LP faster.

46

� Suppose we add a new constraint x1 + x2 ≤ 6. The optimal solution to the original LPis x∗1 = 7, x∗2 = 0, which violates the new constraint. Therefore, the original optimal basicpartition is no longer optimal. We update the dual dictionary as follows:

−ζ = 7 +[−18 −7 −6 1

] z3z1z4z6

[z5z2

]=[

14

]−[

2 1 2 −17 3 2 −4

]z3z1z4z6

.Obviously, the dual LP is unbounded, thus the primal is infeasible with the new constraint.

8.4 Adding k new variables:

x→ x′ =

x

xn+1...

xn+k

, A→ A′ =[A an+1 · · · an+k

], and c→ c′ =

c

cn+1...

cn+k

If we add all the k new variables into the non-basis N ′ = N ∪ {n + m + 1, ..., n + m + k},

the primal feasibility will not be affected, but dual feasibility could be affected. The new primaldictionary is

ζ = c>BA−1B b+ (c′>N ′ − c>BA

−1B A′N ′)x′N ′


B A′N ′x′N ′ .

� If the original dual optimal solution y∗ satisfies the new constraints:

a>n+1...

a>n+k

y∗ ≥ cn+1

...cn+k

,

then (x∗, y∗) is still optimal.

� Otherwise, we can find the optimal solution for the new LP by simply applying the simplexiterations to the new primal dictionary. Since we don’t need to start over from step 1 of thesimplex algorithm, we could find the optimal solution to the new LP faster.

� Suppose we add a new variable x3, expand A to A′ =

−2 −1 3−2 4 −2−1 3 1

, and change c

to c′ =[−1 −1 1

]>. Since the dual optimal solution y∗ =[

0 0 1]> satisfies the

constraint[

3 −2 1]y∗ ≥ 1, we conclude that the optimal solution to the original LP

remains optimal with x∗3 = 0.

9 Decomposition

Special techniques can be applied to solve some large-scale LP problems with special structures,which would be much slower or even impossible to solve otherwise. Three examples of thesetechniques, namely Benders decomposition, Dantzig-Wolfe decomposition, and column generationare briefly introduced here.

47

9.1 Benders decomposition

Consider the following LP problem:

max ζ =[c>0 c>1 c>2 · · · c>k

]x0

x1

x2...xk

(100)

s. t.

A0 0 0 · · · 0

A1 D1 0. . . 0

A2 0 D2. . . 0

......

. . . . . ....

Ak 0 · · · 0 Dk

x0

x1

x2...xk

≤b0b1b2...bk

[x0 x1 x2 · · · xk

]> ≥ 0,

where ci ∈ Rni×1, Ai ∈ Rmi×ni , bi ∈ Rmi×1, xi ∈ Rni×1 for i = 1, 2, ..., k. If k, mi’s and ni’s arelarge, this problem could be extremely hard to solve due to its dimension. However, we can takeadvantage of its special structure and solve this large problem by solving much smaller problemsiteratively as follows.

Step 1 Solve the following master problem:

max ζ = c>0 x0 + z1 + z2 + · · ·+ zk (101)s. t. A0x0 ≤ b0

Constraints (106) and (108) from Step 2.x0 ≥ 0; z1, z2, ..., zk free.

If the master problem is infeasible, then we stop and conclude that the original LP (100)is infeasible. Otherwise, let (x∗0, z

∗1 , z∗2 , ..., z

∗k) be the optimal solution to (101), which could

possibly include +∞ values. For example, in the first few steps, there may be no constraintsto keep some zi’s from approaching +∞. In that case, we simply let those zi’s be +∞ andonly maximize over the constrained variables. Here z∗i represents an optimistic estimate ofc>i x

∗i in (100).

Step 2 For i = 1, 2, ..., k, solve the following subproblems:

max ζi = c>i xi (102)s. t. Dixi ≤ bi −Aix

∗0

xi ≥ 0.

These subproblems can be solved in parallel to save computation time. The results of thesesubproblems fall into one of the following three cases.

(Case I:) If (102) is infeasible, then there must exist an extreme ray for the dual to (102)

min ζi = (bi −Aix∗0)>yi (103)

s. t. D>i yi ≥ ciyi ≥ 0.

48

We can obtain an extreme ray ∆y∗i of (103) by solving the following problem:

min (bi −Aix∗0)>∆yi (104)

s. t. D>i ∆yi ≥ 00mi×1 ≤ ∆yi ≤ 1mi×1.

We should send a message back to the master problem that x0 should not make the subprob-lems infeasible. The ideal message would be

(bi −Aix0)>∆yi(x0) ≥ 0, (105)

where ∆yi(x0) is the optimal solution to (104), and it is a function of x0. Constraint (105) saysthat for any x0, the associated ∆yi(x0) should not allow the extreme ray to exist. However,Constraint (105) is a nonlinear constraint. Instead, we add the following linearized constraint

(bi −Aix0)>∆y∗i ≥ 0 (106)

to (101). For any x0, since ∆y∗i is always feasible to (104), we have (bi − Aix0)>∆y∗i ≥(bi − Aix0)>∆yi(x0) ≥ 0. Therefore, Constraint (106) is valid and it is called the feasibilitycut.

(Case II:) If (102) and (103) have an optimal solution (x∗i , y∗i ) but ζ∗i < z∗i , then z∗i is an

unrealistic estimate. We should send a message back to the master problem that zi shouldbe more realistic. The ideal message would be

(bi −Aix0)>yi(x0) ≥ zi, (107)

where yi(x0) is the optimal solution to (103), and it is a function of x0. Constraint (107)says that for any x0, the estimate zi should be no larger than the associated objective valueζi = (bi−Aix0)>yi(x0). However, Constraint (107) is a nonlinear constraint. Instead, we addthe following linearized constraint

(bi −Aix0)>y∗i ≥ zi (108)

to (101). For any x0, since y∗i is always feasible to (103), we have (bi − Aix0)>y∗i ≥(bi − Aix0)>yi ≥ zi. Therefore, Constraint (108) is valid, and it is called the optimalitycut.

(Case III:) If (102) is unbounded, or (102) is optimal and ζ∗i ≥ z∗i , then z∗i is realistic.

If z∗i is realistic for all i = 1, ..., k, then stop and the current solution (x∗0, x∗1, ..., x

∗k) is optimal.

Otherwise go back to Step 1 and solve (101) again with added constraints.

An example of using Benders decomposition to solve a simple LP:

max x0 + 2x1 + 3x2

s. t.

1 0 01 1 01 0 2

x0

x1

x2

≤ 3

24

x0, x1, x2 ≥ 0.

49

Step 1 Solve the following master problem:

max ζ = x0 + z1 + z2

s. t. x0 ≤ 3x0 ≥ 0; z1, z2 free.

This LP is unbounded, and the “optimal” solution is x∗0 = 3, z∗1 = +∞, z∗2 = +∞.

Step 2 Solve the following two subproblems for x∗0 = 3:

(LP1) max ζ1 = 2x1

s. t. x1 ≤ 2− x∗0x1 ≥ 0,

and

(LP2) max ζ2 = 3x2

s. t. 2x2 ≤ 4− x∗0x2 ≥ 0.

(LP1) is infeasible, so we need to add a feasibility cut to the master problem. The dual to(LP1) is

(D1) min ζ1 = (2− x∗0)y1

s. t. y1 ≥ 2y1 ≥ 0.

It is easy to see that ∆y∗1 = 1 is an extreme ray of (D1), so the feasibility cut is (2−x0)×1 ≥ 0,or equivalently, x0 ≤ 2.

The optimal solution to (LP2) is x∗2 = 0.5, ζ∗2 = 1.5. Since z∗2 = +∞ is an unrealistic estimateof ζ∗2 , we need to add an optimality cut to the master problem. The dual to (LP2) is

(D2) min ζ2 = (4− x∗0)y2

s. t. 2y2 ≥ 3y2 ≥ 0.

It is easy to see that y∗2 = 1.5 is an optimal solution to (D2), so the optimality cut is(4− x0)× 1.5 ≥ z2, or equivalently, 1.5x0 + z2 ≤ 6.

Step 1 Solve the following master problem with all feasibility and optimality cuts:

max ζ = x0 + z1 + z2

s. t. x0 ≤ 3x0 ≤ 2

1.5x0 + z2 ≤ 6x0 ≥ 0; z1, z2 free.

This LP is unbounded, and the “optimal” solution is x∗0 = 0, z∗1 = +∞, z∗2 = 6.

50

Step 2 Solve (LP1) and (LP2) for x∗0 = 0.

The optimal solution to (LP1) is x∗1 = 2, ζ∗1 = 4. Since z∗1 = +∞ is an unrealistic estimate ofζ∗1 , we need to add an optimality cut to the master problem. It is easy to see that y∗1 = 2 is anoptimal solution to (D1), so the optimality cut is (2−x0)×2 ≥ z1, or equivalently, 2x0+z1 ≤ 4.

The optimal solution to (LP2) is x∗2 = 2, ζ∗2 = 6, therefore, z∗2 = 6 is a realistic estimate of ζ∗2 .

Step 1 Solve the following master problem with all feasibility and optimality cuts:

max ζ = x0 + z1 + z2

s. t. x0 ≤ 3x0 ≤ 2

1.5x0 + z2 ≤ 62x0 + z1 ≤ 4

x0 ≥ 0; z1, z2 free.

The optimal solution is x∗0 = 0, z∗1 = 4, z∗2 = 6, ζ∗ = 10.

Step 2 Solve (LP1) and (LP2) for x∗0 = 0. The optimal solutions are x∗1 = 2, ζ∗1 = 4 and x∗2 =2, ζ∗2 = 6. z∗1 and z∗2 are realistic estimates of ζ∗1 and ζ∗2 , so the optimal solution to the originalproblem is x∗0 = 0, x∗1 = 2, x∗2 = 2, ζ∗ = 10.

9.2 Dantzig-Wolfe decomposition

Dantzig-Wolfe decomposition applies to the dual of (100):

min ζ =[b>0 b>1 b>2 · · · b>k

]y0

y1

y2...yk

(109)

s. t.

A>0 A>1 A>2 · · · A>k0 D>1 0 · · · 0

0 0 D>2. . . 0

... 0 0 . . . 00 0 · · · 0 D>k

y0

y1

y2...yk

≤c0c1c2...ck

[y0 y1 y2 · · · yk

]> ≥ 0,

and the algorithm steps are also dual to those of Bender’s decomposition.

9.3 Column generation

Suppose a linear program has an extremely large number of variables but relatively few constraints.In the following dictionary,


−1B AN )xN


B ANxN ,

51

n is so large that the matrix AN ∈ Rm×n is even hard to be entirely stored in the computer. For-tunately, in a basic solution, only m basic variables could be non-zeros, and the non-basic variablesare all zeros. As long as we find the entering variable xj∗ , we only need A−1

B b and the j∗th columnof the matrix A

−1B A to find the leaving variable. The only difficulty here is to find the entering

variable, which is to find a negative reduced cost (c>N − c>BA−1B AN ). This could be accomplished

by solving the problem maxj∈N (c> − c>BA−1B A)j . If this largest reduced cost is zero or negative, it

means that the current basic solution is optimal; otherwise we can find a negative reduced cost, andits corresponding j∗ will be entering the basis. The column generation technique can be applied ifthis problem can be solved efficiently without having to compute every single reduced cost value.

A classical example of using column generation to solve the cutting stock problem can be foundon page 234 in the reference text Dimitris Bertsimas and John N. Tsitsiklis, Introduction to LinearOptimization, 1997.

10 Interior Point Algorithm

The interior point algorithm is another algorithm for solving linear programming and other math-ematical programming problems. Instead of moving along the corner points, the interior pointalgorithm moves inside the feasible region towards the optimal solution. The basic idea of the cen-tral path method, one of various variations of the interior point algorithm, is briefly introduced here.

We slightly abuse some mathematical operations to simplify the notations. For vectors x, y ∈Rn×1,

� we use log x to denote the vector[

log x1 log x2 · · · log xn

]>;

� we use x−1 to denote the vector[

1x1

1x2· · · 1

xn

]>;

� we use x× y to denote the vector[x1y1 x2y2 · · · xnyn

]>.

Consider the standard form linear program

max{c>x : Ax ≤ b, x ≥ 0} (110)

and its equivalent formulation max{c>x : Ax + w = b;x,w ≥ 0}. We define a new problem,called the barrier problem, in which we relax the non-negativity constraints x,w ≥ 0 but give someincentive in the objective function for x and w to be positive:

max c>x+ µ∑

i

log xi + µ∑

i

logwi (111)

s. t. Ax+ w = b,

where µ > 0 is a parameter. Problem (111) is not only nonlinear but also nonequivalent to (110).When µ is larger, (111) will emphasize more on maximizing µ

∑i log xi + µ

∑i logwi and less on

the true objective c>x; when µ gets smaller, c>x will be emphasized more in (111); when µ → 0,(111) will provide an optimal solution to (110). The idea of the central path method is to solve asequence of (111) with µ approaching zero to achieve the optimal solution to (110).

52

Although (111) and (110) look very different, their KKT conditions have a close relationship,which will be derived below. The KKT conditions (49)-(52) are for problem (48) with inequalityconstraints. For nonlinear programs with equality constraints max{f(x) : G(x) = 0m×1}, the KKTconditions are: there exists a vector β that satisfies

∇>f(x) +m∑

i=1

βi∇>Gi(x) = 0n×1 (112)

G(x) = 0. (113)

Here β is used as the dual variables to G(x) = 0 so that the symbol y can be reserved for later usein (117).

Applying KKT conditions (112) and (113) to (111), we have

c+ µx−1 +A>β = 0 (114)µw−1 + β = 0 (115)Ax+ w = b. (116)

We could also do the same thing for the dual of (110):

min{b>y : A>y ≥ c, y ≥ 0}, (117)

whose equivalent formulation is min{b>y : A>y − z = c; y, z ≥ 0} and whose barrier problem is

max −b>y + µ∑

j

log yj + µ∑

j

log zj (118)

s. t. A>y − z = c.

The KKT conditions of the dual barrier problem are: there exists a vector α that satisfies

−b+ µy−1 +Aα = 0 (119)µz−1 − α = 0 (120)A>y − z = c. (121)

Comparing primal KKT conditions (114)-(116) and dual KKT conditions (119)-(121), we find thatthey are equivalent if we let β = −y and α = x. Then the combined KKT conditions become:

Ax+ w = b (122)A>y − z = c (123)x× z = µn×1 (124)w × y = µm×1. (125)

It can be proved that the KKT conditions are necessary and sufficient for the optimality of (111).Therefore, the solution {x(µ), w(µ), y(µ), z(µ)} from (122)-(125) will all be positive. These KKTconditions approach the optimality conditions for (110) and (117) when µ approaches 0. In theinterior point algorithm, we try to find solutions {x(µ), w(µ), y(µ), z(µ)} that satisfy (122)-(125).As we make the value of µ > 0 smaller and smaller iteratively, {x(µ), w(µ), y(µ), z(µ)} will becomecloser and closer to the optimal solutions to (110) and (117).

Exercise problem 17.2 of the text gives a simple example where the closed-form solution for thecentral path can be computed. However, in general, obtaining exact solutions to (122)-(125) couldbe hard, instead solutions that approximately satisfy these conditions are used in real implemen-tation.

53

11 Multi-objective Optimization

In real-world applications, we sometimes have more than one objective to optimize. For example,maximize profit and minimize environmental footprint, minimize time and maximize quality, mini-mize cost and maximize reliability, etc. These problems are called multi-objective or multi-criteriaoptimization problems. Here we briefly introduce an approach to solving multi-objective linearprogramming problems.

In a single objective LP, a feasible solution is optimal if it is not dominated by any other feasiblesolution. A feasible solution x1 is dominated by another feasible solution x2 if ζ(x1) < ζ(x2). Theremay exist multiple optimal solutions, but the optimal objective value ζ∗ is unique.

In a multi-objective LPmax{ζi = c>i x,∀i : Ax ≤ b, x ≥ 0}, (126)

we can define an optimal solution in a similar manner: a feasible solution is optimal if it is notdominated by any other feasible solution. When we compare two feasible solutions, we need toconsider all objectives. We say a feasible solution x1 is dominated by another feasible solution x2

if ζi(x1) ≤ ζi(x2), ∀i and ζi(x1) < ζi(x2),∃i. As a result of this definition, not only there mayexist multiple optimal solutions, but there may also exist very different combinations of optimalobjective values. These optimal solutions provide various trade-offs among different objectives andit is up to specific decision-makers to decide which tradeoff is best under specific circumstances.These optimal solutions to a multi-objective LP are called Pareto optimal solutions.

To understand the definition of Pareto optimal solutions, consider the following example, inwhich we have two objectives ζc and ζd and six candidate solutions x1, x2, ..., x6. By definition, x1,x2, x4 and x5 are Pareto optimal solutions, and x3 and x6 are not.

Figure 7: An example of multi-objective optimization

x ζc(x) ζd(x)x1 1 7x2 4 5x3 6 3x4 8 4x5 9 2x6 9 1

54

Given a solution x0, the following LP can be used to check its Pareto optimality:

max∑

i c>i x (127)

s. t. Ax ≤ b (128)c>i x ≥ c>i x0, ∀i (129)

x ≥ 0 (130)

Solution x0 is a Pareto optimal solution to the multi-objective LP (126) if and only if x0 is anoptimal solution to the single-objective LP (127)-(130). Although LP (127)-(130) can be used tocheck the Pareto optimality of a given solution, it does not tell us how to find a Pareto optimalsolution. For a multi-objective LP, we can use the following proposition to find a Pareto optimalsolution.

Proposition 6. Solution x0 is a Pareto optimal solution to the multi-objective LP (126) if and onlyif there exist αi > 0,∀i such that x0 is an optimal solution to the single-objective LP max{

∑(αici)>x :

Ax ≤ b, x ≥ 0}.

It is worth noting that this proposition will not hold in general for multi-objective nonlinear orinteger programming problems.

For a multi-objective LP with two objectives

max{ζc = c>x and ζd = d>x : Ax ≤ b, x ≥ 0}, (131)

we can use the following algorithm to obtain Pareto optimal solutions with all combinations ofobjectives ζc and ζd.

� Let xc and xd be optimal solutions to max{c>x : Ax ≤ b, x ≥ 0} and max{d>x : Ax ≤ b, x ≥0}, respectively.

� Let xc and xd be optimal solutions to max{d>x : Ax ≤ b, c>x ≥ c>xc, x ≥ 0} and max{c>x :Ax ≤ b, d>x ≥ d>xd, x ≥ 0}, respectively. Solutions xc and xd are Pareto optimal.

� Use the following recursive procedure to obtain other Pareto optimal solutions between xc

and xd. For a given pair of Pareto optimal solution xc and xd with ζc(xc) > ζc(xd) andζd(xd) > ζd(xc), this recursive procedure finds all other Pareto optimal solutions betweenthem.

1. First let xm be an optimal solution to

max{ζm = [d>(xd − xc)c+ c>(xc − xd)d]>x : Ax ≤ b, x ≥ 0}.

2. If ζm(xm) = ζm(xc) = ζm(xd), then stop and all solutions on the line segment betweenxc and xd are Pareto optimal. Otherwise xm is another Pareto optimal solution. Wealso need to find Pareto optimal solutions between xc and xm, and then between xm andxd using this recursive procedure.

55

For example, let’s find Pareto optimal solutions for the following multi-objective LP:

max ζc = 5x1 + 4x2 + 3x3

and ζd = −7x1 + 3x2

s. t. 2x1 + 3x2 + x3 ≤ 54x1 + x2 + 2x3 ≤ 113x1 + 4x2 + 2x3 ≤ 8

x1, x2, x3 ≥ 0.

Step 1: Maximize c>x = 5x1 + 4x2 + 3x3. The optimal solution is xC =[

2 0 1]> with

ζc(xC) = 13, ζd(xC) = −14. Since xC is optimal to

max −7x1 + 3x2

s. t. 2x1 + 3x2 + x3 ≤ 54x1 + x2 + 2x3 ≤ 113x1 + 4x2 + 2x3 ≤ 85x1 + 4x2 + 3x3 ≥ 13

x1, x2, x3 ≥ 0,

it is a Pareto optimal solution.

Step 2: Maximize d>x = −7x1 + 3x2. The optimal solution is xD =[

0 1.67 0]> with

ζc(xD) = 6.67, ζd(xD) = 5. Since xD is optimal to

max 5x1 + 4x2 + 3x3

s. t. 2x1 + 3x2 + x3 ≤ 54x1 + x2 + 2x3 ≤ 113x1 + 4x2 + 2x3 ≤ 8−7x1 + 3x2 ≥ 5x1, x2, x3 ≥ 0,

it is a Pareto optimal solution.

56

Step 3: Maximize ζ1(x) = [d>(xD−xC)c>+c>(xC−xD)d>]x = 50.67x1+95x2+57x3. The op-timal solution is x1 =

[0 0 4

]> with ζc(x1) = 12, ζd(x1) = 0. Since ζ1(x1) > ζ1(xC) = ζ1(xD),we need to find Pareto optimal solutions between xC and x1, and then between x1 and xD.

Step 4: Maximize [d>(x1 − xC)c> + c>(xC − x1)d>]x = 63x1 + 59x2 + 42x3. x1 and xC areoptimal solutions. Therefore, all solutions on the line segment between xC and x1 are Pareto opti-mal.

Step 5: Maximize ζ2(x) = [d>(xD−x1)c>+c>(x1−xD)d>]x = −12.33x1+36x2+15x3. The op-timal solution is x2 =

[0 1 2

]> with ζc(x2) = 10, ζd(x2) = 3. Since ζ2(x2) > ζd(x1) = ζ2(xD),we need to find Pareto optimal solutions between xD and x2, and then between x2 and x1.

Step 6: Maximize [d>(xD − x2)c> + c>(x2 − xD)d>]x = −13.33x1 + 18x2 + 6x3. xD and x2

are optimal solutions. Therefore, all solutions on the line segment between xD and x2 are Paretooptimal.

Step 7: Maximize [d>(x2− x1)c>+ c>(x1− x2)d>]x = x1 + 18x2 + 9x3. x2 and x1 are optimalsolutions. Therefore, all solutions on the line segment between x2 and x1 are Pareto optimal.

After Step 7, we have found all Pareto optimal solutions, which are on the line segments betweenxD, x2, x1, and xC .

12 Integer Programming

12.1 Gomory Cut

We briefly introduce a cutting plane method for integer linear program problems. Some materialsare from the reference text Bertsimas and Tsitsiklis (1997). Let IP(A, b, c) denote the IP:

maxx{c>x : Ax ≤ b, x ≥ 0, x ∈ Zn}. (132)

Define the LP relaxation of (132) as LP(A, b, c): maxx{c>x : Ax ≤ b, x ≥ 0}. The basic idea ofcutting plane algorithms is: solve the LP relaxation LP(A, b, c) and see if the optimal solution,x∗, satisfies the integrality constraints. If not, add a linear constraint (cut) and start over. Theadded constraint should eliminate x∗ from the feasible region but it must not eliminate any feasibleinteger solution.

57

Gomory cutting plane algorithm was the first finitely terminating algorithm for integer program-ming. Suppose an LP relaxation is solved, the optimal solution x∗ does not satisfy the integralityconstraint, and the optimal simplex dictionary is the following:


−1B AN )xN


B ANxN .

Let x∗i be a variable that violates the integrality constraint, and let the following equation denotethe corresponding row in the dictionary:

xi = ai,0 −∑j∈N

ai,jxj .

Since x ≥ 0, we havexi +

∑j∈Nbai,jcxj ≤ xi +

∑j∈N

ai,jxj = ai,0.

Now consider the following constraint:

xi +∑j∈Nbai,jcxj ≤ bai,0c. (133)

Any integer feasible solution to the IP(A, b, c) satisfies this constraint, but x∗ does not. Therefore,(133) is a valid cut, and it is called a Gomory cut. The following example of using Gomory cuts tosolve an integer program can be found on page 482 of Bertsimas and Tsitsiklis (1997).

max −x1 + 2x2 (134)s. t. −4x1 + 6x2 ≤ 9 (135)

x1 + x2 ≤ 4 (136)x1, x2 ≥ 0 (137)

x1, x2 integer. (138)

The optimal solution to the LP relaxation of (134)-(138) is x∗1 = 1.5, x∗2 = 2.5. The dictionaryis

ζ = 3.5− 0.2w4 − 0.3w3 (139)x2 = 2.5− 0.4w4 − 0.1w3 (140)x1 = 1.5− 0.6w4 + 0.1w3 (141)

Equation (140) yields the following cut

x2 ≤ 2,


x1 − w3 ≤ 1⇒ x1 − (9 + 4x1 − 6x2) ≤ 1⇒ −3x1 + 6x2 ≤ 10.

58

Add the cuts to the LP relaxation, solve the augmented LP, and the optimal solution is x∗1 =0.75, x∗2 = 2. The dictionary is

ζ = 3.25− 0.5w5 − 0.25w3 (142)x2 = 2− w5 (143)w4 = 1.25 + 2.5w5 − 0.25w3 (144)x1 = 0.75− 1.5w5 + 0.25w3 (145)w6 = 0.25 + 1.5w5 + 0.75w3 (146)


x1 + w5 − w3 ≤ 0⇒ x1 + (2− x2)− (9 + 4x1 − 6x2) ≤ 0⇒ −3x1 + 5x2 ≤ 7.

Add the cut to the LP relaxation, solve the augmented LP, and the optimal solution is x∗1 = 1, x∗2 =2. This is an integer solution, thus optimal to the original integer program.

12.2 Branch and Bound

Branch-and-bound is another widely used approach to solving integer programming problems. Thebasic idea is briefly introduced here.

� If LP(A, b, c) is infeasible, then IP(A, b, c) is infeasible.

� If LP(A, b, c) is unbounded, then IP(A, b, c) is either unbounded or infeasible. We can try toobtain a feasible solution by solving IP(A, b, 0). If IP(A, b, 0) has a feasible solution, thenIP(A, b, c) is unbounded, otherwise it is infeasible.

� Create a tree with a single node that contains the LP(A, b, c) and z =∞. Define x = ∅ andζ = −∞ as the incumbent solution and its objective value, which keeps track of the bestinteger feasible solution we have found.

� While{the tree has at least one node}Choose a node, solve the LP, let x∗ and ζ∗ be the respective optimal solution and optimalobjective value, and remove this node from the tree.

– If LP is infeasible or ζ∗ ≤ ζ, do nothing (bounding step).

– If ζ∗ > ζ and x∗ ∈ Zn, then update x = x∗ and ζ = ζ∗. Also remove all existing nodeswith z ≤ ζ (bounding step).

– If ζ∗ > ζ but x∗i /∈ Z for some i, then clone two copies of the parent node with additionalconstraints. One node with xi ≤ bx∗i c, z = ζ∗ and the other with xi ≥ dx∗i e, z = ζ∗

(branching step).

When the tree contains many nodes, different strategies could be used to choose a node to solve.Depth-first and breadth-first are the most commonly used strategies. Depth-first strategy choosesthe youngest node first, whereas the breadth-first strategy chooses the oldest one.

As an example, we solve (134)-(138) using the depth-first branch and bound algorithm.

59

Node 0LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4;x1 ≥ 0, x2 ≥ 0}.LP solution: x∗1 = 1.5, x∗2 = 2.5, ζ∗ = 3.5.Current best IP solution: ∅.Create Node 1 with x1 ≤ 1, z = 3.5 and Node 2 with x1 ≥ 2, z = 3.5.

Node 1LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4; 0 ≤ x1 ≤ 1, x2 ≥ 0}.LP solution: x∗1 = 1, x∗2 = 2.167, ζ∗ = 3.33.Current best IP solution: ∅.Create Node 3 with x2 ≤ 2, z = 3.33 and Node 4 with x2 ≥ 3, z = 3.33.

Node 3LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4; 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 2}.LP solution: x∗1 = 0.75, x∗2 = 2, ζ∗ = 3.25.Current best IP solution: ∅.Create Node 5 with x1 ≤ 0, z = 3.25 and Node 6 with x1 ≥ 1, z = 3.25.

Node 5LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4;x1 = 0, 0 ≤ x2 ≤ 2}.LP solution: x∗1 = 0, x∗2 = 1.5, ζ∗ = 3.Current best IP solution: ∅.Create Node 7 with x2 ≤ 1, z = 3 and Node 8 with x2 ≥ 2, z = 3.

Node 7LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4;x1 = 0, 0 ≤ x2 ≤ 1}.LP solution: x∗1 = 0, x∗2 = 1, ζ∗ = 2.Current best IP solution: x1 = 0, x2 = 1, ζ = 2.

Node 8LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4;x1 = 0, x2 = 2}.

60

LP solution: infeasible.Current best IP solution: x1 = 0, x2 = 1, ζ = 2.

Node 6LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4;x1 = 1, 0 ≤ x2 ≤ 2}.LP solution: x∗1 = 1, x∗2 = 2, ζ∗ = 3.Current best IP solution: x1 = 1, x2 = 2, ζ = 3.

Node 4LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4; 0 ≤ x1 ≤ 1, x2 ≥ 3}.LP solution: infeasible.Current best IP solution: x1 = 1, x2 = 2, ζ = 3.

Node 2LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4;x1 ≥ 2, x2 ≥ 0}.LP solution: x∗1 = 2, x∗2 = 2, ζ∗ = 2.Current best IP solution: x1 = 1, x2 = 2, ζ = 3.

Next we solve the same IP using the breadth-first strategy.

Node 0LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4;x1 ≥ 0, x2 ≥ 0}.LP solution: x∗1 = 1.5, x∗2 = 2.5, ζ∗ = 3.5.Current best IP solution: ∅.Create Node 1 with x1 ≤ 1, z = 3.5 and Node 2 with x1 ≥ 2, z = 3.5.

Node 1LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4; 0 ≤ x1 ≤ 1, x2 ≥ 0}.LP solution: x∗1 = 1, x∗2 = 2.167, ζ∗ = 3.33.Current best IP solution: ∅.Create Node 3 with x2 ≤ 2, z = 3.33 and Node 4 with x2 ≥ 3, z = 3.33.

61

Node 2LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4;x1 ≥ 2, x2 ≥ 0}.LP solution: x∗1 = 2, x∗2 = 2, ζ∗ = 2.Current best IP solution: x1 = 2, x2 = 2, ζ = 2.

Node 3LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4; 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 2}.LP solution: x∗1 = 0.75, x∗2 = 2, ζ∗ = 3.25.Current best IP solution: x1 = 2, x2 = 2, ζ = 2.Create Node 5 with x1 ≤ 0, z = 3.25 and Node 6 with x1 ≥ 1, z = 3.25.

Node 4LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4; 0 ≤ x1 ≤ 1, x2 ≥ 3}.LP solution: infeasible.Current best IP solution: x1 = 2, x2 = 2, ζ = 2.

Node 5LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4;x1 = 0, 0 ≤ x2 ≤ 2}.LP solution: x∗1 = 0, x∗2 = 1.5, ζ∗ = 3.Current best IP solution: x1 = 2, x2 = 2, ζ = 2.Create Node 7 with x2 ≤ 1, z = 3 and Node 8 with x2 ≥ 2, z = 3.

Node 6LP: max{−x1 + 2x2 : −4x1 + 6x2 ≤ 9;x1 + x2 ≤ 4;x1 = 1, 0 ≤ x2 ≤ 2}.LP solution: x∗1 = 1, x∗2 = 2, ζ∗ = 3.Current best IP solution: x1 = 1, x2 = 2, ζ = 3. Remove nodes 7 and 8.

13 Inverse Optimization

13.1 InvLP

Let LP(A, b, c) denote the linear program maxx{c>x : Ax ≤ b, x ≥ 0}. The inverse linear pro-gramming problem (InvLP) is to minimally perturb c to make a given feasible solution xd optimal.In other words, it is to find a vector d ∈ Rn such that the given feasible solution xd is optimal toLP(A, b, d) and that the difference between c and d, measured by ||c− d||, is minimized. We focusour discussion on the weighted L1 norm, so ||c−d|| becomes w>|c−d|, where w ∈ Rn

+ is a constantvector. A straightforward formulation of the InvLP under the weighted L1 norm is

mind{w>|c− d| : xd ∈ argmaxx{d>x : Ax ≤ b, x ≥ 0}}. (147)

Observe that xd ∈ argmaxx{d>x : Ax ≤ b, x ≥ 0} if and only if there exists y ∈ Rm such that(i) Axd ≤ b, xd ≥ 0 (primal feasibility), (ii) A>y ≥ d, y ≥ 0 (dual feasibility), and (iii) b>y = d>xd

(strong duality). If we define e, f ∈ Rn+ such that c − d = e − f , then (147) can be equivalently

62

reformulated as the following LP:

miny,e,f

w>e+ w>f (148)

s. t. A>y ≥ c− e+ f (149)b>y = (c− e+ f)>xd (150)

y, e, f ≥ 0. (151)

This result enables one to solve an InvLP using existing efficient LP algorithms and software. Forexample, if

A =

2 −132 11 15−22 60−2 3

, b =

01436270

, c =[

13

], xd =

[00

], w =

[11

],

then e∗ =[

3 0]>, f∗ =

[0 0

]>, so d∗ =[−2 3

]>.

We introduce a cutting plane algorithm for InvLP, which bypasses strong duality of linearprogramming. The motivation of this algorithm is the following lemma:

Lemma 1. Feasible solution xd is optimal to LP(A, b, d) if and only if (i) there exists y ≥ 0 suchthat A>y ≥ d, and (ii) d>xd ≥ d>x0 for all x0 ∈ S, where S is the set of extreme points of{Ax ≤ b, x ≥ 0}.

Proof. The ‘only if’ direction is obvious. To prove the ‘if’ direction, one only needs to notice thatcondition (i) ensures that LP(A, b, d) is bounded from above, thus there exists an optimal extremepoint solution. The optimality of xd is then ensured by condition (ii).

By Lemma 1, InvLP (147) can be equivalently solved as miny,d{w>|c − d| : A>y ≥ d, y ≥0, d>xd ≥ d>x0, ∀x0 ∈ S}. Without having S, the complete set of extreme points of {Ax ≤ b, x ≥0}, in hand, we can start with the set of known extreme points, S0 = ∅, and dynamically updateit. At each iteration, we solve miny,d{w>|c− d| : A>y ≥ d, y ≥ 0, d>xd ≥ d>x0, ∀x0 ∈ S0} and let(y∗, d∗) denote the optimal solution. If xd is optimal to LP(A, b, d∗), then stop; otherwise we add

63

the optimal extreme point solution into S0 and go through another iteration. This idea leads tothe following cutting plane algorithm for InvLP (147) under the weighted L1 norm, which we referto as (AlgInvLP

L1).

Algorithm (AlgInvLPL1

) for InvLP (147)

Step 0 Initiate S0 = ∅.

Step 1 Solve the following LP and let (y∗, e∗, f∗) be an optimal solution:

miny,e,f

w>e+ w>f (152)

s. t. A>y ≥ c− e+ f (153)(c− e+ f)>xd ≥ (c− e+ f)>x0, ∀x0 ∈ S0 (154)

y, e, f ≥ 0. (155)

Step 2 Let x0 be an optimal extreme point solution to LP(A, b, (c−e∗+f∗)). If (c−e∗+f∗)>xd ≥(c−e∗+f∗)>x0, then stop, and d∗ = c−e∗+f∗ is optimal to InvLP (147). Otherwise updateS0 = S0 ∪ {x0} and go back to Step 1.

Theorem 1. Algorithm (AlgInvLPL1

) terminates finitely with an optimal solution to InvLP (147).

Proof. It is easy to show that (y = 0, e = max{c, 0}, f = max{−c, 0}) is a feasible solution to(152)-(155) and that (152)-(155) is bounded from below by 0. Since the number of extreme pointsis finite for the given polyhedron {Ax ≤ b, x ≥ 0}, (152)-(155) will be solved for finitely manytimes. Let (y∗, e∗, f∗) be an optimal solution to (152)-(155) when the algorithm terminates at Step2. Since xd is optimal to LP(A, b, (c− e∗ + f∗)), (y∗, e∗, f∗) is optimal to

miny,e,f

w>e+ w>f

s. t. A>y ≥ c− e+ f

(c− e+ f)>xd ≥ (c− e+ f)>x0, ∀x0 ∈ Sy, e, f ≥ 0.

By Lemma 1, (y∗, e∗, f∗) is also optimal to (148)-(151), which is equivalent to InvLP (147).

13.2 InvMILP

If some of the variables are required to be integers, then the linear program LP(A, b, c) becomes amixed integer linear program IP(A, b, c, I): maxx{c>x : Ax ≤ b, x ≥ 0, xI ∈ Z|I|}, and the inverseproblem becomes an inverse mixed integer linear programming problem (InvMILP):

mind{w>|c− d| : xd ∈ argmaxx{d>x : Ax ≤ b, x ≥ 0, xI ∈ Z|I|}}. (156)

Because the duality theory of integer programming is far more sophisticated than linear program-ming, the formulation (148)-(151) cannot be used to solve InvMILP. The algorithm (AlgInvLP

L1),

which bypasses the LP strong duality, can be modified to solve InvMILP.

Algorithm (AlgInvMILPL1

) for InvMILP (156)

64

Step 0 Initiate S0 = ∅.

Step 1 Solve the following LP and let (y∗, e∗, f∗) be an optimal solution:

miny,e,f

w>e+ w>f (157)

s. t. A>y ≥ c− e+ f (158)(c− e+ f)>xd ≥ (c− e+ f)>x0, ∀x0 ∈ S0 (159)

y, e, f ≥ 0. (160)

Step 2 Let x0 be an optimal extreme point solution to IP(A, b, (c−e∗+f∗), I). If (c−e∗+f∗)>xd ≥(c − e∗ + f∗)>x0, then stop, and d∗ = c − e∗ + f∗ is optimal to InvMILP (156). Otherwiseupdate S0 = S0 ∪ {x0} and go back to Step 1.

The only difference between algorithms (AlgInvLPL1

) and (AlgInvMILPL1

) is that LP(A, b, (c−e∗+f∗))in (AlgInvLP

L1) is replaced with IP(A, b, (c− e∗ + f∗), I) in (AlgInvMILP

L1).

Theorem 2. Algorithm (AlgInvMILPL1

) terminates finitely with an optimal solution to InvMILP(156).

Proof. Let polyhedron {Ax ≤ b, x ≥ 0} describe the convex hull of {Ax ≤ b, x ≥ 0, xI ∈ Z|I|},then the InvMILP (156) can be equivalently solved using (AlgInvLP

L1) with LP(A, b, (c − e∗ + f∗))

replaced with LP(A, b, (c− e∗ + f∗)) and (152)-(155) replaced with

minz,e,f

w>e+ w>f (161)

s. t. A>z ≥ c− e+ f (162)(c− e+ f)>xd ≥ (c− e+ f)>x0,∀x0 ∈ S0 (163)

z, e, f ≥ 0. (164)

It then suffices to show that (157)-(160) and (161)-(164) are equivalent. Observing that thesetwo LPs have the same objective function that only involves variables e and f , we prove theirequivalence by showing that there exists a feasible solution (y, e0, f0) to (157)-(160) if and onlyif there exists a feasible solution (z, e0, f0) to (161)-(164). Given that xd is a feasible solution toLP(A, b, c) and IP(A, b, c, I), the existence of (y, e0, f0) is a necessary and sufficient condition forLP(A, b, (c− e0 + f0)) to be bounded from above; similarly, solution (z, e0, f0) exists if and only ifLP(A, b, (c− e0 + f0)) and IP(A, b, (c− e0 + f0), I) are bounded from above. Finally, by Theorem6.3 of Nemhauser and Wolsey (1999), if LP(A, b, (c− e0 + f0)) and IP(A, b, (c− e0 + f0), I) bothpossess a feasible solution (which is xd) then they are either both bounded or both unbounded.

Applying this algorithm to the same example, we get: e∗ =[

2.5 0]>, f∗ =

[0 0

]>, so

d∗ =[−1.5 3

]>.

65

IE 534 Linear Programming Lecture Notes Fall 2011lzwang/LectureNotesOnline.pdfIE 534 Linear...

Documents

Transcript of IE 534 Linear Programming Lecture Notes Fall 2011lzwang/LectureNotesOnline.pdfIE 534 Linear...