EN625-743 HW1
Transcript of EN625-743 HW1

EN 625.743 - Stochastic Optimization and Control
Johns Hopkins University
5 February 2015
Nam Nicholas Mai
Homework Assignment 1
Problem 1: Exercise A.1 - First and Second-order Taylor Approximations
Solution: Following the appendix, suppose that \theta = [t_1, t_2]^T and L(\theta) is given by

    L(\theta) = t_1^2 + t_2^2 + e^{-2t_1 - t_2}.    (1.1)
By definition, we have that g(\theta) is given by

    g(\theta) = \frac{\partial L}{\partial \theta}
              = \begin{bmatrix} \partial L / \partial t_1 \\ \partial L / \partial t_2 \end{bmatrix}
              = \begin{bmatrix} 2t_1 - 2e^{-2t_1 - t_2} \\ 2t_2 - e^{-2t_1 - t_2} \end{bmatrix}.    (1.2)

Likewise, the Hessian matrix H(\theta) is given by
    H(\theta) = \frac{\partial^2 L}{\partial \theta \, \partial \theta^T}
              = \begin{bmatrix} \partial^2 L / \partial t_1^2 & \partial^2 L / \partial t_1 \partial t_2 \\ \partial^2 L / \partial t_2 \partial t_1 & \partial^2 L / \partial t_2^2 \end{bmatrix}
              = \begin{bmatrix} 2 + 4e^{-2t_1 - t_2} & 2e^{-2t_1 - t_2} \\ 2e^{-2t_1 - t_2} & 2 + e^{-2t_1 - t_2} \end{bmatrix}.    (1.3)

To verify that these expressions are correct, two test points are chosen, \theta_1 = [0.05, -0.05]^T and \theta_2 = [0.5, -0.5]^T, along with the control point [0.1, 0.1]^T from Appendix A.1. The results follow in Table 1 below. MATLAB was used to compute these values. The code for this problem is given after the table.
\theta = [t_1, t_2]^T | First-order (A.1) | Second-order (A.2) | L(\theta)
----------------------|-------------------|--------------------|----------
[0.1, 0.1]^T          | 0.700             | 0.765              | 0.761
[0.05, -0.05]^T       | 0.950             | 0.956              | 0.956
[0.5, -0.5]^T         | 0.500             | 1.125              | 1.107

Table 1: Comparison of first- and second-order Taylor approximations about \theta' = [0, 0]^T
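Before turning to the MATLAB listing, the analytic gradient (1.2) and Hessian (1.3) can be sanity-checked against central finite differences of L. The Python sketch below is an editorial cross-check, not part of the original assignment; the evaluation point (0.3, -0.2) and step size h are arbitrary choices:

```python
import math

# Loss from Eq. (1.1)
def L(t1, t2):
    return t1**2 + t2**2 + math.exp(-2*t1 - t2)

# Analytic gradient (1.2) and Hessian (1.3)
def grad(t1, t2):
    e = math.exp(-2*t1 - t2)
    return (2*t1 - 2*e, 2*t2 - e)

def hess(t1, t2):
    e = math.exp(-2*t1 - t2)
    return ((2 + 4*e, 2*e),
            (2*e,     2 + e))

# Central finite differences at an arbitrary point
t1, t2, h = 0.3, -0.2, 1e-5
g_fd = ((L(t1 + h, t2) - L(t1 - h, t2)) / (2*h),
        (L(t1, t2 + h) - L(t1, t2 - h)) / (2*h))
H11_fd = (L(t1 + h, t2) - 2*L(t1, t2) + L(t1 - h, t2)) / h**2
H22_fd = (L(t1, t2 + h) - 2*L(t1, t2) + L(t1, t2 - h)) / h**2
H12_fd = (L(t1 + h, t2 + h) - L(t1 + h, t2 - h)
          - L(t1 - h, t2 + h) + L(t1 - h, t2 - h)) / (4*h**2)

print("max gradient error:",
      max(abs(g_fd[i] - grad(t1, t2)[i]) for i in range(2)))
```

The finite-difference values agree with the closed-form expressions to well within the truncation error of the difference scheme.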
MATLAB Code Problem 1

%% Nam Nicholas Mai
% Johns Hopkins University - EP
% EN 625.743 - Stochastic Optimization and Control
% 5 February 2015
%
% Homework 1 - Course introduction
% Problem 1: Example A.1 - Evaluating Taylor approximations to L(theta)
%
% Copyright, All Rights Reserved

clearvars; clc;

% Set the input variables, where T stands for the vector theta
Tp = [0 0]';        % theta prime (expansion point)
T1 = [0.05 -0.05]'; % 1st test point
T2 = [0.5 -0.5]';   % 2nd test point
T3 = [0.1 0.1]';    % 3rd test point, control from Appendix A.1

% Calculate the loss function L at Tp, T1, T2, and T3
L_Tp = Tp(1)^2 + Tp(2)^2 + exp(-2*Tp(1) - Tp(2));
L_T1 = T1(1)^2 + T1(2)^2 + exp(-2*T1(1) - T1(2));
L_T2 = T2(1)^2 + T2(2)^2 + exp(-2*T2(1) - T2(2));
L_T3 = T3(1)^2 + T3(2)^2 + exp(-2*T3(1) - T3(2));

% Calculate the gradient g at Tp
grad_Tp = [2*Tp(1) - 2*exp(-2*Tp(1) - Tp(2)); ...
           2*Tp(2) -   exp(-2*Tp(1) - Tp(2))];

% Calculate the Hessian matrix H at Tp
Hessian_Tp = [2 + 4*exp(-2*Tp(1) - Tp(2)), 2*exp(-2*Tp(1) - Tp(2)); ...
              2*exp(-2*Tp(1) - Tp(2)),     2 + exp(-2*Tp(1) - Tp(2))];

% Calculate the approximations at the points T1, T2, and T3
L_T1_1st = L_Tp + grad_Tp'*(T1 - Tp);
L_T1_2nd = L_Tp + grad_Tp'*(T1 - Tp) + 0.5*(T1 - Tp)'*Hessian_Tp*(T1 - Tp);

L_T2_1st = L_Tp + grad_Tp'*(T2 - Tp);
L_T2_2nd = L_Tp + grad_Tp'*(T2 - Tp) + 0.5*(T2 - Tp)'*Hessian_Tp*(T2 - Tp);

L_T3_1st = L_Tp + grad_Tp'*(T3 - Tp);
L_T3_2nd = L_Tp + grad_Tp'*(T3 - Tp) + 0.5*(T3 - Tp)'*Hessian_Tp*(T3 - Tp);

% Output the results as a matrix, rounded to three decimal places
format short;
L_approx = round([L_T1_1st L_T1_2nd L_T1; ...
                  L_T2_1st L_T2_2nd L_T2; ...
                  L_T3_1st L_T3_2nd L_T3].*1000) ./ 1000
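The rows of Table 1 can also be reproduced outside of MATLAB. The Python sketch below (an editorial cross-check; the function names are my own) hard-codes the gradient and Hessian at \theta' = [0, 0]^T, where e^{-2t_1 - t_2} = 1:

```python
import math

# Loss from Eq. (1.1)
def L(t1, t2):
    return t1**2 + t2**2 + math.exp(-2*t1 - t2)

g = (-2.0, -1.0)              # gradient (1.2) evaluated at [0, 0]
H = ((6.0, 2.0), (2.0, 3.0))  # Hessian (1.3) evaluated at [0, 0]

def taylor(t1, t2):
    """First- and second-order Taylor approximations about [0, 0]."""
    first = L(0.0, 0.0) + g[0]*t1 + g[1]*t2
    quad = H[0][0]*t1*t1 + 2*H[0][1]*t1*t2 + H[1][1]*t2*t2
    return first, first + 0.5*quad

# The three points from Table 1
for pt in [(0.1, 0.1), (0.05, -0.05), (0.5, -0.5)]:
    first, second = taylor(*pt)
    print(pt, round(first, 3), round(second, 3), round(L(*pt), 3))
```

The printed triples match the three columns of Table 1.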
Problem 2: Exercise A.7 - Let t = (A^T A)^{-1} A^T x, where x is a random vector with Cov[x] = \sigma^2 I, \sigma is a scalar, and A is a non-random matrix such that the indicated inverse exists. Show that Cov[t] = (A^T A)^{-1} \sigma^2.
Solution: The gist of this problem is to demonstrate how the covariance matrix behaves under matrix multiplication. To simplify the problem, let us first prove the general case as a lemma.
Lemma 2.1. Suppose x is a random vector and A is a non-random matrix. Then,

    Cov[Ax] = A \, Cov[x] \, A^T.    (2.1)
Proof. Recall the definition of the covariance of a random vector,

    Cov[x] = E[(x - E[x])(x - E[x])^T].    (2.2)

Then, Equation 2.1 becomes

    Cov[Ax] = E[(Ax - E[Ax])(Ax - E[Ax])^T]    (2.3)
            = E[A(x - E[x])(x - E[x])^T A^T]    (2.4)
            = A \, E[(x - E[x])(x - E[x])^T] \, A^T    (2.5)
            = A \, Cov[x] \, A^T.    (2.6)

The equalities above come from the linearity of expectation (E[Ax] = A E[x] for non-random A) and properties of the transpose.
To the end of proving the problem statement, define B = (A^T A)^{-1} A^T. Then, we have

    Cov[t] = Cov[(A^T A)^{-1} A^T x] = Cov[Bx] = B \, Cov[x] \, B^T,    (2.7)

where the last equality follows from a direct application of Lemma 2.1. Then, by properties of the transpose (and the symmetry of A^T A),

    B^T = [(A^T A)^{-1} A^T]^T = A [(A^T A)^{-1}]^T = A [(A^T A)^T]^{-1} = A (A^T A)^{-1}.    (2.8)
Finally, Equation 2.7 becomes

    Cov[t] = B \, Cov[x] \, B^T = (A^T A)^{-1} A^T (\sigma^2 I) A (A^T A)^{-1}    (2.9)
           = (A^T A)^{-1} \sigma^2 A^T I A (A^T A)^{-1}    (2.10)
           = (A^T A)^{-1} \sigma^2 (A^T A)(A^T A)^{-1}    (2.11)
           = (A^T A)^{-1} \sigma^2 I.    (2.12)

The last multiplication by I preserves the matrix, so we have our result.
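The identity Cov[t] = (A^T A)^{-1} \sigma^2 can also be checked numerically. The Python sketch below uses an arbitrary 3x2 matrix A and \sigma = 0.5 (both my choices, not from the problem) and estimates Cov[t] by Monte Carlo, using only the standard library:

```python
import random

random.seed(0)

# Arbitrary 3x2 non-random matrix A and noise scale sigma
A = [[1.0, 2.0], [0.0, 1.0], [1.0, -1.0]]
sigma = 0.5

# A^T A (2x2) and its inverse
AtA = [[sum(A[k][i]*A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
det = AtA[0][0]*AtA[1][1] - AtA[0][1]*AtA[1][0]
inv = [[ AtA[1][1]/det, -AtA[0][1]/det],
       [-AtA[1][0]/det,  AtA[0][0]/det]]

# Monte Carlo estimate of Cov[t] for t = (A^T A)^{-1} A^T x, x ~ N(0, sigma^2 I).
# Since E[x] = 0 implies E[t] = 0, raw second moments give the covariance.
N = 100_000
acc = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(N):
    x = [random.gauss(0.0, sigma) for _ in range(3)]
    Atx = [sum(A[k][i]*x[k] for k in range(3)) for i in range(2)]
    t = [sum(inv[i][j]*Atx[j] for j in range(2)) for i in range(2)]
    for i in range(2):
        for j in range(2):
            acc[i][j] += t[i]*t[j]
cov = [[acc[i][j]/N for j in range(2)] for i in range(2)]

# Theoretical value from (2.12): (A^T A)^{-1} sigma^2
theory = [[inv[i][j]*sigma**2 for j in range(2)] for i in range(2)]
print("estimated:", cov)
print("theory:   ", theory)
```

With 100,000 samples the estimated covariance agrees with (A^T A)^{-1} \sigma^2 to within Monte Carlo error.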
Problem 3: Exercise 1.3 - Profit Maximization of Business Investments
Solution: Let \theta represent a vector of N investments whose total is bounded by some given amount T. Mathematically,

    \theta = [t_1 \; \ldots \; t_N]^T, \qquad \sum_{n=1}^{N} t_n \le T.    (3.1)

Then, we define \Theta as the N-dimensional vector space of all possible investments:

    \Theta = \{ [t_1 \; \ldots \; t_N]^T \mid t_1, \ldots, t_N \in \mathbb{R} \}.    (3.2)
Suppose that a business firm is composed of M segments. Then, let s(\theta) be a deterministic vector function where each component s_i(\theta) represents the amount of output produced by one segment of the firm, and let \pi be a random vector where each component \pi_i represents the profit from one unit of output from the corresponding segment. Note that

    \dim(s(\theta)) = \dim(\pi) = M.    (3.3)

Then, we can write the total profit of the business, P_T(\theta), as the inner product of s(\theta) and \pi:

    P_T(\theta) = \langle s(\theta), \pi \rangle.    (3.4)

The problem of interest is to find the optimal set of investments that maximizes the mean total profit, E[P_T(\theta)], subject to the constraint of bounded variance in P_T(\theta). In the field of finance, this constraint can be thought of as a known bound on the investment risk. One would not want to define an investment portfolio that maximizes the expected total profit but carries unbounded risk, since that could mean the investment completely tanks.
Thus, mathematically, we have that L(\theta) is given by

    L(\theta) = E[P_T(\theta)] = E[\langle s(\theta), \pi \rangle] = \langle s(\theta), E[\pi] \rangle,    (3.5)

where the last equality follows from the linearity of the expectation operator. Since P_T(\theta) is a linear combination of the components of \pi, its variance can be written as

    \mathrm{Var}(P_T(\theta)) = \sum_{n,m} s_n s_m \sigma_{n,m} = s^T \Sigma s,    (3.6)

where \Sigma = [\sigma_{n,m}] is the covariance matrix of \pi, and we omit the dependency of s on \theta to keep the equation cleaner and easier to read. The optimization problem is then written as follows:
Find all \theta^* that maximize L(\theta) such that s^T \Sigma s \le C. Then, the solution set is

    \Theta^* = \{ \theta^* \mid \forall \theta \in \Theta, \; L(\theta) \le L(\theta^*) \; \text{and} \; s(\theta^*)^T \Sigma \, s(\theta^*) \le C \}.    (3.7)
Now, we wish to explicitly define y(\theta), which can be thought of as a measurement of L(\theta), or as the actual total profit observed when the investment \theta is performed. Above, we saw that the value of L(\theta) relies on the mean of the segment-profit vector \pi. Thus, when we perform the investment, we can expect (no pun intended) our actual profit to be some small deviation from the mean of \pi. Mathematically,

    y(\theta) = \langle s(\theta), \pi \rangle
              = \underbrace{\langle s(\theta), E[\pi] \rangle}_{L(\theta)} + \underbrace{\langle s(\theta), \pi - E[\pi] \rangle}_{\epsilon(\theta)}    (3.8)
              = L(\theta) + \epsilon(\theta).    (3.9)

Clearly, this shows that the measurement noise \epsilon(\theta) depends on \theta; moreover, this is not noise in the traditional sense. It is random, but it is not caused by what most would assume to be noise added on top of the measurement. Rather, it is the product of observing a probabilistic quantity. By the Law of Large Numbers, we can reason that over many repeated instances of the investment the observed average would converge to the mean profit, but since money is finite, repeated iterations may not always be possible.
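The decomposition (3.8)-(3.9) can be illustrated with a small simulation. The Python sketch below invents a hypothetical two-segment firm (the numbers for s, E[\pi], and the segment standard deviations are my own assumptions, with \Sigma diagonal for simplicity) and checks that the sample mean of y matches (3.5) and its sample variance matches (3.6):

```python
import random

random.seed(1)

# Hypothetical two-segment firm: outputs s (already evaluated at some theta)
# and per-unit profits pi with independent components, so Sigma = diag(sd^2)
s = [3.0, 1.5]
mu = [2.0, 4.0]   # E[pi]
sd = [0.5, 1.0]   # segment standard deviations

L_theta = sum(si*mi for si, mi in zip(s, mu))            # Eq. (3.5): <s, E[pi]>
var_theory = sum((si*di)**2 for si, di in zip(s, sd))    # Eq. (3.6): s^T Sigma s

# Simulate repeated investments: each draw of pi yields one observation y = <s, pi>
N = 100_000
ys = []
for _ in range(N):
    pi = [random.gauss(m, d) for m, d in zip(mu, sd)]
    ys.append(sum(si*p for si, p in zip(s, pi)))

mean_y = sum(ys) / N
var_y = sum((y - mean_y)**2 for y in ys) / N
print("mean:", round(mean_y, 3), "vs L(theta) =", L_theta)
print("var: ", round(var_y, 3), "vs s^T Sigma s =", var_theory)
```

The sample mean converges to L(\theta) while the spread of the observations is exactly the \theta-dependent noise \epsilon(\theta) discussed above.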
-
EN 625.743 - Homework Assignment1 5
Problem 4: Exercise 1.4 - Probability of Correct Optimal Solution
Solution: From the problem statement, suppose L(\theta_1) = 0 and L(\theta_2) = 1; then, we have

    y(\theta_i) = \begin{cases} 0 + \epsilon_1, & i = 1 \\ 1 + \epsilon_2, & i = 2 \end{cases}    (4.1)

where \epsilon_1, \epsilon_2 are independent N(0, 1) distributed random noise variables. From this, we have that the probability of incorrectly selecting \theta_2 as the choice that minimizes y(\theta_i) is

    Pr\{y(\theta_1) > y(\theta_2)\} = Pr\{y(\theta_1) - y(\theta_2) > 0\}    (4.2)
                                    = Pr\{\epsilon_1 - 1 - \epsilon_2 > 0\}    (4.3)
                                    = Pr\{\epsilon_1 - \epsilon_2 > 1\}.    (4.4)
Now, we make the following definition for the rest of the problem.

Definition 4.1. Let X be a normally distributed random variable with distribution N(0, 1). Then we define \Phi(x) as the cumulative distribution function (CDF) of X:

    Pr\{X \le x\} = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt.    (4.5)

If X is distributed as N(\mu, \sigma^2), then the CDF of X is given by \Phi\!\left(\frac{x - \mu}{\sigma}\right).
Furthermore, we will state, without proof, a lemma regarding sums of mutually independent normal random variables:

Lemma 4.1. Suppose that X and Y are mutually independent random variables distributed normally as N(\mu_X, \sigma_X^2) and N(\mu_Y, \sigma_Y^2). Then,

    Z = aX + bY \implies Z \text{ has distribution } N(a\mu_X + b\mu_Y, \; a^2\sigma_X^2 + b^2\sigma_Y^2).    (4.6)

The proof of Lemma 4.1 can easily be found using characteristic functions or moment generating functions, or referenced in any introductory text on probability and statistics. Returning to our problem, applying Lemma 4.1 to the difference of \epsilon_1 and \epsilon_2 gives
    \epsilon_1 - \epsilon_2 = \epsilon_T \sim N(0, 2).    (4.7)
Then, applying Definition 4.1, the answer to the problem is
Pr{T ;N(0,2) > 1
}= 1 Pr{T ;N(0,2) 1} = 1 ( 1
2
) 0.3085 (4.8)
The integral above is evaluated using the command 1 - CDF[NormalDistribution[0, 2], 1.] in Math-ematica 10.
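For readers without Mathematica, the same probability can be checked in Python using the standard library's error function, since \Phi(x) = (1 + erf(x/\sqrt{2}))/2, together with a quick Monte Carlo estimate of Pr\{y(\theta_1) > y(\theta_2)\}:

```python
import math
import random

# Analytic answer: eps1 - eps2 ~ N(0, 2), so Pr{eps1 - eps2 > 1} = 1 - Phi(1/sqrt(2))
def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p_analytic = 1.0 - Phi(1.0 / math.sqrt(2.0))

# Monte Carlo check with y(theta1) = eps1 and y(theta2) = 1 + eps2
random.seed(2)
N = 200_000
hits = sum(random.gauss(0, 1) > 1 + random.gauss(0, 1) for _ in range(N))
p_mc = hits / N

print("analytic:", round(p_analytic, 4))
print("monte carlo:", round(p_mc, 4))
```

Both estimates agree with the value reported in (4.8) to within sampling error.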