EN625-743 HW1


EN 625.743 - Stochastic Optimization and Control
Johns Hopkins University
5 February 2015
Nam Nicholas Mai

Homework Assignment 1

Problem 1: Exercise A.1 - First- and Second-Order Taylor Approximations

Solution: Following the appendix, suppose that $\theta = [t_1, t_2]^T$ and $L(\theta)$ is given by

$$L(\theta) = t_1^2 + t_2^2 + e^{-2t_1 - t_2}. \tag{1.1}$$

By definition, we have that $g(\theta)$ is given by

$$g(\theta) = \frac{\partial L}{\partial \theta} = \begin{bmatrix} \dfrac{\partial L}{\partial t_1} \\[6pt] \dfrac{\partial L}{\partial t_2} \end{bmatrix} = \begin{bmatrix} 2t_1 - 2e^{-2t_1 - t_2} \\ 2t_2 - e^{-2t_1 - t_2} \end{bmatrix}. \tag{1.2}$$

Likewise, the Hessian matrix $H(\theta)$ is given by

$$H(\theta) = \frac{\partial^2 L}{\partial \theta\, \partial \theta^T} = \begin{bmatrix} \dfrac{\partial^2 L}{\partial t_1^2} & \dfrac{\partial^2 L}{\partial t_1 \partial t_2} \\[6pt] \dfrac{\partial^2 L}{\partial t_2 \partial t_1} & \dfrac{\partial^2 L}{\partial t_2^2} \end{bmatrix} = \begin{bmatrix} 2 + 4e^{-2t_1 - t_2} & 2e^{-2t_1 - t_2} \\ 2e^{-2t_1 - t_2} & 2 + e^{-2t_1 - t_2} \end{bmatrix}. \tag{1.3}$$

To verify that these expressions are correct, they are substituted into the first- and second-order Taylor approximations about $\theta' = [0, 0]^T$,

$$L(\theta) \approx L(\theta') + g(\theta')^T(\theta - \theta') \;\text{(A.1)}, \qquad L(\theta) \approx L(\theta') + g(\theta')^T(\theta - \theta') + \tfrac{1}{2}(\theta - \theta')^T H(\theta')(\theta - \theta') \;\text{(A.2)},$$

at two test points, $\theta_1 = [0.05, -0.05]^T$ and $\theta_2 = [0.5, -0.5]^T$, along with the control point $[0.1, 0.1]^T$ from Appendix A.1. The results follow in Table 1 below. MATLAB was used to compute these values; the code for this problem is given after the table.

$\theta = [t_1, t_2]^T$     First-order (A.1)   Second-order (A.2)   $L(\theta)$
$[0.1, 0.1]^T$              0.700               0.765                0.761
$[0.05, -0.05]^T$           0.950               0.956                0.956
$[0.5, -0.5]^T$             0.500               1.125                1.107

Table 1: Comparison of first- and second-order Taylor approximations about $\theta' = [0, 0]^T$
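As a hand check of the first row of Table 1: at $\theta' = [0, 0]^T$, the expressions above give $L(\theta') = 1$, $g(\theta') = [-2, -1]^T$, and $H(\theta') = \begin{bmatrix} 6 & 2 \\ 2 & 3 \end{bmatrix}$, so for $\theta = [0.1, 0.1]^T$,

$$L(\theta') + g(\theta')^T(\theta - \theta') = 1 - 0.2 - 0.1 = 0.700,$$

and the second-order correction adds $\tfrac{1}{2}(\theta - \theta')^T H(\theta')(\theta - \theta') = \tfrac{1}{2}(0.13) = 0.065$, giving $0.765$, in agreement with the table.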

MATLAB Code Problem 1

%% Nam Nicholas Mai
% Johns Hopkins University - EP
% EN 625.743 - Stochastic Optimization and Control
% 5 February 2015
%
% Homework 1 - Course introduction
% Problem 1: Example A.1 - Evaluating Taylor Approximations to L(theta)
%
% Copyright, All Rights Reserved

clearvars; clc;

% Set the input variables, where T stands for the vector theta
Tp = [0 0]';        % theta prime
T1 = [0.05 -0.05]'; % 1st test point
T2 = [0.5 -0.5]';   % 2nd test point
T3 = [0.1 0.1]';    % 3rd test point, control from appendix A.1

% Calculate the loss function L at Tp, T1, T2, and T3
L_Tp = Tp(1)^2 + Tp(2)^2 + exp(-2*Tp(1) - Tp(2));
L_T1 = T1(1)^2 + T1(2)^2 + exp(-2*T1(1) - T1(2));
L_T2 = T2(1)^2 + T2(2)^2 + exp(-2*T2(1) - T2(2));
L_T3 = T3(1)^2 + T3(2)^2 + exp(-2*T3(1) - T3(2));

% Calculate the gradient g at Tp
grad_Tp = [2*Tp(1) - 2*exp(-2*Tp(1) - Tp(2)); ...
           2*Tp(2) -   exp(-2*Tp(1) - Tp(2))];

% Calculate the Hessian matrix H at Tp
Hessian_Tp = [2 + 4*exp(-2*Tp(1) - Tp(2)), 2*exp(-2*Tp(1) - Tp(2)); ...
              2*exp(-2*Tp(1) - Tp(2)),     2 + exp(-2*Tp(1) - Tp(2))];

% Calculate the first- and second-order approximations at T1, T2, and T3
L_T1_1st = L_Tp + grad_Tp'*(T1 - Tp);
L_T1_2nd = L_Tp + grad_Tp'*(T1 - Tp) + 0.5*(T1 - Tp)'*Hessian_Tp*(T1 - Tp);

L_T2_1st = L_Tp + grad_Tp'*(T2 - Tp);
L_T2_2nd = L_Tp + grad_Tp'*(T2 - Tp) + 0.5*(T2 - Tp)'*Hessian_Tp*(T2 - Tp);

L_T3_1st = L_Tp + grad_Tp'*(T3 - Tp);
L_T3_2nd = L_Tp + grad_Tp'*(T3 - Tp) + 0.5*(T3 - Tp)'*Hessian_Tp*(T3 - Tp);

% Output the results as a matrix, rounded to three decimal places
format short;
L_approx = round([L_T1_1st L_T1_2nd L_T1; ...
                  L_T2_1st L_T2_2nd L_T2; ...
                  L_T3_1st L_T3_2nd L_T3].*1000) ./ 1000
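Not part of the original assignment, but as a quick sanity check the analytic gradient (1.2) and Hessian (1.3) can be compared against central finite differences. The anonymous function L and step size h below are ad hoc additions appended to the script above (they reuse Tp, grad_Tp, and Hessian_Tp):

% Optional sanity check: compare the analytic gradient and Hessian at Tp
% against central finite differences (h is an arbitrary small step).
L  = @(t) t(1)^2 + t(2)^2 + exp(-2*t(1) - t(2));
h  = 1e-5;
e1 = [h; 0]; e2 = [0; h];

% Central-difference gradient: (L(t + h*e_i) - L(t - h*e_i)) / (2h)
grad_fd = [L(Tp+e1) - L(Tp-e1); L(Tp+e2) - L(Tp-e2)] ./ (2*h);

% Central-difference Hessian: pure and mixed second differences
H11 = (L(Tp+e1) - 2*L(Tp) + L(Tp-e1)) / h^2;
H22 = (L(Tp+e2) - 2*L(Tp) + L(Tp-e2)) / h^2;
H12 = (L(Tp+e1+e2) - L(Tp+e1-e2) - L(Tp-e1+e2) + L(Tp-e1-e2)) / (4*h^2);
Hessian_fd = [H11 H12; H12 H22];

% Both norms should be near truncation error (roughly 1e-6 or smaller)
disp(norm(grad_fd - grad_Tp));
disp(norm(Hessian_fd - Hessian_Tp));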

Problem 2: Exercise A.7 - Let $t = (A^TA)^{-1}A^Tx$, where $x$ is a random vector with $\mathrm{Cov}[x] = \sigma^2 I$, where $\sigma$ is a scalar and $A$ is a non-random matrix such that the indicated inverse exists. Show that $\mathrm{Cov}[t] = (A^TA)^{-1}\sigma^2$.

Solution: The gist of this problem is to demonstrate the properties of the covariance matrix under matrix multiplication. To simplify the problem, let us first prove the general case as a lemma.

Lemma 2.1. Suppose $x$ is a random vector and $A$ is a non-random matrix. Then,

$$\mathrm{Cov}[Ax] = A\,\mathrm{Cov}[x]\,A^T. \tag{2.1}$$


Proof. Recall the definition of the covariance of a random vector,

$$\mathrm{Cov}[x] = E\left[(x - E[x])(x - E[x])^T\right]. \tag{2.2}$$

Then, Equation 2.1 becomes

$$\begin{aligned} \mathrm{Cov}[Ax] &= E\left[(Ax - E[Ax])(Ax - E[Ax])^T\right] && (2.3) \\ &= E\left[A(x - E[x])(x - E[x])^T A^T\right] && (2.4) \\ &= A\,E\left[(x - E[x])(x - E[x])^T\right]A^T && (2.5) \\ &= A\,\mathrm{Cov}[x]\,A^T. && (2.6) \end{aligned}$$

The equalities above come from the fact that $E[Ax] = A\,E[x]$ for non-random $A$ and from properties of the transpose. $\square$
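As a quick concrete illustration of the lemma (an added example, not part of the original exercise), take $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $\mathrm{Cov}[x] = I$. Then Lemma 2.1 gives

$$\mathrm{Cov}[Ax] = A I A^T = A A^T = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix},$$

which agrees with computing the entries directly: $\mathrm{Var}(x_1 + x_2) = 2$ and $\mathrm{Cov}(x_1 + x_2,\, x_2) = 1$ for uncorrelated, unit-variance components.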

To the end of proving the problem statement, define $B = (A^TA)^{-1}A^T$. Then, we have

$$\mathrm{Cov}[t] = \mathrm{Cov}\left[(A^TA)^{-1}A^Tx\right] = \mathrm{Cov}[Bx] = B\,\mathrm{Cov}[x]\,B^T, \tag{2.7}$$

where the last equality follows from direct application of Lemma 2.1. Then, by properties of the transpose,

$$B^T = \left[(A^TA)^{-1}A^T\right]^T = A\left[(A^TA)^{-1}\right]^T = A\left[(A^TA)^T\right]^{-1} = A(A^TA)^{-1}. \tag{2.8}$$

Finally, Equation 2.7 becomes

$$\begin{aligned} \mathrm{Cov}[t] = B\,\mathrm{Cov}[x]\,B^T &= (A^TA)^{-1}A^T\left(\sigma^2 I\right)A(A^TA)^{-1} && (2.9) \\ &= (A^TA)^{-1}\sigma^2 A^T I A (A^TA)^{-1} && (2.10) \\ &= (A^TA)^{-1}\sigma^2 \underbrace{(A^TA)(A^TA)^{-1}}_{I} && (2.11) \\ &= (A^TA)^{-1}\sigma^2 I. && (2.12) \end{aligned}$$

The last multiplication by $I$ preserves the matrix, so we have our result. $\blacksquare$
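Not part of the original solution, but the result (2.12) is easy to spot-check numerically in MATLAB; the matrix A, the value of sigma, and the sample count below are arbitrary choices:

% Monte Carlo check of Cov[t] = sigma^2*(A'*A)^(-1) for an arbitrary A.
rng(0);                            % for reproducibility
A     = [1 2; 3 4; 5 6];           % arbitrary matrix with full column rank
sigma = 0.5;
N     = 1e5;                       % number of Monte Carlo samples

X = sigma * randn(3, N);           % columns are draws of x, Cov[x] = sigma^2*I
T = (A'*A) \ (A'*X);               % columns are t = (A'A)^(-1) A' x

disp(cov(T'));                     % empirical covariance of t
disp(sigma^2 * inv(A'*A));         % theoretical (A'A)^(-1) sigma^2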

    Problem 3: Exercise 1.3 - Profit Maximization of Business Investments

Solution: Let $\theta$ represent a vector of $N$ investments whose total is bounded by some given amount, $T$. Mathematically,

$$\theta = [t_1, \ldots, t_N]^T, \qquad \sum_{n=1}^{N} t_n \le T. \tag{3.1}$$

Then, we define $\Theta$ as the $N$-dimensional vector space of all possible investments,

$$\Theta = \left\{ [t_1, \ldots, t_N]^T \;\middle|\; t_1, \ldots, t_N \in \mathbb{R} \right\}. \tag{3.2}$$


Suppose that a business firm is composed of $M$ segments. Then, let $s(\theta)$ be a deterministic vector function where each component $s_i(\theta)$ represents the amount of output produced by one segment of the business firm, and let $\pi$ be a random vector where each component $\pi_i$ represents the profit from one unit of output from the corresponding segment. Note that

$$\dim(s(\theta)) = \dim(\pi) = M. \tag{3.3}$$

Then, we can write the total profit of the business, $\mathcal{T}$, as the inner product of $s(\theta)$ and $\pi$:

$$\mathcal{T}(\theta) = \langle s(\theta), \pi \rangle. \tag{3.4}$$

The problem of interest is to find the optimal set of investments that maximizes the mean total profit, $E[\mathcal{T}(\theta)]$, subject to a constraint of bounded variance in $\mathcal{T}(\theta)$. In finance, this constraint can be thought of as a known bound on the investment risk: one would not want to define an investment portfolio that maximizes the expected total profit but carries unbounded risk, since that could mean the investment completely tanks.

Thus, mathematically, we have that $L(\theta)$ is given by

$$L(\theta) = E[\mathcal{T}(\theta)] = E[\langle s(\theta), \pi \rangle] = \langle s(\theta), E[\pi] \rangle, \tag{3.5}$$

where the last equality follows from the linearity of the expectation operator. Since $\mathcal{T}(\theta)$ is a linear combination of the components of $\pi$, its variance can be written as

$$\mathrm{Var}(\mathcal{T}(\theta)) = \sum_{n,m} s_n s_m \Sigma_{n,m} = s^T \Sigma s, \tag{3.6}$$

where $\Sigma = \mathrm{Cov}[\pi]$ and the dependence of $s$ on $\theta$ is omitted to keep the equation cleaner and easier to read. The optimization problem is then written as follows: find all $\theta^*$ that maximize $L(\theta)$ such that $s^T \Sigma s \le C$. The solution set is then

$$\Theta^* = \left\{ \theta^* \;\middle|\; \forall \theta \in \Theta,\; L(\theta) \le L(\theta^*) \;\text{ and }\; s(\theta^*)^T \Sigma\, s(\theta^*) \le C \right\}. \tag{3.7}$$
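The abstract program in (3.7) can be made concrete with a small MATLAB sketch. Everything below is invented for illustration: it assumes the hypothetical linear output model s(theta) = theta (so M = N = 3), an assumed mean profit vector mu_pi, an assumed covariance Sigma, a budget T, and a risk bound C, and it uses fmincon from the Optimization Toolbox (which minimizes, so the objective is negated):

% Sketch of (3.7) under the hypothetical model s(theta) = theta.
% All numeric values below are invented for illustration only.
mu_pi = [1.0; 0.8; 1.2];                       % assumed E[pi]
Sigma = [0.2 0.05 0; 0.05 0.3 0.1; 0 0.1 0.4]; % assumed Cov[pi]
T     = 10;                                    % total investment budget
C     = 2;                                     % risk (variance) bound

obj     = @(theta) -(theta' * mu_pi);          % maximize <s(theta), E[pi]>
riskcon = @(theta) deal(theta' * Sigma * theta - C, []); % s'*Sigma*s <= C
theta0  = zeros(3, 1);

opts = optimoptions('fmincon', 'Display', 'off');
theta_star = fmincon(obj, theta0, ones(1, 3), T, ... % sum(theta) <= T
                     [], [], [], [], riskcon, opts)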

Now, we wish to explicitly define $y(\theta)$, which can be thought of as a measurement of $L(\theta)$, or as the actual total profit observed when the investment $\theta$ is made. Above, we saw that the value of $L(\theta)$ relies on the mean of the segment-profit vector $\pi$. Thus, when we make the investment, we can expect (no pun intended) that our actual profit will be some small deviation from the mean of $\pi$. Writing $\pi = E[\pi] + \delta$ with $E[\delta] = 0$, we have

$$y(\theta) = \langle s(\theta), E[\pi] + \delta \rangle = \underbrace{\langle s(\theta), E[\pi] \rangle}_{L(\theta)} + \underbrace{\langle s(\theta), \delta \rangle}_{\varepsilon(\theta)} \tag{3.8}$$

$$= L(\theta) + \varepsilon(\theta). \tag{3.9}$$

Clearly, this shows that the measurement noise depends on $\theta$; moreover, this is not noise in the traditional sense. It is random, but it is not caused by what most would consider noise added on top of the measurement. Rather, it is a product of observing a probabilistic quantity. By the Law of Large Numbers, we can reason that over many instances of the investment the observed profit would converge to the mean profit, but since money is finite, repeated iterations may not always be possible.
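To illustrate the $\theta$-dependence of $\varepsilon(\theta)$ numerically, here is a small simulation sketch; the model s(theta) = theta and all numbers are invented for illustration. The standard deviation of the observed y(theta) should match sqrt(s'*Sigma*s) and grow with the size of the investment:

% Sketch: eps(theta) = <s(theta), pi - E[pi]> has variance s'*Sigma*s,
% so the noise level depends on theta. Numbers invented for illustration.
rng(1);
mu_pi = [1.0; 0.8];  Sigma = [0.3 0.1; 0.1 0.2];
s     = @(theta) theta;            % hypothetical linear output model
R     = chol(Sigma, 'lower');      % for sampling pi with Cov[pi] = Sigma

thetaA = [1; 1];  thetaB = [5; 5]; % small vs. large investment
N      = 1e5;
Pi     = mu_pi + R*randn(2, N);    % columns are samples of pi

yA = s(thetaA)' * Pi;  yB = s(thetaB)' * Pi;   % measurements y(theta)
fprintf('std eps(thetaA) = %.3f (theory %.3f)\n', ...
        std(yA), sqrt(thetaA'*Sigma*thetaA));
fprintf('std eps(thetaB) = %.3f (theory %.3f)\n', ...
        std(yB), sqrt(thetaB'*Sigma*thetaB));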


    Problem 4: Exercise 1.4 - Probability of Correct Optimal Solution

Solution: From the problem statement, suppose $L(\theta_1) = 0$ and $L(\theta_2) = 1$; then, we have

$$y(\theta_i) = \begin{cases} \epsilon_1, & i = 1 \\ 1 + \epsilon_2, & i = 2 \end{cases} \tag{4.1}$$

where $\epsilon_1$ and $\epsilon_2$ are independent $N(0, 1)$ distributed random noise variables. From this, the probability of incorrectly selecting $\theta_2$ as the choice that minimizes $y(\theta_i)$ is

$$\begin{aligned} \Pr\{y(\theta_1) > y(\theta_2)\} &= \Pr\{y(\theta_1) - y(\theta_2) > 0\} && (4.2) \\ &= \Pr\{\epsilon_1 - 1 - \epsilon_2 > 0\} && (4.3) \\ &= \Pr\{\epsilon_1 - \epsilon_2 > 1\}. && (4.4) \end{aligned}$$

Now, we make the following definition for the rest of the problem.

Definition 4.1. Let $X$ be a normally distributed random variable with distribution $N(0, 1)$. Then we define $\Phi(x)$ as the cumulative distribution function (CDF) of $X$:

$$\Pr\{X \le x\} = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt. \tag{4.5}$$

If $X$ is distributed as $N(\mu, \sigma^2)$, then the CDF of $X$ is given by $\Phi\!\left(\frac{x - \mu}{\sigma}\right)$.

Furthermore, we will state, without proof, a lemma regarding sums of mutually independent, normal random variables:

Lemma 4.1. Suppose that $X$ and $Y$ are mutually independent random variables distributed normally as $N(\mu_X, \sigma_X^2)$ and $N(\mu_Y, \sigma_Y^2)$. Then,

$$Z = aX + bY \implies Z \text{ has distribution } N\!\left(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2\right). \tag{4.6}$$

The proof of Lemma 4.1 can easily be found using characteristic functions or moment generating functions, or referenced in any introductory text on probability and statistics. Returning to our problem, applying Lemma 4.1 to the difference of $\epsilon_1$ and $\epsilon_2$ (with $a = 1$, $b = -1$) gives

$$\epsilon_1 - \epsilon_2 = \epsilon_T \sim N(0, 2). \tag{4.7}$$

Then, applying Definition 4.1, the answer to the problem is

$$\Pr\{\epsilon_T > 1\} = 1 - \Pr\{\epsilon_T \le 1\} = 1 - \Phi\!\left(\frac{1}{\sqrt{2}}\right) \approx 0.2398. \tag{4.8}$$

The probability above is evaluated using the command 1 - CDF[NormalDistribution[0, Sqrt[2]], 1.] in Mathematica 10; note that NormalDistribution takes the standard deviation, not the variance, as its second argument, so the second argument here must be Sqrt[2], not 2.
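As an added spot check (the sample count is an arbitrary choice), a quick Monte Carlo simulation in MATLAB agrees with (4.8):

% Monte Carlo check of Pr{y(theta_1) > y(theta_2)} = Pr{eps1 - eps2 > 1}.
rng(2);
N     = 1e6;
eps1  = randn(N, 1);              % noise in y(theta_1) = eps1
eps2  = randn(N, 1);              % noise in y(theta_2) = 1 + eps2
p_hat = mean(eps1 - eps2 > 1)     % expect about 0.2398 = 1 - Phi(1/sqrt(2))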