EE104, Spring 2017-2018 S. Boyd & S. Lall

Homework 2

1. Regularized least squares and features. The following problem will use U and v found in prostate_cancer_data.json. For the parts below, split the data using a 70-30 train-test split.

(a) Explain how to formulate the problem of fitting regularized least squares given a matrix U and regularization parameter λ > 0, where the first feature is the constant feature x1 = 1.

(b) Standardize all of the features and then fit a model to the data, adding only a constant feature. Sweep your regularization parameter λ over the range [10^-5, 10^5] and plot the corresponding training and test errors. Choose an appropriate value for λ, i.e., the largest value that achieves approximately minimum test error. Give the model and the corresponding test error. (A Julia sketch of this sweep appears after the hint below.)

(c) If you look into the data matrix U, you'll notice that the last two columns actually take on only a few values. Embed both columns using a one-hot encoding, keeping the rest of the values the same. Now do the same sweep you did in part (b), and plot the corresponding training and testing errors. Compare the final test RMSE of the one-hot encoding vs. the original encoding, with an appropriately chosen λ for each.

Hint. You can make use of the Julia file to_one_hot.jl, which contains the function to_one_hot(u). This function takes as input an n-vector u whose entries are one of k categories and embeds it using a one-hot embedding into R^(n×k).
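For concreteness, here is a minimal Julia sketch of the sweep in part (b), assuming synthetic stand-ins for U and v (the real data comes from prostate_cancer_data.json) and a diagonal weight D that leaves the constant feature unregularized; the variable names and random data are illustrative only, not part of the assignment.

using LinearAlgebra, Random

Random.seed!(0)
n, d = 100, 8
U = randn(n, d)                     # synthetic stand-in for the .json data
v = randn(n)

X = [ones(n) U]                     # prepend the constant feature x1 = 1
p = size(X, 2)
D = Diagonal([0.0; ones(p - 1)])    # do not regularize the constant feature

rmse(A, b, theta) = sqrt(sum((A * theta - b) .^ 2) / size(A, 1))

ntrain = round(Int, 0.7n)           # 70-30 train-test split
Xtr, vtr = X[1:ntrain, :], v[1:ntrain]
Xte, vte = X[ntrain+1:end, :], v[ntrain+1:end]

for lambda in 10.0 .^ (-5:5)
    theta = (Xtr' * Xtr + lambda * D) \ (Xtr' * vtr)
    println("lambda = $lambda: train RMSE = $(rmse(Xtr, vtr, theta)), test RMSE = $(rmse(Xte, vte, theta))")
end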

2. Some properties of one-hot embeddings. Suppose u is a categorical that can take k values, i.e., U = {1, . . . , k}. The one-hot embedding of u into R^k is defined as φ(u) = e_u, where e_j is the jth unit vector.

We have a data set with n observations that includes the categorical u as a field. The first feature is the constant feature x1 = 1.

(a) Show that the data matrix X always has linearly dependent columns. This means that we cannot use least squares to fit a regression model when we use a one-hot embedding of a categorical. Hint. Consider the sum of the columns in X associated with u.

(b) Now suppose that we add quadratic regularization on θ2:k+1 to our fitting method. (We do not regularize the model coefficient associated with the constant feature.) Show that the sum of the coefficients θi associated with u is zero. Hint. Use the orthogonality principle on page 231 of VMLS, applied to the least squares problem that determines θ.

(c) Verify the properties in parts (a) and (b) numerically, with a small problem with a constant feature and a single categorical u that takes three values. You can use a data set with n = 10 samples. Generate random ui ∈ {1, 2, 3} and yi ∈ R and then form the data matrix X. Verify that the columns are dependent. Then find the model coefficients using quadratic regularization. Verify that the sum of the three model coefficients associated with u (i.e., θ2 + θ3 + θ4) is zero.
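A minimal Julia sketch of this check, assuming an inline helper onehot as a stand-in for to_one_hot; λ = 1 is an arbitrary choice, since the property holds for any λ > 0.

using LinearAlgebra, Random

Random.seed!(0)
n, k = 10, 3
u = rand(1:k, n)                        # random categories in {1, 2, 3}
y = randn(n)

onehot(ui) = [ui == j ? 1.0 : 0.0 for j in 1:k]        # stand-in for to_one_hot
X = [ones(n) reduce(vcat, [onehot(ui)' for ui in u])]  # constant + one-hot columns

@show rank(X)                           # rank 3 < 4 columns, so dependent

lambda = 1.0
D = Diagonal([0.0; ones(k)])            # regularize theta_2, ..., theta_4 only
theta = (X' * X + lambda * D) \ (X' * y)
@show theta[2] + theta[3] + theta[4]    # zero, up to roundoff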

3. Standardizing Boolean features. Problem 13.12 in VMLS (Introduction to Applied Linear Algebra — Vectors, Matrices, and Least Squares).

4. Sequential outlier removal. Throughout this problem, you'll use the data U, v found in fitting_outliers.json. Here, U ∈ R^(n×1), so there is only one (nonconstant) feature. This one feature is already (nearly) standardized, so you do not need to standardize it. The data matrix X will have two columns: the constant feature 1 and the feature given in U. Also, there is enough data that you do not need to use any regularization.

(a) Fit a least squares model to the dataset above and plot the data points and straight-line fit. Describe what you observe.

(b) Sequential outlier removal. Find the data point with the largest loss and label it as an outlier. Remove this point from your data set and fit the model again to this new dataset (which has one fewer data point). Continue doing this until your θ stops changing much, say, until no component changes by more than 0.01 between the previous θ and the current one.

Show a few of the intermediate fits and the final fit plotted against the data points. Describe what you observe.
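A minimal Julia sketch of the removal loop, with synthetic data (a line plus a few planted outliers) standing in for fitting_outliers.json; the function name sequential_outlier_removal is an assumption, not part of the assignment.

using LinearAlgebra, Random

# Drop the largest-loss point and refit until no component of theta
# changes by more than tol between iterations.
function sequential_outlier_removal(X, v; tol = 0.01)
    theta = X \ v                           # initial least squares fit
    while true
        i = argmax((X * theta .- v) .^ 2)   # data point with the largest loss
        keep = (1:size(X, 1)) .!= i
        X, v = X[keep, :], v[keep]          # remove it and refit
        theta_new = X \ v
        maximum(abs.(theta_new .- theta)) <= tol && return theta_new
        theta = theta_new
    end
end

Random.seed!(0)
n = 200
u = randn(n)
v = 2.0 .* u .+ 1.0 .+ 0.1 .* randn(n)
v[1:5] .+= 10.0                             # plant a few outliers

@show sequential_outlier_removal([ones(n) u], v)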

5. All-pairs interactions. The following problem will use U and v found in all_pairs_data.json. The data has U ∈ R^(n×3). Throughout this problem, use a 50-50 train/test split.

(a) Fit a linear least-squares model directly to the data matrix, with the first feature being a constant feature x1 = 1. Since we've given you enough data and the data is approximately standardized, you do not have to worry about regularization or standardization. Report the train and test RMSE of this predictor.

(b) Create an embedding which includes all of the interactions (products) between every pair of distinct variables, along with a constant feature and the variables themselves. For example, if u ∈ R^2, then the embedding should be

φ(u) = (1, u1, u2, u1u2).

Report the train and test RMSE of this predictor. Compare it with the RMSEs you got in (a).
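A minimal Julia sketch of the all-pairs embedding; phi is an assumed helper name.

# Constant feature, the variables themselves, then the product of every
# distinct pair (for u in R^3: u1u2, u1u3, u2u3).
phi(u) = vcat(1.0, u, [u[i] * u[j] for i in 1:length(u) for j in i+1:length(u)])

@show phi([1.0, 2.0, 3.0])   # (1, u1, u2, u3, u1u2, u1u3, u2u3)

# Row-wise embedding of the data matrix U (n x 3):
# X = reduce(vcat, [phi(U[i, :])' for i in 1:size(U, 1)])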
