
CS229 Midterm Review Part II

Taide Ding

November 1, 2019


Overview

1 Past Midterm Stats

2 Helpful Resources

3 Notation: quick clarifying review

4 Another perspective on bias-variance

5 Common Problem-solving Strategies (with examples)


The Midterms are tough - DON’T PANIC!

[Figure: Fall 16 midterm grade distribution]

Fall 17: µ = 39.5, σ = 14.5
Spring 19: µ = 65.4, σ = 22.4


Helpful Resources

Study guide by past CS229 TA Shervine Amidi (link is on the course syllabus):

https://stanford.edu/~shervine/teaching/cs-229/


Helpful Resources

IMPORTANT: CS229 Linear Algebra and Probability Review handouts

Go over them carefully and in detail.

Any and all of the concepts/tools within are fair game for solving midterm problems.

TAKE NOTES


Notation: quick clarifying review

{(x^(i), y^(i))}_{i=1}^n denotes a dataset of n examples. For each example i, x^(i) ∈ R^d and y^(i) ∈ R.

The j-th element (i.e. feature) of the i-th sample is denoted x_j^(i).

X ∈ R^{n×d} is the data matrix and ~y ∈ R^n is the label vector such that:

$$
x^{(i)} = \begin{bmatrix} x^{(i)}_1 \\ \vdots \\ x^{(i)}_d \end{bmatrix}, \quad
X = \begin{bmatrix} \text{---}\; x^{(1)T} \;\text{---} \\ \vdots \\ \text{---}\; x^{(n)T} \;\text{---} \end{bmatrix}, \quad
\vec{y} = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(n)} \end{bmatrix}, \quad
X\theta = \begin{bmatrix} \theta^T x^{(1)} \\ \vdots \\ \theta^T x^{(n)} \end{bmatrix}
$$

for parameter vector θ ∈ R^d.

The t-th iteration of θ is denoted θ^(t).

Superscripts: sample index i ∈ [1, n]; iteration index t ∈ [1, T].

Subscripts: feature index j ∈ [1, d].
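
As a concrete companion to the notation (not part of the original slides), here is a minimal NumPy sketch: X stacks the x^(i) as rows, and `X @ theta` computes all θ^T x^(i) at once. All names and values are illustrative.

```python
import numpy as np

n, d = 5, 3  # n examples, d features

# Each x^(i) is a d-dimensional column vector; X stacks their transposes as rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))    # X ∈ R^{n×d}, row i is x^(i)T
y = rng.normal(size=n)         # ~y ∈ R^n, label vector
theta = rng.normal(size=d)     # θ ∈ R^d, parameter vector

# X @ theta computes [θT x^(1), ..., θT x^(n)] in one shot.
Xtheta = X @ theta

# Sanity check: row i of Xθ equals θT x^(i); feature x_j^(i) is X[i, j].
for i in range(n):
    assert np.isclose(Xtheta[i], theta @ X[i])
```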



Another perspective on bias-variance

F is your model class.
f* is the optimal model for the problem (i.e. the true generating distribution).
g is the optimal model within your model class F.
f̂ is the model you actually obtain by learning on your dataset.

Approximation error → bias: reduce bias by expanding F (e.g. more features, more layers) or moving F closer to the optimal model f* (i.e. choosing a better class).

Estimation error → variance: reduce variance by contracting F (e.g. removing features, regularizing) or making f̂ closer to g (e.g. a better training algorithm, more data).
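
To make the trade-off tangible, here is a hedged toy sketch (not from the slides): polynomial classes F of increasing degree are fit to noisy samples of a fixed f*, and bias² falls while variance rises as F expands. The generating function and all constants are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_star(x):
    # The "true" model f* (an assumption made up for this demo).
    return np.sin(2 * np.pi * x)

def fit_and_eval(degree, n_train=20, n_trials=200):
    # Repeatedly sample a dataset, learn f̂ ∈ F (polynomials of given degree),
    # then estimate bias² and variance of the predictions over trials.
    x_test = np.linspace(0, 1, 100)
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(0, 1, n_train)
        y = f_star(x) + rng.normal(0, 0.3, n_train)  # noisy training set
        coeffs = np.polyfit(x, y, degree)            # f̂: learned model in F
        preds.append(np.polyval(coeffs, x_test))
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - f_star(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias2, variance

# Expanding F (higher degree) should shrink bias² and grow variance.
for degree in [1, 3, 9]:
    b2, var = fit_and_eval(degree)
    print(f"degree {degree}: bias^2 ≈ {b2:.3f}, variance ≈ {var:.3f}")
```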



Common Problem-solving Strategies

Take stock of your arsenal:

1 Probability
Bayes' Rule; independence and conditional independence; the chain rule; etc.

2 Calculus (e.g. taking gradients)
Maximum likelihood estimation (a short sketch follows this list):

$\ell(\cdot) = \log \mathcal{L}(\cdot) = \log \prod p(\cdot) = \sum \log p(\cdot)$

Loss minimization; etc.

3 Linear Algebra
PSD matrices, eigendecomposition, projection, Mercer's Theorem, etc.

4 Proof techniques
Construction, contradiction (e.g. counterexample), induction, contrapositive, etc.



Spring 19 Problem 3(a,b) - Exponential Discr. Analysis

Tools used: Probability (Bayes' Rule, Independence, Chain Rule), Calculus (MLE)
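
The transcript does not reproduce the problem statement itself, so the following is only a hedged toy sketch of the named tools: a discriminant analysis with exponential class-conditionals, fit by MLE, with the posterior computed via Bayes' rule. The data and all parameters are synthetic, invented for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y ~ Bernoulli(φ), x | y = k ~ Exponential(λ_k),
# i.e. p(x | y = k) = λ_k exp(-λ_k x).
phi_true, lam0_true, lam1_true = 0.4, 1.0, 3.0
n = 5000
y = rng.binomial(1, phi_true, n)
x = rng.exponential(np.where(y == 1, 1 / lam1_true, 1 / lam0_true))

# MLE from setting the gradient of the log-likelihood to zero:
# φ̂ = mean(y), and λ̂_k = 1 / (class-k sample mean of x).
phi = y.mean()
lam0 = 1 / x[y == 0].mean()
lam1 = 1 / x[y == 1].mean()

def posterior(x_new):
    # Bayes' rule: p(y = 1 | x) ∝ p(x | y = 1) p(y = 1).
    p1 = lam1 * np.exp(-lam1 * x_new) * phi
    p0 = lam0 * np.exp(-lam0 * x_new) * (1 - phi)
    return p1 / (p0 + p1)

print(f"φ̂={phi:.2f}, λ̂0={lam0:.2f}, λ̂1={lam1:.2f}, "
      f"p(y=1|x=0.2)={posterior(0.2):.2f}")
```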



Spring 19 Problem 5(a) - Kernel Fun

Tools used: Linear Algebra (PSD properties, eigendecomposition), proof by construction
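
Again, the problem body is not reproduced in this transcript; as a hedged sketch of the named tools, the snippet below builds a Gram matrix for the RBF kernel and verifies it is PSD via eigendecomposition, a necessary property of any valid kernel by Mercer's Theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))  # 10 illustrative points in R^3

def rbf_kernel(a, b, gamma=0.5):
    # RBF kernel k(a, b) = exp(-γ ||a - b||²), a standard valid kernel.
    return np.exp(-gamma * np.sum((a - b) ** 2))

# Gram matrix K with K_ij = k(x^(i), x^(j)); symmetric by construction.
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])

# Symmetric PSD ⟺ all eigenvalues ≥ 0 (up to numerical tolerance).
eigvals = np.linalg.eigvalsh(K)
assert np.all(eigvals >= -1e-10), "Gram matrix is not PSD"
print(f"min eigenvalue: {eigvals.min():.2e} (≥ 0 ⇒ PSD on this sample)")
```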


Summary

1 The midterm is tough. Don’t panic!

2 Use your resources: study guide, lecture and review handouts, Piazza, office hours

3 Know your problem-solving tools - take stock of your arsenal!


Best of Luck!
