Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released!...

67
Neural Networks I Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824

Transcript of Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released!...

Page 1: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural Networks I

Jia-Bin Huang

Virginia Tech Spring 2019ECE-5424G / CS-5824

Page 2: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Administrative

• HW 2 released!

• Jia-Bin out of town next week

• Chen: Deep Neural Networks II

• Shih-Yang: Diagnosing ML systems

• Midterm review

• Midterm: March 6th (Wed)

Page 3: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Hard-margin SVM formulationmin𝜃

1

2σ𝑗=1𝑛 𝜃𝑗

2

s. t. 𝜃⊤𝑥 𝑖 ≥ 1 if y 𝑖 = 1𝜃⊤𝑥 𝑖 ≤ −1 if y 𝑖 = 0

Soft-margin SVM formulationmin

𝜃, {𝜉(𝑖)}

1

2σ𝑗=1𝑛 𝜃𝑗

2 + 𝐶 σ𝑖 𝜉(𝑖)

s. t. 𝜃⊤𝑥 𝑖 ≥ 1 − 𝜉(𝑖) if y 𝑖 = 1𝜃⊤𝑥 𝑖 ≤ −1 + 𝜉 𝑖 if y 𝑖 = 0

𝜉 𝑖 ≥ 0 ∀ 𝑖

Page 4: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Support Vector Machine

•Cost function

• Large margin classification

•Kernels

•Using an SVM

Page 5: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Non-linear classification

• How do we separate the two classes using a hyperplane?

ℎ𝜃 𝑥 = ቊ 1 if 𝜃⊤𝑥 ≥ 00 if 𝜃⊤𝑥 < 0

𝑥

Page 6: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Non-linear classification

𝑥

𝑥2

Page 7: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Kernel

•𝐾 ⋅,⋅ a legal definition of inner product:

∃𝜙: 𝑋 → 𝑅𝑁

s.t. 𝐾 𝑥, 𝑧 = 𝜙 𝑥 ⊤𝜙(𝑧)

Page 8: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Why Kernels matter?

• Many algorithms interact with data only via dot-products

• Replace 𝑥⊤𝑧 with 𝐾 𝑥, 𝑧 = 𝜙 𝑥 ⊤𝜙(𝑧)

• Act implicitly as if data was in the higher-dimensional 𝜙-space

Page 9: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Example

𝐾 𝑥, 𝑧 = 𝑥⊤𝑧 2 corresponds to

𝑥1, 𝑥2 → 𝜙 𝑥 = 𝑥12, 𝑥2

2, 2𝑥1𝑥2

𝜙 𝑥 ⊤𝜙 𝑧 = 𝑥12, 𝑥2

2, 2𝑥1𝑥2 𝑧12, 𝑧2

2, 2𝑧1𝑧2⊤

= 𝑥12𝑧1

2 + 𝑥22𝑧2

2 + 2x1x2z1z2 = x1z1 + x2z22

= 𝑥⊤𝑧 2 = 𝐾(𝑥, 𝑧)

Page 10: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Example

𝐾 𝑥, 𝑧 = 𝑥⊤𝑧 2 corresponds to

𝑥1, 𝑥2 → 𝜙 𝑥 = 𝑥12, 𝑥2

2, 2𝑥1𝑥2

Slide credit: Maria-Florina Balcan

Page 11: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Example kernels

• Linear kernel𝐾 𝑥, 𝑧 = 𝑥⊤𝑧

•Gaussian (Radial basis function) kernel

𝐾 𝑥, 𝑧 = exp −1

2𝑥 − 𝑧 ⊤Σ−1(𝑥 − 𝑧)

• Sigmoid kernel𝐾 𝑥, 𝑧 = tanh 𝑎 ⋅ 𝑥⊤𝑧 + 𝑏

Page 12: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Constructing new kernels

• Positive scaling𝐾 𝑥, 𝑧 = 𝑐𝐾1 𝑥, 𝑧 , 𝑐 > 0

• Exponentiation𝐾 𝑥, 𝑧 = exp 𝐾1 𝑥, 𝑧

• Addition𝐾 𝑥, 𝑧 = 𝐾1 𝑥, 𝑧 + 𝐾2 𝑥, 𝑧

• Multiplication with function𝐾 𝑥, 𝑧 = 𝑓 𝑥 𝐾1 𝑥, 𝑧 𝑓(𝑧)

• Multiplication𝐾 𝑥, 𝑧 = 𝐾1 𝑥, 𝑧 𝐾2 𝑥, 𝑧

Page 13: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Non-linear decision boundary

Predict 𝑦 = 1 if

𝜃0 + 𝜃1𝑥1 + 𝜃2𝑥2 + 𝜃3𝑥1𝑥2+ 𝜃4𝑥1

2 + 𝜃5𝑥22 +⋯ ≥ 0

𝜃0 + 𝜃1𝑓1 + 𝜃2𝑓2 + 𝜃3𝑓3 +⋯

𝑓1 = 𝑥1, 𝑓2 = 𝑥2, 𝑓3 = 𝑥1𝑥2, ⋯

Is there a different/better choice of the features 𝑓1, 𝑓2, 𝑓3, ⋯?

𝑥1

𝑥2

Slide credit: Andrew Ng

Page 14: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Kernel

Give 𝑥, compute new features depending on proximity to landmarks 𝑙(1), 𝑙(2), 𝑙(3)

𝑓1 = similarity(𝑥, 𝑙(1))𝑓2 = similarity(𝑥, 𝑙(2))𝑓3 = similarity(𝑥, 𝑙(3))

Gaussian kernel

similarity 𝑥, 𝑙 𝑖 = exp(−𝑥 − 𝑙 𝑖

2

2𝜎2)

𝑥1

𝑥2𝑙(2)

𝑙(3)

𝑙(1)

Slide credit: Andrew Ng

Page 15: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Predict 𝑦 = 1 if

𝜃0 + 𝜃1𝑓1 + 𝜃2𝑓2 + 𝜃3𝑓3 ≥ 0

Ex: 𝜃0 = −0.5, 𝜃1 = 1, 𝜃2 = 1, 𝜃3 = 0

𝑥1

𝑥2𝑙(2)

𝑙(3)

𝑙(1)

𝑓1 = similarity(𝑥, 𝑙(1))𝑓2 = similarity(𝑥, 𝑙(2))

𝑓3 = similarity(𝑥, 𝑙(3))

Slide credit: Andrew Ng

Page 16: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Choosing the landmarks

• Given 𝑥

𝑓𝑖 = similarity 𝑥, 𝑙 𝑖 = exp(−𝑥 − 𝑙 𝑖

2

2𝜎2)

Predict 𝑦 = 1 if 𝜃0 + 𝜃1𝑓1 + 𝜃2𝑓2 + 𝜃3𝑓3 ≥ 0

Where to get 𝑙(1), 𝑙(2), 𝑙 3 , ⋯ ?

𝑥1

𝑥2𝑙 1

𝑙 2

𝑙 3

𝑙 𝑚

Slide credit: Andrew Ng

Page 17: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

SVM with kernels

• Given 𝑥 1 , 𝑦 1 , 𝑥 2 , 𝑦 2 , ⋯ , 𝑥 𝑚 , 𝑦 𝑚

• Choose 𝑙 1 = 𝑥 1 , 𝑙 2 = 𝑥 2 , 𝑙 3 = 𝑥 3 , ⋯, 𝑙 𝑚 = 𝑥 𝑚

• Given example 𝑥:• 𝑓1 = similarity 𝑥, 𝑙(1)

• 𝑓2 = similarity 𝑥, 𝑙(2)

• ⋯

• For training example 𝑥 𝑖 , 𝑦 𝑖 :

• 𝑥 𝑖 → 𝑓(𝑖)

𝑓 =

𝑓0𝑓1𝑓2⋮𝑓𝑚

Slide credit: Andrew Ng

Page 18: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

• Hypothesis: Given 𝑥, compute features 𝑓 ∈ ℝ𝑚+1

• Predict 𝑦 = 1 if 𝜃⊤𝑓 ≥ 0

• Training (original)

min𝜃

𝐶

𝑖=1

𝑚

𝑦(𝑖) cost1 𝜃⊤𝑥 𝑖 + (1 − 𝑦 𝑖 ) cost0 𝜃⊤𝑥 𝑖 +1

2

𝑗=1

𝑛

𝜃𝑗2

• Training (with kernel)

min𝜃

𝐶

𝑖=1

𝑚

𝑦(𝑖) cost1 𝜃⊤𝑓 𝑖 + (1 − 𝑦 𝑖 ) cost0 𝜃⊤𝑓 𝑖 +1

2

𝑗=1

𝑚

𝜃𝑗2

SVM with kernels

Page 19: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Support vector machines (Primal/Dual)

•Primal form

• Lagrangian dual form

min𝜃,𝜉(𝑖)

1

2σ𝑗=1𝑛 𝜃𝑗

2 + 𝐶 σ𝑖 𝜉(𝑖)

s. t. 𝜃⊤𝑥 𝑖 ≥ 1 − 𝜉(𝑖) if y 𝑖 = 1𝜃⊤𝑥 𝑖 ≤ −1 + 𝜉 𝑖 if y 𝑖 = 0

𝜉 𝑖 ≥ 0 ∀ 𝑖

min𝛼

1

2

𝑖

𝑗

𝑦(𝑖)𝑦(𝑗)𝛼(𝑖)𝛼(𝑗)𝑥 𝑖 ⊤𝑥(𝑗) −

𝑖

𝛼(𝑖)

s. t. 0 ≤ 𝛼(𝑖) ≤ 𝐶𝑖

𝑖

𝑦(𝑖)𝛼(𝑖) = 0

Page 20: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

SVM (Lagrangian dual)

min𝛼

1

2

𝑖

𝑗

𝑦(𝑖)𝑦(𝑗)𝛼(𝑖)𝛼(𝑗)𝑥 𝑖 ⊤𝑥(𝑗) −

𝑖

𝛼(𝑖)

s. t. 0 ≤ 𝛼(𝑖) ≤ 𝐶𝑖

𝑖

𝑦(𝑖)𝛼(𝑖) = 0

Classifier: 𝜃 = σ𝑖 𝛼(𝑖) 𝑦 𝑖 𝑥(𝑖)

• The points 𝑥(𝑖) for which 𝛼(𝑖) ≠ 0 → Support Vectors

Replace 𝑥 𝑖 ⊤𝑥(𝑗) with 𝐾(𝑥 𝑖 , 𝑥(𝑗))

Page 21: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

SVM parameters

• 𝐶 =1

𝜆Large 𝐶: Lower bias, high variance. Small 𝐶: Higher bias, low variance.

• 𝜎2

• Large 𝜎2: features 𝑓𝑖 vary more smoothly.• Higher bias, lower variance

• Small 𝜎2: features 𝑓𝑖 vary less smoothly.• Lower bias, higher variance

Slide credit: Andrew Ng

Page 22: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

SVM Demo

• https://cs.stanford.edu/people/karpathy/svmjs/demo/

Page 23: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

SVM song

Video source:

• https://www.youtube.com/watch?v=g15bqtyidZs

Page 24: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Support Vector Machine

•Cost function

• Large margin classification

•Kernels

•Using an SVM

Page 25: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Using SVM

• SVM software package (e.g., liblinear, libsvm) to solve for 𝜃

• Need to specify:• Choice of parameter 𝐶.

• Choice of kernel (similarity function):

• Linear kernel: Predict 𝑦 = 1 if 𝜃⊤𝑥 ≥ 0

• Gaussian kernel:

• 𝑓𝑖 = exp(−𝑥−𝑙 𝑖

2

2𝜎2), where 𝑙(𝑖) = 𝑥(𝑖)

• Need to choose 𝜎2. Need proper feature scalingSlide credit: Andrew Ng

Page 26: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Kernel (similarity) functions

• Note: not all similarity functions make valid kernels.

• Many off-the-shelf kernels available:• Polynomial kernel

• String kernel

• Chi-square kernel

• Histogram intersection kernel

Slide credit: Andrew Ng

Page 27: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Multi-class classification

• Use one-vs.-all method. Train 𝐾 SVMs, one to distinguish 𝑦 = 𝑖 from the rest, get 𝜃 1 , 𝜃(2), ⋯ , 𝜃(𝐾)

• Pick class 𝑖 with the largest 𝜃 𝑖 ⊤𝑥

𝑥2

𝑥1

Slide credit: Andrew Ng

Page 28: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Logistic regression vs. SVMs

• 𝑛 = number of features (𝑥 ∈ ℝ𝑛+1), 𝑚 = number of training examples

1. If 𝒏 is large (relative to 𝒎): (𝑛 = 10,000,m = 10 − 1000)→ Use logistic regression or SVM without a kernel (“linear kernel”)

2. If 𝒏 is small, 𝒎 is intermediate: (𝑛 = 1 − 1000,m = 10 − 10,000)→ Use SVM with Gaussian kernel

3. If 𝒏 is small, 𝒎 is large: (𝑛 = 1 − 1000,m = 50,000+)→ Create/add more features, then use logistic regression of linear SVM

Neural network likely to work well for most of these case, but slower to train Slide credit: Andrew Ng

Page 29: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Things to remember

• Cost function

min𝜃

𝐶

𝑖=1

𝑚

𝑦(𝑖) cost1 𝜃⊤𝑓 𝑖 + (1 − 𝑦 𝑖 ) cost0 𝜃⊤𝑓 𝑖 +1

2

𝑗=1

𝑚

𝜃𝑗2

• Large margin classification

• Kernels

• Using an SVM

𝑥2

𝑥1

margin

Page 30: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural Networks

•Why neural networks?

•Model representation

• Examples and intuitions

•Multi-class classification

Page 31: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural Networks

•Why neural networks?

•Model representation

• Examples and intuitions

•Multi-class classification

Page 32: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Non-linear classification

Predict 𝑦 = 1 if

𝑔(𝜃0 + 𝜃1𝑥1 + 𝜃2𝑥2 + 𝜃3𝑥1𝑥2+ 𝜃4𝑥1

2 + 𝜃5𝑥22 +⋯) ≥ 0

𝑥1= size

𝑥1= #bedrooms

𝑥1= #floors

𝑥1= age

…𝑥100

𝑥1

𝑥2

Slide credit: Andrew Ng

# Quadratic features?

# Cubic features?

Page 33: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

What humans see

Slide credit: Larry Zitnick

Page 34: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

What computers see243 239 240 225 206 185 188 218 211 206 216 225

242 239 218 110 67 31 34 152 213 206 208 221

243 242 123 58 94 82 132 77 108 208 208 215

235 217 115 212 243 236 247 139 91 209 208 211

233 208 131 222 219 226 196 114 74 208 213 214

232 217 131 116 77 150 69 56 52 201 228 223

232 232 182 186 184 179 159 123 93 232 235 235

232 236 201 154 216 133 129 81 175 252 241 240

235 238 230 128 172 138 65 63 234 249 241 245

237 236 247 143 59 78 10 94 255 248 247 251

234 237 245 193 55 33 115 144 213 255 253 251

248 245 161 128 149 109 138 65 47 156 239 255

190 107 39 102 94 73 114 58 17 7 51 137

23 32 33 148 168 203 179 43 27 17 12 8

17 26 12 160 255 255 109 22 26 19 35 24

Slide credit: Larry Zitnick

Page 35: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Computer Vision: Car detection

Testing:

Slide credit: Andrew Ng

Page 36: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

50x50 pixel images -> 2500 pixels

𝑥 =

pixel 1 intensitypixel 2 intensity

⋮pixel 2500 intensity

Quadratic features (𝑥𝑖𝑥𝑗)~3 million features Slide credit: Andrew Ng

Page 37: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural Networks

• Origins: Algorithms that try to mimic the brain.

• Was very widely used in 80s and early 90s; popularity diminished in late 90s.

• Recent resurgence: State-of-the-art technique for many applications

Slide credit: Andrew Ng

Page 38: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

https://www.slideshare.net/dlavenda/ai-and-productivity

Page 39: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

The “one learning algorithm” hypothesis

[Roe et al. 1992] Slide credit: Andrew Ng

Page 40: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

The “one learning algorithm” hypothesis

[Metin and Frost 1989] Slide credit: Andrew Ng

Page 41: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Sensor representations in the brain

Slide credit: Andrew Ng

Page 42: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural Networks

•Why neural networks?

•Model representation

• Examples and intuitions

•Multi-class classification

Page 43: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

A single neuron in the brain

Input

OutputSlide credit: Andrew Ng

Page 44: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

http://histoweb.co.za/082/082img002.html

Page 45: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

An artificial neuron: Logistic unit

𝑥 =

𝑥0𝑥1𝑥2𝑥3

𝜃 =

𝜃0𝜃1𝜃2𝜃3

• Sigmoid (logistic) activation function

𝑥1

𝑥2

𝑥3

𝑥0

ℎ𝜃 𝑥 =1

1 + 𝑒−𝜃⊤𝑥

“Bias unit”

“Input”

“Output”

“Weights” “Parameters”

Slide credit: Andrew Ng

Page 46: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural network

𝑎1(2)

𝑎2(2)

𝑎3(2)

𝑎0(2)

ℎΘ 𝑥

Layer 1

“Output”

𝑥1

𝑥2

𝑥3

𝑥0

Layer 2 Layer 3 Slide credit: Andrew Ng

Page 47: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural network

𝑎1(2)

𝑎2(2)

𝑎3(2)

𝑎0(2)

ℎΘ 𝑥

𝑥1

𝑥2

𝑥3

𝑥0𝑎𝑖(𝑗)

= “activation” of unit 𝑖 in layer 𝑗

Θ 𝑗 = matrix of weights controlling function mapping from layer 𝑗 to layer 𝑗 + 1

𝑎1(2)

= 𝑔 Θ10(1)𝑥0 + Θ11

(1)𝑥1 + Θ12

(1)𝑥2 + Θ13

(1)𝑥3

𝑎2(2)

= 𝑔 Θ20(1)𝑥0 + Θ21

(1)𝑥1 + Θ22

(1)𝑥2 + Θ23

(1)𝑥3

𝑎3(2)

= 𝑔 Θ30(1)𝑥0 + Θ31

(1)𝑥1 + Θ32

(1)𝑥2 + Θ33

(1)𝑥3

ℎΘ(𝑥) = 𝑔 Θ10(2)𝑎0(2)

+ Θ11(1)𝑎1(2)

+ Θ12(1)𝑎2(2)

+ Θ13(1)𝑎3(2)

𝑠𝑗 unit in layer 𝑗

𝑠𝑗+1 units in layer 𝑗 + 1

Size of Θ 𝑗 ?

𝑠𝑗+1 × (𝑠𝑗 + 1)

Slide credit: Andrew Ng

Page 48: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural network

𝑎1(2)

𝑎2(2)

𝑎3(2)

𝑎0(2)

ℎΘ 𝑥

𝑥1

𝑥2

𝑥3

𝑥0

𝑥 =

𝑥0𝑥1𝑥2𝑥3

z(2) =

z1(2)

z2(2)

z3(2)

𝑎1(2)

= 𝑔 Θ10(1)𝑥0 + Θ11

(1)𝑥1 + Θ12

(1)𝑥2 + Θ13

(1)𝑥3 = 𝑔(z1

(2))

𝑎2(2)

= 𝑔 Θ20(1)𝑥0 + Θ21

(1)𝑥1 + Θ22

(1)𝑥2 + Θ23

(1)𝑥3 = 𝑔(z2

(2))

𝑎3(2)

= 𝑔 Θ30(1)𝑥0 + Θ31

(1)𝑥1 + Θ32

(1)𝑥2 + Θ33

(1)𝑥3 = 𝑔(z3

(2))

ℎΘ 𝑥 = 𝑔 Θ102𝑎0

2+ Θ11

1𝑎1

2+ Θ12

1𝑎2

2+ Θ13

1𝑎3

2= 𝑔(𝑧(3))

“Pre-activation”

Slide credit: Andrew Ng

Page 49: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural network

𝑎1(2)

𝑎2(2)

𝑎3(2)

𝑎0(2)

ℎΘ 𝑥

𝑥1

𝑥2

𝑥3

𝑥0

𝑥 =

𝑥0𝑥1𝑥2𝑥3

z(2) =

z1(2)

z2(2)

z3(2)

𝑎1(2)

= 𝑔(z1(2))

𝑎2(2)

= 𝑔(z2(2))

𝑎3(2)

= 𝑔(z3(2))

ℎΘ 𝑥 = 𝑔(𝑧(3))

“Pre-activation”

𝑧(2) = Θ(1)𝑥 = Θ(1)𝑎(1)

𝑎(2) = 𝑔(𝑧(2))

Add 𝑎0(2)

= 1

𝑧(3) = Θ(2)𝑎(2)

ℎΘ 𝑥 = 𝑎(3) = 𝑔(𝑧(3))Slide credit: Andrew Ng

Page 50: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural network learning its own features

𝑎1(2)

𝑎2(2)

𝑎3(2)

𝑎0(2)

ℎΘ 𝑥

𝑥1

𝑥2

𝑥3

𝑥0

Slide credit: Andrew Ng

Page 51: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Other network architectures

𝑎1(2)

𝑎2(2)

𝑎3(2)

𝑎0(2)

ℎΘ 𝑥

𝑥1

𝑥2

𝑥3

𝑥0

𝑎1(3)

𝑎2(3)

𝑎0(3)

Slide credit: Andrew Ng

Page 52: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

• Architecture

Page 53: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing
Page 54: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural Networks

•Why neural networks?

•Model representation

• Examples and intuitions

•Multi-class classification

Page 55: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Non-linear classification example: XOR/XNOR• 𝑥1, 𝑥2 are binary (0 or 1)

• 𝑦 = 𝑋𝑂𝑅(𝑥1, 𝑥2)

𝑥1

𝑥2

𝑦 = 1

𝑦 = 0𝑦 = 1

𝑦 = 0

𝑥1

𝑥2

Slide credit: Andrew Ng

Page 56: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Simple example: AND

• 𝑥1, 𝑥2 ∈ 0, 1

• 𝑦 = 𝑥1 AND 𝑥2

+1

𝑥1

𝑥2

ℎΘ 𝑥

−30

+20

+20

ℎΘ 𝑥 = 𝑔(−30 + 20𝑥1 + 20𝑥2)

𝑥1 𝑥2 ℎΘ 𝑥

0 0 𝑔 −30 ≈ 0

0 1 𝑔(−10) ≈ 0

1 0 𝑔(−10) ≈ 0

1 1 𝑔(10) ≈ 1

ℎΘ 𝑥 ≈ 𝑥1 AND x2Slide credit: Andrew Ng

Page 57: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Simple example: OR

• 𝑥1, 𝑥2 ∈ 0, 1

• 𝑦 = 𝑥1 AND 𝑥2

+1

𝑥1

𝑥2

ℎΘ 𝑥

−10

+20

+20

ℎΘ 𝑥 = 𝑔(−10 + 20𝑥1 + 20𝑥2)

𝑥1 𝑥2 ℎΘ 𝑥

0 0 𝑔 −10 ≈ 0

0 1 𝑔(10) ≈ 1

1 0 𝑔(10) ≈ 1

1 1 𝑔(30) ≈ 1

ℎΘ 𝑥 ≈ 𝑥1 OR x2Slide credit: Andrew Ng

Page 58: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Simple example: NOT

• 𝑥1, 𝑥2 ∈ 0, 1

• 𝑦 = 𝑥1 AND 𝑥2

+1

𝑥1 ℎΘ 𝑥

10

−20

ℎΘ 𝑥 = 𝑔(10 − 20𝑥1)

𝑥1 ℎΘ 𝑥

0 𝑔 10 ≈ 1

1 𝑔(−10) ≈ 0

ℎΘ 𝑥 ≈ NOT 𝑥1Slide credit: Andrew Ng

Page 59: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Putting it together: 𝑥1 XNOR 𝑥2+1

𝑥1

𝑥2

ℎΘ 𝑥

−30

+20

+20

+1

𝑥1

𝑥2

ℎΘ 𝑥

−10

+20

+20

ℎΘ 𝑥

+1

𝑥1

𝑥2

10

-20

-20

𝑥1 AND 𝑥2 (NOT 𝑥1) AND (NOT 𝑥2) 𝑥1 OR 𝑥2

+1

𝑥1

𝑥2

𝑎1(2)

−30

+20+20

𝑎2(2)

10-20

-20

+1

−10

+20

+20

𝑥1 𝑥2 𝑎1(2)

𝑎2(2) ℎΘ 𝑥

0 0 0 1 1

0 1 0 0 0

1 0 0 0 0

1 1 1 0 1 Slide credit: Andrew Ng

Page 60: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Layer 1

Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Page 61: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Layer 2

Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Page 62: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Layer 3

Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Page 63: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Layer 4 and 5

Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Page 64: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Neural Networks

•Why neural networks?

•Model representation

• Examples and intuitions

•Multi-class classification

Page 65: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Multiple output units: One-vs-all

𝑎1(2)

𝑎2(2)

𝑎3(2)

𝑥1

𝑥2

𝑎1(3)

𝑎2(3)

Pedestrian?

Car?

Motocycle?

Truck?

Page 66: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Multiple output units: One-vs-all

𝑎1(2)

𝑎2(2)

𝑎3(2)

𝑥1

𝑥2

𝑎1(3)

𝑎2(3)

ℎΘ 𝑥 ∈ 𝑅4

Training set : 𝑥 1 , 𝑦(1) , 𝑥 2 , 𝑦(2) , ⋯ 𝑥(𝑚), 𝑦(𝑚) ,

𝑦(𝑖) one of

1000

,

0100

,

0010

,

0001

Page 67: Neural Networks I - Virginia Techjbhuang/teaching/ECE5424-CS...Administrative •HW 2 released! •Jia-Bin out of town next week •Chen: Deep Neural Networks II •Shih-Yang: Diagnosing

Things to remember

•Why neural networks?

•Model representation

• Examples and intuitions

•Multi-class classification

𝑎1(2)

𝑎2(2)

𝑎3(2)

𝑎0(2)

ℎΘ 𝑥

Layer 1

“Output”

𝑥1

𝑥2

𝑥3

𝑥0

Layer 2 Layer 3

+1

𝑥1

𝑥2

𝑎1(2)−30

+20+20

𝑎2(2)

10-20-20

+1

−10

+20

+20