Machine Teaching for
Personalized Education, Security, Interactive Machine Learning
Jerry Zhu
NIPS 2015 Workshop on Machine Learning from and for Adaptive User Technologies
Supervised Learning Review
- D: training set (x1, y1), ..., (xn, yn)
- Learning algorithm A: A(D) = θ
- Example: regularized empirical risk minimization (sketched in code below)

  A(D) = argmin_θ  Σ_{i=1}^n ℓ(θ, xi, yi) + λ Ω(θ)

- Example: version space learner

  A(D) = θ consistent with D
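As a concrete sketch of the regularized ERM learner, here is a minimal A(D) assuming squared loss ℓ and Ω(θ) = ‖θ‖² (choices the slide does not fix), which admits the closed-form ridge solution:

```python
import numpy as np

def ridge_learner(X, y, lam=1.0):
    """A(D): regularized ERM with squared loss and Omega(theta) = ||theta||^2.
    Closed form: theta = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# D = (x1, y1), ..., (xn, yn)
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
print(ridge_learner(X, y, lam=0.1))   # ~[0.98]: close to the unregularized slope 1.0
```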
Machine Teaching = Inverse Machine Learning
- Given: learning algorithm A, target model θ∗
- Find: the smallest D such that A(D) = θ∗ (a toy search is sketched below)

[Figure: the learner A maps the space of training sets D to the model space Θ; machine teaching inverts this map, seeking D ∈ A⁻¹(θ∗)]
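To make "inverse machine learning" concrete, here is a toy brute-force search for the smallest D, assuming a version space learner over a small finite class of 1D threshold classifiers (the hypothesis class, item grid, and threshold rule are illustrative choices, not from the slides):

```python
import itertools

thresholds = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]   # hypothesis class Theta: h_t(x) = +1 if x >= t else -1
items = [i / 10 for i in range(11)]            # candidate teaching items
target = 0.6                                   # theta*

def label(t, x):
    return 1 if x >= t else -1

def consistent(t, D):
    return all(label(t, x) == y for x, y in D)

def smallest_teaching_set():
    # try teaching sets of increasing size until only theta* remains consistent
    for k in range(1, len(items) + 1):
        for xs in itertools.combinations(items, k):
            D = [(x, label(target, x)) for x in xs]        # teacher labels with theta*
            if [t for t in thresholds if consistent(t, D)] == [target]:
                return D
    return None

print(smallest_teaching_set())   # two items straddling theta*, e.g. [(0.4, -1), (0.6, 1)]
```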
Machine Teaching Example
- Teach a 1D threshold classifier
- Given: A = SVM, θ∗ = 0.6
- What is the smallest D? (see the sketch below)
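A numerical sketch of the answer, assuming the learner is a soft-margin linear SVM (scikit-learn's SVC here; the slide does not pin down the implementation): two items straddling θ∗ = 0.6 place the decision boundary exactly at 0.6.

```python
import numpy as np
from sklearn.svm import SVC

theta_star = 0.6          # target decision threshold
eps = 0.1                 # how tightly the two teaching items straddle theta*

# Smallest teaching set: one negative and one positive item symmetric about theta*
X = np.array([[theta_star - eps], [theta_star + eps]])
y = np.array([-1, +1])

learner = SVC(kernel="linear", C=1e6)   # hard-margin-like linear SVM
learner.fit(X, y)

boundary = -learner.intercept_[0] / learner.coef_[0, 0]
print(boundary)   # ~0.6: two items suffice to place the threshold exactly
```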
Machine Teaching as Communication
- sender = teacher
- message = θ∗
- decoder = A
- codeword = D
Machine Teaching “Stronger” Than Active Learning
Sample complexity to achieve ε error:
- passive learning: n = O(1/ε)
- active learning: n = O(log(1/ε)), needs binary search
- machine teaching: n = 2, the teaching dimension [Goldman + Kearns 1995]; the teacher knows θ∗
[Figure: error vs. n for the three settings; passive learning "waits" (error O(1/n)), active learning "explores" (error O(1/2^n)), teaching "guides" (θ∗ delivered directly)]
Why Machine Teaching?
Why bother if we already know θ∗?
1. education
2. computer security
3. interactive machine learning
Education
Teacher answers three questions:
1. My student’s cognitive model A is: (a) SVM, (b) logistic regression, (c) neural net, . . .
2. Student success is defined by: (a) learning a target model θ∗, (b) excelling on a test set
3. My teaching effort is defined by: (a) training set size, (b) training item cost, . . .
Machine teaching finds the optimal, personalized lesson D for the student.
Education Example [Patil et al. 2014]
- Human categorization task: 1D threshold θ∗ = 0.5
- A: kernel density estimator (a toy sketch follows the results below)
- Optimal D:

[Figure: teaching items on [0, 1], with y = −1 items to the left and y = +1 items to the right of θ∗ = 0.5]

- Teaching humans with D:

  human trained on   human test accuracy
  optimal D          72.5%
  random items       69.8%
  (statistically significant)
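A toy sketch of this kind of learner, assuming a Gaussian-kernel density estimate per class and one teaching item per class placed symmetrically about 0.5 (bandwidth, grid, and item positions are illustrative, not the actual Patil et al. setup):

```python
import numpy as np

def kde(points, x, h=0.2):
    """Gaussian kernel density estimate of the teaching items, evaluated at x."""
    points = np.asarray(points)[:, None]
    return np.mean(np.exp(-0.5 * ((x - points) / h) ** 2), axis=0) / (h * np.sqrt(2 * np.pi))

D_neg, D_pos = [0.3], [0.7]            # one y = -1 and one y = +1 item, symmetric about 0.5
xs = np.linspace(0, 1, 101)
pred = np.where(kde(D_pos, xs) > kde(D_neg, xs), 1, -1)
print(xs[np.argmax(pred == 1)])        # ~0.5: the induced decision boundary sits at theta*
```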
Security: Data Poisoning Attack [Alfeld et al. 2016], [Mei + Zhu 2015]
- Given: learner A, attack target θ∗, clean training data D0
- Find: the minimum “poison” δ such that A(D0 + δ) = θ∗
Security: Lake Mendota Data
[Figure: Lake Mendota, Wisconsin ice days per year, 1900–2000]
Security: Optimal Attacks on Ordinary Least Squares
  min_{δ, β}   ‖δ‖_p
  s.t.   β₁ ≥ 0
         β = argmin_β ‖(y + δ) − Xβ‖²

(a convex sketch of the 2-norm attack follows the figure below)
[Figure: Lake Mendota ice days with poisoned responses. Left panel: minimize ‖δ‖₂² (2-norm attack); right panel: minimize ‖δ‖₁ (1-norm attack). Both attacks move the fitted slope from β₁ = −0.1 (original) to β₁ = 0.]
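A sketch of the 2-norm attack as a convex program, on synthetic data standing in for the Lake Mendota series (the data, noise level, and use of cvxpy are assumptions; only the formulation above is from the slide). Because OLS has a closed form, β(δ) is linear in δ, so the bilevel attack collapses to a single convex program in δ:

```python
import numpy as np
import cvxpy as cp

# Synthetic stand-in for the ice-days data: slope ~ -0.1 ice days per year plus noise
rng = np.random.default_rng(0)
years = np.arange(1900, 2001)
X = np.column_stack([np.ones_like(years), years - years.mean()])   # [intercept, centered year]
y = 110 - 0.1 * (years - years.mean()) + rng.normal(0, 10, size=years.size)

# Inner problem in closed form: beta(delta) = (X^T X)^{-1} X^T (y + delta), linear in delta
H = np.linalg.solve(X.T @ X, X.T)
delta = cp.Variable(y.size)
beta = H @ (y + delta)

attack = cp.Problem(cp.Minimize(cp.norm(delta, 2)), [beta[1] >= 0])  # force the slope >= 0
attack.solve()

print("original slope:", (H @ y)[1])      # roughly -0.1
print("poisoned slope:", beta.value[1])   # ~0: the downward trend has been erased
```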
Security: Optimal Attack on Latent Dirichlet Allocation [Mei + Zhu 2015]
Optimization for Machine Teaching
Bilevel optimization
  min_{D ∈ D}   |D|
  s.t.   θ∗ = argmin_{θ ∈ Θ}  (1/|D|) Σ_{(xi, yi) ∈ D} ℓ(xi, yi, θ) + Ω(θ)

In general

  min_{D ∈ D}   TeachingLoss(A(D), θ∗) + TeachingEffort(D)

- Sometimes a closed-form solution [Alfeld et al. 2016], [Zhu 2013]
- Often nonconvex optimization [Bo Zhang, this workshop]; a small sketch follows
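A small sketch of the general formulation, assuming the learner is ridge regression with a closed form so the outer objective can be evaluated directly; the effort term, item dimension, and optimizer are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

lam = 1.0
theta_star = np.array([2.0, -1.0])     # target model
n = 3                                  # teach with n items, D = (X, y)

def learner(X, y):
    """A(D) = argmin_theta (1/n) ||y - X theta||^2 + lam ||theta||^2 (closed form)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X / len(y) + lam * np.eye(d), X.T @ y / len(y))

def teaching_objective(flat):
    X, y = flat[: 2 * n].reshape(n, 2), flat[2 * n :]
    teaching_loss = np.sum((learner(X, y) - theta_star) ** 2)
    teaching_effort = 1e-3 * np.sum(flat ** 2)        # mild preference for small items
    return teaching_loss + teaching_effort

res = minimize(teaching_objective, x0=np.random.default_rng(0).standard_normal(3 * n))
X_opt, y_opt = res.x[: 2 * n].reshape(n, 2), res.x[2 * n :]
print(learner(X_opt, y_opt))   # near theta* = [2, -1] if the (nonconvex) outer problem was solved well
```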
Theoretical Question: Teaching Dimension
- Given: A, θ∗
- Find: the smallest D such that A(D) = θ∗
Teaching dimension TD = |D|
- A = version space learner: TD = ∞? [Goldman + Kearns 1995]
- A = SVM: TD = 2
Learner-specific teaching dimension
Teaching Dimension Example: Teaching an SVM
- Learner: A(D) = argmin_{θ ∈ R²}  Σ_{i=1}^n max(1 − yi xiᵀθ, 0) + (1/2)‖θ‖²
- Target model: θ∗ = (1/2, √3/2)ᵀ
- One training item is necessary and sufficient:

  x1 = (1/2, √3/2)ᵀ,  y1 = 1

- TD(θ∗, A) = 1 (verified numerically below)
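A quick numerical check of this claim, using scipy's Nelder-Mead as one way to minimize the nonsmooth learner objective with the single teaching item:

```python
import numpy as np
from scipy.optimize import minimize

theta_star = np.array([0.5, np.sqrt(3) / 2])      # the target unit vector
x1, y1 = theta_star.copy(), 1.0                   # the single teaching item

def objective(theta):
    # regularized hinge loss of the learner, with D = {(x1, y1)}
    return max(1.0 - y1 * (x1 @ theta), 0.0) + 0.5 * (theta @ theta)

res = minimize(objective, x0=np.zeros(2), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-12})
print(res.x)    # ~[0.5, 0.866] = theta*: one item suffices, so TD(theta*, A) = 1
```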
The Teaching Dimension of Linear Learners [Liu & Zhu 2015]

  A(D) = argmin_{θ ∈ R^d}  Σ_{i=1}^n ℓ(θᵀxi, yi) + (λ/2)‖θ‖₂²

  goal ↓      homogeneous                                    inhomogeneous θ∗ = [w∗; b∗]
              ridge   SVM             logistic               ridge   SVM                logistic
  parameter   1       ⌈λ‖θ∗‖₂²⌉       ⌈λ‖θ∗‖₂²/τmax⌉        2       2⌈λ‖w∗‖₂²/2⌉†      2⌈λ‖w∗‖₂²/(2τmax)⌉†
  boundary    -       1               1                      -       2                  2

Teaching dimension (independent of d) is distinct from VC dimension (d + 1)
Interactive Machine Learning
- You train a classifier
- Only use item, label pairs (x1, y1), ..., (xn, yn)
- Cost = n to reach small ε error
“Speed of Light” [Goldman + Kearns 1995], [Angluin 2004], [Cakmak + Thomaz 2011], [Zhu et al., in preparation]
Fundamental Cost of Interactive Machine Learning: n ≥ Teaching Dimension
- The optimal teacher achieves n = Teaching Dimension
- Can be much faster than active learning (recall 2 vs. log(1/ε))
- Must allow teacher-initiated items (unlike active learning)
- But some humans are bad teachers . . .
Some Humans are Suboptimal Teachers [Khan, Zhu, Mutlu 2011]
Challenge for the computer: make the human behave more like the optimal teacher
Summary
Machine teaching
- Theory: teaching dimension
- Algorithm: bilevel optimization
- Applications: education, security, interactive machine learning
http://pages.cs.wisc.edu/~jerryzhu/machineteaching/