Transcript of "An Overview of Machine Teaching"
teaching-machines.cc/nips2017/tutorial/NIPS17WStutorial.pdf

1/32

An Overview of Machine Teaching

Adish Singla, Jerry Zhu

NIPS 2017 Workshop on Teaching Machines, Robots, and Humans


2/32

A prototypical machine teaching task


3/32

Compare passive learning, active learning, teaching

Example: learn a 1D threshold classifier


4/32

Passive learning

x1, . . . , xn ∼ U[0, 1] i.i.d.

yi = θ∗(xi)

With high probability,

|θ̂ − θ∗| = O(n⁻¹) ≤ ε

n ≥ O(ε⁻¹)
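The O(n⁻¹) rate can be checked with a quick simulation (a sketch with an illustrative helper name, not from the slides): track the interval of thresholds consistent with n i.i.d. uniform samples and watch its width shrink like 1/n.

```python
import random

def passive_error(theta_star, n, seed=0):
    """Width of the interval of thresholds consistent with n i.i.d. samples."""
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0
    for _ in range(n):
        x = rng.random()                    # x ~ U[0, 1]
        y = 1 if x >= theta_star else 0     # label from the true threshold
        if y == 1:
            hi = min(hi, x)                 # smallest positive example seen
        else:
            lo = max(lo, x)                 # largest negative example seen
    return hi - lo   # any consistent theta_hat errs by at most hi - lo

# Doubling n roughly halves the width, matching the O(1/n) rate.
print(passive_error(0.5, 100), passive_error(0.5, 200), passive_error(0.5, 400))
```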


5/32

Active learning

I learner picks query x, oracle answers y = θ∗(x)

I binary search

|θ̂ − θ∗| = O(2⁻ⁿ) ≤ ε

n ≥ O(log(ε⁻¹))
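The log(ε⁻¹) rate is just binary search; a minimal sketch (the `oracle` interface and names are illustrative):

```python
def active_learn(oracle, n):
    """Each query halves the interval known to contain theta*."""
    lo, hi = 0.0, 1.0
    for _ in range(n):
        x = (lo + hi) / 2
        if oracle(x) == 1:   # x is at or above the threshold
            hi = x
        else:
            lo = x
    return (lo + hi) / 2     # midpoint: error at most 2**-(n+1)

theta_star = 0.3721
oracle = lambda x: 1 if x >= theta_star else 0
theta_hat = active_learn(oracle, 20)
```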


6/32

Machine teaching

I teacher can design an optimal training set of size 2

n = 2, ∀ε > 0
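For the version-space learner the teaching set is explicit (a sketch; function names are illustrative): two examples straddling θ∗ at distance ε/2 pin every consistent hypothesis, so two examples suffice for any ε > 0.

```python
def teaching_set(theta_star, eps):
    """Two examples that pin any consistent learner to within eps of theta*."""
    return [(theta_star - eps / 2, 0), (theta_star + eps / 2, 1)]

def consistent_learner(D):
    """Version-space learner: midpoint of the interval consistent with D."""
    lo = max((x for x, y in D if y == 0), default=0.0)
    hi = min((x for x, y in D if y == 1), default=1.0)
    return (lo + hi) / 2

eps = 1e-6
D = teaching_set(0.3721, eps)
theta_hat = consistent_learner(D)   # within eps/2 of the target threshold
```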


7/32

Another example: teach hard margin SVM

Remark: Teaching Dimension TD = 2 but VC = d + 1


8/32

Yet another example: teach Gaussian density

TD = d + 1: tetrahedron vertices
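One way to realize TD = d + 1 concretely (a sketch assuming NumPy is available; the construction is illustrative, not the slides'): take any d + 1 generic points and affinely map them so their empirical mean and MLE covariance equal the target (μ, Σ); the maximum-likelihood fit then recovers the density exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
mu = np.array([1.0, 2.0, 3.0])                      # target mean
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])                 # target covariance (SPD)

X = rng.standard_normal((d + 1, d))                 # d+1 generic points
Xc = X - X.mean(axis=0)                             # center them
C = Xc.T @ Xc / (d + 1)                             # their MLE covariance
L = np.linalg.cholesky(C)
M = np.linalg.cholesky(Sigma)
T = Xc @ np.linalg.inv(L).T @ M.T + mu              # whiten, recolor, shift

# T has exactly d+1 rows, empirical mean mu, and MLE covariance Sigma,
# so it teaches the Gaussian MLE learner exactly.
```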


9/32

Machine learning vs. machine teaching

Machine learning (D given, learn θ)

min_θ Σ_{(x,y)∈D} ℓ(x, y, θ) + λ‖θ‖²

Machine teaching (θ∗ given, learn D)

min_{D,θ̂} ‖θ̂ − θ∗‖² + η‖D‖₀

s.t. θ̂ = argmin_θ Σ_{(x,y)∈D} ℓ(x, y, θ) + λ‖θ‖²

D usually not i.i.d.
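When the inner learner has a closed form, the bilevel program collapses. A minimal sketch (1-D ridge regression as the inner learner; λ and all names are illustrative): the teacher inverts the learner analytically and hits θ∗ with a single example.

```python
lam = 0.1          # learner's regularizer, assumed known to the teacher
theta_star = 2.0   # target parameter

def learner(D):
    """argmin_theta sum (y - theta*x)^2 + lam*theta^2 = sum(xy)/(sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in D)
    sxx = sum(x * x for x, y in D)
    return sxy / (sxx + lam)

# Teacher's move: one example with y inflated to cancel the regularizer's shrinkage.
x = 1.0
D = [(x, theta_star * (x * x + lam) / x)]
print(learner(D))   # lands on theta_star
```

Note the teaching point lies off the true line y = θ∗x; this is one concrete sense in which D is not i.i.d. data from the task distribution.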


10/32

Why bother if we already know θ∗?

I education

I adversarial attacks

I . . .


11/32

Machine teaching generic form

min_{D,θ̂} TeachingRisk(θ̂) + η TeachingCost(D)

s.t. θ̂ = MachineLearning(D)

I exact vs. approximate teaching

I parameter vs. generalization error

I cost not always number of items

I any of the constraint forms


12/32

The coding view

message = θ∗, decoder = learning algorithm A, language = D

[Diagram: A maps a training set D ∈ 𝔻 to a parameter in Θ, A(D) = θ̂; teaching runs the code in reverse, D = A⁻¹(θ∗).]


13/32

In other words

teach·ing /ˈtēCHiNG/ noun

1. controlling

2. shaping

3. persuasion

4. influence maximization

5. attacking

6. poisoning


14/32

Characterizing the space of teaching tasks


15/32

The friend vs. foe dimension

friend

I education

I improving cognitive models

I fast classifier training

I debugging machine learners

I . . .

foe

I training-set poisoning attacks (not test-time adversarial attacks)


16/32

The human vs. machine dimension

I machine teacher, machine learner: attacks

I machine teacher, human learner: education

I human teacher, machine learner: fast classifier training

I human teacher, human learner: (not this workshop)


17/32

The batch vs. sequential dimension

Batch teaching

I batch learners

Sequential teaching

I stochastic gradient descent learner

I multiarmed bandits

I reinforcement learning (e.g. teaching by demonstration)


18/32

The learner anticipation dimension

I does not anticipate teaching (standard learners)
  I version space learner: {θ agrees with D}
  I Bayesian learner: p(θ | D)
  I convex regularized empirical risk minimizer:

    min_θ Σ_{(x,y)∈D} ℓ(x, y, θ) + λ‖θ‖²

  I deep neural network
  I cognitive model

I anticipates teaching
  I especially when a human is involved


19/32

The learner anticipation dimension (cont.)

A smart learner can

I educate the (suboptimal human) teacher in the structure of the optimal teaching set

I detect suboptimal teaching and switch to active learning

I translate the teaching set aimed at a different learner

Recursive teaching dimension RTD (vs. classical TD)


20/32

The learner transparency dimension

clearbox

I bilevel optimization

min_{D,θ̂} TeachingRisk(θ̂) + η TeachingCost(D)

s.t. θ̂ = MachineLearning(D)

I . . .

I some learning hyper-parameters unknown. Probe and teach

I . . .

I evaluation function on D. Bayesian optimization.

blackbox


21/32

The number of learners dimension

I one lecture, one student (standard)
I one lecture, many students
  I worst-case guarantee: minimax risk
  I average-case guarantee: Bayes risk


22/32

The “what’s in your training set” dimension

Items in D can be:

I pool-based teaching: subset (multiset) of a given pool {(xi, yi)}_{1:N}

I synthetic / constructive teaching:
  I honest: x ∈ X, y = θ∗(x)
  I liar: (x, y) ∈ X × Y; fake rewards. Ethics questions

I alter existing training set: D0 + (δx, δy)

I features, pairwise comparisons (requires corresponding learner)


23/32

The theory vs. empirical dimension

I teaching dimension TD, recursive teaching dimension RTD, preference-based TD, etc.

TD(θ∗) ≡ min_D ‖D‖₀ s.t. {θ∗} = VersionSpace(D)
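For a small finite class this definition can be evaluated by brute force (a sketch; representing each concept as its label vector is an illustrative choice): grow labeled sets by size until the version space collapses to {θ∗}.

```python
from itertools import combinations

def version_space(concepts, D):
    """Concepts consistent with every labeled example in D."""
    return [c for c in concepts if all(c[x] == y for x, y in D)]

def teaching_dim(concepts, target):
    """Smallest D (and its size) whose version space is exactly {target}."""
    points = range(len(target))
    for k in range(len(target) + 1):
        for xs in combinations(points, k):
            D = [(x, target[x]) for x in xs]
            if version_space(concepts, D) == [target]:
                return k, D
    return None

# 1-D thresholds on 4 points: concept i labels x as 1 iff x >= i.
concepts = [tuple(1 if x >= i else 0 for x in range(4)) for i in range(5)]
k, D = teaching_dim(concepts, concepts[2])
# the two examples bracketing the threshold suffice, so k == 2
```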

I . . .

I improve student test scores


24/32

A few open problems


25/32

Information complexity of teaching and learning

I relationship between recursive teaching dimension (RTD) and VC dimension (VCD)

I is RTD upper bounded by O(VCD)?

I connections between the above question and the long-standing open "sample compression conjecture" [Littlestone and Warmuth, 1986]


26/32

New notions of teaching dimension (TD)

I complexity of teaching with new signals
  I e.g. features, pairwise comparison queries

I complexity of teaching with additional constraints
  I e.g. interpretable signals


27/32

Solving the combinatorial, bilevel optimization problem

I bilevel optimization

min_{D,θ̂} TeachingRisk(θ̂) + η TeachingCost(D)

s.t. θ̂ = MachineLearning(D)

I mixed integer nonlinear programming
  I even simple instances are NP-hard (e.g. set cover, subset sum)

I requires new approximation algorithms
  I e.g. via characterizing submodularity properties of the problem
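For pool-based teaching of a version-space learner, the set-cover connection also suggests the standard remedy (a sketch, names illustrative): the count of wrong concepts an example eliminates is a coverage objective, hence submodular, so greedy selection achieves the usual logarithmic approximation.

```python
def greedy_teaching_set(concepts, target, pool):
    """Greedy set cover: each example 'covers' the wrong concepts it rules out."""
    alive = [c for c in concepts if c is not target]   # wrong but still consistent
    D = []
    while alive:
        # how many remaining wrong concepts each candidate example eliminates
        gain = {x: sum(c[x] != target[x] for c in alive) for x in pool}
        best = max(gain, key=gain.get)
        if gain[best] == 0:
            break          # no example separates target from the remainder
        D.append((best, target[best]))
        alive = [c for c in alive if c[best] == target[best]]
    return D

# 1-D thresholds on 8 points: concept i labels x as 1 iff x >= i.
concepts = [tuple(1 if x >= i else 0 for x in range(8)) for i in range(9)]
target = concepts[4]
D = greedy_teaching_set(concepts, target, pool=range(8))
# greedy picks the two examples bracketing the threshold
```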


28/32

Machine teaching for reinforcement learning

I optimal demonstrations for inverse reinforcement learning

I teaching humans how to teach robots


29/32

The need for a good cognitive model

I model-based vs. model-free approaches
  I 3 workshop papers on the spaced repetition technique

I can machine teaching guide the search for better models?


30/32

Novel applications and industry insights

I machine teaching for debugging machine learning?

I program synthesis, social robotics, etc.


31/32

Workshop preview


32/32

Cluster 1: Teacher who optimizes training data

Posters:

3 Optimizing Human Learning

5 Interpretable Machine Teaching via Feature Feedback

7 Interpretable and Pedagogical Examples

11 Accelerating Human Learning with Deep Reinforcement Learning

12 Program2Tutor: Combining Automatic Curriculum Generation with Multi-Armed Bandits for Intelligent Tutoring Systems

15 Predicting Recall Probability to Adaptively Prioritize Study

Talks:

I Emma Brunskill

I Burr Settles

I Le Song


33/32

Cluster 2: Student who appreciates teaching

Posters:

2 Machine Education - The Way Forward for Achieving Trust-Enabled Machine Agents

4 Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification

6 Pedagogical Learning

13 Generative Knowledge Distillation for General Purpose Function Compression

14 Explainable Artificial Intelligence via Bayesian Teaching


34/32

Cluster 3: Related Topics

Posters:

1 Generative Adversarial Active Learning

8 Faster Reinforcement Learning Using Active Simulators

9 Gradual Tuning: A Better Way of Fine Tuning the Parameters of a Deep Neural Network

10 Machine Teaching: A New Paradigm for Building Machine Learning Systems

Talks:

I Shay Moran

I Patrice Simard