of 50

• date post

01-Oct-2020
• Category

## Documents

• view

1

0

Embed Size (px)

### Transcript of learning zoo A field guide to the machine - FOSDEM A field guide to the machine learning zoo...

• A field guide to the machine learning zoo Theodore Vasiloudis SICS/KTH

• From idea to objective function

• Formulating an ML problem

• Formulating an ML problem

● Common aspects

Source: Xing (2015)

• Formulating an ML problem

● Common aspects ○ Model (θ)

Source: Xing (2015)

• Formulating an ML problem

● Common aspects ○ Model (θ) ○ Data (D)

Source: Xing (2015)

• Formulating an ML problem

● Common aspects ○ Model (θ) ○ Data (D)

● Objective function: L(θ, D)

Source: Xing (2015)

• Formulating an ML problem

● Common aspects ○ Model (θ) ○ Data (D)

● Objective function: L(θ, D) ● Prior knowledge: r(θ)

Source: Xing (2015)

• Formulating an ML problem

● Common aspects ○ Model (θ) ○ Data (D)

● Objective function: L(θ, D) ● Prior knowledge: r(θ) ● ML program: f(θ, D) = L(θ, D) + r(θ)

Source: Xing (2015)

• Formulating an ML problem

● Common aspects ○ Model (θ) ○ Data (D)

● Objective function: L(θ, D) ● Prior knowledge: r(θ) ● ML program: f(θ, D) = L(θ, D) + r(θ) ● ML Algorithm: How to optimize f(θ, D)

Source: Xing (2015)

• Example: Improve retention at Twitter

● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be

retweeted

• Example: Improve retention at Twitter

● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be

retweeted

● Data (D): ● Model (θ): ● Objective function - L(D, θ): ● Prior knowledge (Regularization): ● Algorithm:

• Example: Improve retention at Twitter

● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be

retweeted

● Data (D): Features and labels, xi, yi ● Model (θ): ● Objective function - L(D, θ): ● Prior knowledge (Regularization): ● Algorithm:

• Example: Improve retention at Twitter

● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be

retweeted

● Data (D): Features and labels, xi, yi ● Model (θ): Logistic regression, parameters w

○ p(y|x, w) = Bernouli(y | sigm(wΤx))

● Objective function - L(D, θ): ● Prior knowledge (Regularization): ● Algorithm:

• Example: Improve retention at Twitter

● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be

retweeted

● Data (D): Features and labels, xi, yi ● Model (θ): Logistic regression, parameters w

○ p(y|x, w) = Bernouli(y | sigm(wΤx))

● Objective function - L(D, θ): NLL(w) = Σ log(1 + exp(-y wΤxi)) ● Prior knowledge (Regularization): r(w) = λ*wΤw ● Algorithm:

Warning: Notation abuse

• Example: Improve retention at Twitter

● Goal: Reduce the churn of users on Twitter ● Assumption: Users churn because they don’t engage with the platform ● Idea: Increase the retweets, by promoting tweets more likely to be

retweeted

● Data (D): Features and labels, xi, yi ● Model (θ): Logistic regression, parameters w

○ p(y|x, w) = Bernouli(y | sigm(wΤx))

● Objective function - L(D, θ): NLL(w) = Σ log(1 + exp(-y wΤxi)) ● Prior knowledge (Regularization): r(w) = λ*wΤw ● Algorithm: Gradient Descent

• Data problems

• Data problems

● GIGO: Garbage In - Garbage Out

Source: Lawrence (2017)

● Problem: “Data” as a concept is hard to reason about. ● Goal: Make the stakeholders aware of the state of the data at all stages

Source: Lawrence (2017)

Source: Lawrence (2017)

● Band C ○ Accessibility

Source: Lawrence (2017)

● Band C ○ Accessibility

● Band B ○ Representation and faithfulness

Source: Lawrence (2017)

● Band C ○ Accessibility

● Band B ○ Representation and faithfulness

● Band A ○ Data in context

Source: Lawrence (2017)

● Band C ○ “How long will it take to bring our user data to C1 level?”

● Band B ○ “Until we know the collection process we can’t move the data to B1.”

● Band A ○ “We realized that we would need location data in order to have an A1 dataset.”

Source: Lawrence (2017)

● Band C ○ “How long will it take to bring our user data to C1 level?”

● Band B ○ “Until we know the collection process we can’t move the data to B1.”

● Band A ○ “We realized that we would need location data in order to have an A1 dataset.”

• Selecting algorithm & software: “Easy” choices

• Selecting algorithms

• An ML algorithm “farm”

Source: scikit-learn.org

• The neural network zoo

Source: Asimov Institute (2016)

• Selecting algorithms

● Always go for the simplest model you can afford

• Selecting algorithms

● Always go for the simplest model you can afford ○ Your first model is more about getting the infrastructure right

Source: Zinkevich (2017)

• Selecting algorithms

● Always go for the simplest model you can afford ○ Your first model is more about getting the infrastructure right ○ Simple models are usually interpretable. Interpretable models are easier to debug.

Source: Zinkevich (2017)

• Selecting algorithms

● Always go for the simplest model you can afford ○ Your first model is more about getting the infrastructure right ○ Simple models are usually interpretable. Interpretable models are easier to debug. ○ Complex model erode boundaries

Source: Sculley et al. (2015)

• Selecting algorithms

● Always go for the simplest model you can afford ○ Your first model is more about getting the infrastructure right ○ Simple models are usually interpretable. Interpretable models are easier to debug. ○ Complex model erode boundaries

■ CACE principle: Changing Anything Changes Everything

Source: Sculley et al. (2015)

• Selecting software

• The ML software zoo

Leaf

• Your model vs. the world

• What are the problems with ML systems?

Data ML Code Model

Expectation

• What are the problems with ML systems?

Data ML Code Model

Reality

Sculley et al. (2015)

• Things to watch out for

• ● Data dependencies

Things to watch out for

Sculley et al. (2015) & Zinkevich (2017)

• ● Data dependencies ○ Unstable dependencies

Things to watch out for

Sculley et al. (2015) & Zinkevich (2017)

• ● Data dependencies ○ Unstable dependencies

● Feedback loops

Things to watch out for

Sculley et al. (2015) & Zinkevich (2017)

• ● Data dependencies ○ Unstable dependencies

● Feedback loops ○ Direct

Things to watch out for

Sculley et al. (2015) & Zinkevich (2017)

• ● Data dependencies ○ Unstable dependencies

● Feedback loops ○ Direct ○ Indirect

Things to watch out for

Sculley et al. (2015) & Zinkevich (2017)

• Bringing it all together

• Bringing it all together

● Define your problem as optimizing your objective function using data ● Determine (and monitor) the readiness of your data ● Don't spend too much time at first choosing an ML framework/algorithm ● Worry much more about what happens when your model meets the world.

• Thank you.

@thvasilo tvas@sics.se