Essential Intuitive Statistics for Experimentation

Transcript
Page 1: Essential Intuitive Statistics for Experimentation

Intuitive Statistics

Matt Gardner

Page 2: Essential Intuitive Statistics for Experimentation

Population

Sample

Confidence Interval

False Positives

Let’s pretend μT-μC=0

False Negatives

Let’s pretend μT-μC=d

Page 3: Essential Intuitive Statistics for Experimentation

Population ( μ )

Sample ( x̄, se )

Confidence Interval ( x̄ ± z × se )
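As a minimal sketch of the slide's three columns, the Python below takes a sample, computes its mean and standard error, and forms the interval "sample mean ± z × standard error". The function name `confidence_interval`, the example data, and the default 95% level are illustrative assumptions, not part of the original slides.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, confidence=0.95):
    """Return (x_bar, se, lower, upper) for the mean of `sample`.

    Uses the normal approximation: x_bar +/- z * se.
    """
    n = len(sample)
    x_bar = mean(sample)                 # sample estimate of the population mean
    se = stdev(sample) / math.sqrt(n)    # standard error of the sample mean
    # z is the two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return x_bar, se, x_bar - z * se, x_bar + z * se


# Hypothetical measurements, for illustration only
x_bar, se, lower, upper = confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])
print(f"mean={x_bar:.2f} se={se:.3f} interval=({lower:.2f}, {upper:.2f})")
```

A wider confidence level gives a larger z and therefore a wider interval; a larger sample gives a smaller se and a narrower one.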

False Positives

Let’s pretend μT-μC=0

False Negatives

Let’s pretend μT-μC=d
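The two "let's pretend" scenarios can be sketched numerically. The Python below assumes a two-sided z-test on the difference in means, with `se` standing for the standard error of that difference; the function name `error_rates` and the example numbers are hypothetical. Pretending μT-μC=0 gives the false positive rate, and pretending μT-μC=d gives the false negative rate.

```python
from statistics import NormalDist


def error_rates(d, se, alpha=0.05):
    """Return (false_positive_rate, false_negative_rate) for a two-sided z-test.

    d  : true difference in means assumed in the second scenario
    se : standard error of the estimated difference (assumed known)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)

    # Scenario 1: pretend the true difference is 0.
    # The test still rejects whenever |z| exceeds z_crit.
    false_positive = 2 * (1 - std_normal.cdf(z_crit))

    # Scenario 2: pretend the true difference is d.
    # The z statistic is then centred on d / se instead of 0.
    shift = d / se
    power = (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
    return false_positive, 1 - power


# Hypothetical example: a true difference of 2.8 standard errors
fp, fn = error_rates(d=2.8, se=1.0)
print(f"false positive rate={fp:.3f}, false negative rate={fn:.3f}")
```

The false positive rate depends only on the chosen alpha, while the false negative rate shrinks as d grows relative to se, which is why a larger sample (smaller se) makes a real effect easier to detect.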

Page 9: Essential Intuitive Statistics for Experimentation

Metrics

Unit of Analysis Measure

Mean Proportion

Traffic percent

Sample size

Lift, alpha, beta

SQL

Measure

Unit of Analysis

Simple Sample Size Calculator – here.

x duration

Common mistakes:
• Inputs can change through time
• Underestimating lift is safer than overestimating
• Think carefully about choice of power – is it high enough?
• Multiple metrics – choose highest traffic requirement
• Large sample size > extend duration > bundle features > alternative metric
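To illustrate how sample size, traffic percent and duration relate, here is a rough Python sketch. It assumes that the required sample size is known per group for each metric, that traffic arrives at a steady daily rate, and that the experiment has two equally sized groups; the function name `required_days` and all numbers are hypothetical. With several metrics, it takes the largest requirement, as the list above advises.

```python
import math


def required_days(sizes_per_group, daily_units, traffic_percent, groups=2):
    """Estimate how many days an experiment must run.

    sizes_per_group : dict of metric name -> required sample size per group
    daily_units     : analysis units (e.g. users) arriving per day
    traffic_percent : fraction of traffic allocated to the experiment (0 to 1)
    """
    # Multiple metrics: the most demanding metric sets the requirement
    n = max(sizes_per_group.values())
    units_per_day = daily_units * traffic_percent
    return math.ceil(groups * n / units_per_day)


# Hypothetical requirements for two metrics
sizes = {"conversion": 14128, "revenue": 30000}
print(required_days(sizes, daily_units=20000, traffic_percent=0.5))
```

If the resulting duration is impractical, the last item in the list above gives the order of remedies to consider: extend the duration, bundle features, or switch to an alternative metric.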

Page 10: Essential Intuitive Statistics for Experimentation

Metrics

Unit of Analysis Measure

Inputs for calculation

• Sum – of measure over all units
• Count – of analysis units
• Standard deviation – of measure over all units *
• Relative lift – in average measure test vs. control
• Alpha – false positive rate
• Beta – false negative rate

* for proportion metrics (a 0/1 measure per unit), sd = √( p × (1 − p) )
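The inputs listed above are enough for a basic sample size estimate. The sketch below uses the standard normal-approximation formula for comparing two means with an equal split and a two-sided test, n per group = 2 × (z for alpha + z for beta)² × sd² / delta², where delta is the relative lift times the baseline mean. This is an assumption about the method: the slides link to a calculator but do not state its formula, and the function name `sample_size_per_group` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(total, count, sd, relative_lift, alpha=0.05, beta=0.2):
    """Approximate units needed in each of two equally sized groups.

    total         : sum of the measure over all units
    count         : number of analysis units
    sd            : standard deviation of the measure over all units
    relative_lift : expected relative change in the mean, e.g. 0.10 for +10%
    alpha, beta   : chosen false positive and false negative rates
    """
    baseline = total / count                  # average measure per unit
    delta = relative_lift * baseline          # absolute difference to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)   # power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


# Hypothetical proportion metric: 1,000 conversions from 10,000 users
p = 1000 / 10000
sd = math.sqrt(p * (1 - p))   # per-unit sd of a 0/1 measure
print(sample_size_per_group(1000, 10000, sd, relative_lift=0.10))
```

With these example inputs the estimate is on the order of 14,000 units per group. Because delta is squared in the denominator, halving the expected lift roughly quadruples the requirement, which is why an overestimated lift leaves an experiment underpowered.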

Page 13: Essential Intuitive Statistics for Experimentation

• Experiment results are subject to randomness and conclusions will sometimes be in error

• We choose the false positive and false negative error rates at experiment design time

• We know in advance if the experiment is likely to be useful and we should think carefully before running experiments … they are expensive!

• Always compute sample size!