Essential Intuitive Statistics for Experimentation

Post on 13-Apr-2017

Intuitive Statistics

Matt Gardner

Population ( μ )

Sample ( x, se )

Confidence Interval ( x ± z × se )
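The interval can be sketched in a few lines: estimate the sample mean x and its standard error se, then take x ± z × se. This is a minimal sketch assuming a 95% level (z of about 1.96) and a normal approximation; the function name `confidence_interval` and the sample values are illustrative and not from the talk.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, level=0.95):
    """Return (x_bar, se, lower, upper) for the mean of `sample`."""
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / math.sqrt(n)          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level=0.95
    return x_bar, se, x_bar - z * se, x_bar + z * se


# Hypothetical measurements, for illustration only.
sample = [12.1, 9.8, 11.4, 10.3, 10.9, 9.5, 11.8, 10.2]
x_bar, se, lower, upper = confidence_interval(sample)
print(f"x = {x_bar:.2f}, se = {se:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

A larger sample shrinks se, and so narrows the interval around the unknown population mean μ.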

False Positives

Let’s pretend μT-μC=0

False Negatives

Let’s pretend μT-μC=d
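Both "let's pretend" scenarios can be simulated. The sketch below assumes normally distributed measures with standard deviation 1, 200 units per group, a two-sided z-test at alpha = 0.05, and a true difference d = 0.28; all of these numbers and the function names are illustrative choices, not values from the talk.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def significant(control, treatment, alpha=0.05):
    """Two-sided z-test: does the interval for the difference exclude 0?"""
    diff = mean(treatment) - mean(control)
    se = math.sqrt(stdev(control) ** 2 / len(control)
                   + stdev(treatment) ** 2 / len(treatment))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(diff) > z * se


def detection_rate(true_diff, n=200, runs=1000, seed=0):
    """Fraction of simulated experiments that report a significant difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treatment = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        hits += significant(control, treatment)
    return hits / runs


# Pretend the true difference is 0: every "significant" result is a false positive.
false_positive_rate = detection_rate(true_diff=0.0)
# Pretend the true difference is d: every non-significant result is a false negative.
false_negative_rate = 1 - detection_rate(true_diff=0.28)
print(f"false positive rate ~ {false_positive_rate:.3f}")
print(f"false negative rate ~ {false_negative_rate:.3f}")
```

Under these assumptions the false positive rate should land near alpha, while the false negative rate depends on d, the sample size, and the variance.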

Metrics

Unit of Analysis, Measure

Inputs for calculation

• Sum – of measure over all units
• Count – of analysis units
• Standard deviation – of measure over all units *
• Relative lift – in average measure test vs. control
• Alpha – false positive rate
• Beta – false negative rate

* for proportion metrics, sd = √(p × (1 − p)), where p is the baseline proportion
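One common way to turn these inputs into a sample size is the normal-approximation formula n = 2 × (z_alpha + z_beta)² × sd² / delta² per group, where delta is the absolute difference implied by the relative lift. The sketch below assumes a two-sided test, equal-sized test and control groups, and a per-unit standard deviation of √(p × (1 − p)) for a 0/1 measure; it is not necessarily what the calculator linked in the slides does, and the function name and traffic numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(total, count, sd, relative_lift, alpha=0.05, beta=0.20):
    """Units needed in each of test and control for a two-sided z-test."""
    baseline_mean = total / count            # Sum / Count = average measure per unit
    delta = relative_lift * baseline_mean    # absolute difference to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


# Hypothetical proportion metric: 5,000 conversions from 100,000 visitors.
p = 5_000 / 100_000
n = sample_size_per_group(total=5_000, count=100_000,
                          sd=math.sqrt(p * (1 - p)),
                          relative_lift=0.05)
print(f"units per group: {n}")
```

Because delta is squared in the denominator, halving the expected lift roughly quadruples the required sample size.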

[Diagram labels: SQL; Unit of Analysis; Measure; Mean; Proportion; Lift, alpha, beta; Sample size; Traffic percent; × duration]

Simple Sample Size Calculator – here.

Common mistakes:
• Inputs can change through time
• Underestimating lift is safer than overestimating
• Think carefully about choice of power – is it high enough?
• Multiple metrics – choose highest traffic requirement
• Large sample size > extend duration > bundle features > alternative metric
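One reading of the "Traffic percent" and "duration" labels is that the required sample size fixes how long an experiment must run at a given traffic allocation. The sketch below assumes constant daily traffic (which, as noted above, can change through time), an even split between two groups, and made-up per-group requirements for two metrics; every name and number in it is hypothetical.

```python
import math


def days_required(n_per_group, daily_units, traffic_percent, groups=2):
    """Days needed to collect n_per_group units in each group."""
    units_per_day = daily_units * traffic_percent / 100
    return math.ceil(groups * n_per_group / units_per_day)


# Hypothetical per-group requirements for two metrics in the same experiment.
requirements = {"conversion": 120_000, "revenue_per_visitor": 310_000}
n_needed = max(requirements.values())   # plan for the most demanding metric

for traffic_percent in (10, 50, 100):
    days = days_required(n_needed, daily_units=40_000,
                         traffic_percent=traffic_percent)
    print(f"{traffic_percent:>3}% of traffic -> {days} days")
```

If the duration is impractical even at full traffic, the remaining options listed above are bundling features or choosing an alternative metric.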

• Experiment results are subject to randomness and conclusions will sometimes be in error

• We choose the false positive and false negative error rates at experiment design time

• We know in advance if the experiment is likely to be useful and we should think carefully before running experiments … they are expensive!

• Always compute sample size!