Teaching Simulation-Based Inference

Dr. Kari Lock Morgan
Department of Statistics
Penn State University


Washington Statistical Society, 4/1/16

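Central Limit Theorem

• $X_j$ independent random variables with mean $\mu_j$ and variance $\sigma_j^2$.

• Liapunov's Central Limit Theorem: If $\exists\, \delta > 0$ such that

$$\frac{\sum_{j=1}^{n} E\left(|X_j - \mu_j|^{2+\delta}\right)}{\left(\sum_{j=1}^{n} \sigma_j^2\right)^{\frac{2+\delta}{2}}} \to 0,$$

then

$$\frac{\sum_{j=1}^{n} (X_j - \mu_j)}{\left(\sum_{j=1}^{n} \sigma_j^2\right)^{1/2}} \xrightarrow{D} N(0,1).$$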

APRIL FOOL'S!

Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation methods for random sampling.

Use data from a randomized experiment to compare two treatments; use simulation to decide if differences between parameters are significant.

Question #1

What is the average mercury level of fish (largemouth bass) in Florida lakes?

Mercury Levels in Fish

$n = 53$, $\bar{x} = 0.527$ ppm

Lange, T., Royals, H. and Connor, L. (2004). Mercury accumulation in largemouth bass (Micropterus salmoides) in a Florida Lake. Archives of Environmental Contamination and Toxicology, 27(4), 466-471.

$\mu = ???$

Mercury in Fish

• The sample mean is 0.527 ppm

• In the US, the FDA action level is 1 ppm
  - Is this safely below the US limit?

• In Canada, the safety limit is 0.5 ppm
  - Is this clearly above the Canadian limit?

• We need a margin of error…

Getting a Margin of Error

[Figure: many samples drawn from the population; calculate the statistic for each sample to form the sampling distribution.]

Standard Error (SE): standard deviation of the sampling distribution

Margin of Error (ME) (95% CI: ME = 2 × SE)

statistic ± ME

• Key idea: how much do statistics vary from sample to sample?

• Wouldn't it be nice if we could take lots of samples and directly see the variability?!?

• Problem?

• We can't take lots of samples from the population!
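To see what the key idea would look like if we could sample repeatedly, here is a hypothetical Python simulation (not part of the original talk). It assumes a made-up population of 5,000 mercury readings, uniform between 0 and 1.1 ppm, draws 2,000 samples of size n = 53, and takes the standard deviation of the sample means as the SE. `sampling_distribution` is an illustrative helper name.

```python
import random
import statistics


def sampling_distribution(population, n, reps=2000, seed=1):
    """Draw `reps` random samples of size n (without replacement) from a
    fully known population and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(reps)]


# Hypothetical population (assumption): 5,000 readings between 0 and 1.1 ppm
pop_rng = random.Random(0)
population = [pop_rng.uniform(0.0, 1.1) for _ in range(5000)]

means = sampling_distribution(population, n=53)
se = statistics.stdev(means)  # SE = standard deviation of the sampling distribution
print(f"SE = {se:.3f}, 95% margin of error = {2 * se:.3f}")
```

With real data this step is impossible, because the population is unknown.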

Assessing Uncertainty

Getting a Margin of Error

[Figure: the population (???) is unknown, so a best guess at the population is built from the one sample; many samples are drawn from that best guess, and the statistic is calculated for each sample to get the distribution of the statistic.]

Standard Error (SE): standard deviation of the statistic

Margin of Error (ME) (95% CI: ME = 2 × SE)

statistic ± ME

Simulating Samples

• What is our best guess at the population, given sample data?
  - The sample itself!

• Draw samples repeatedly from the sample data (of size n = 53)…
  - … with replacement!

• This is known as bootstrapping
  - Simulate many bootstrap samples
  - Calculate statistic for each
  - Find SE as standard deviation of these statistics

1. Bootstrap Sample: Sample with replacement from the original sample, using the same sample size.

[Figure: an original sample and a bootstrap sample drawn from it]

Remember: sample size matters!

2. Calculate statistic

3. Repeat…
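Steps 1 and 2 can be sketched in a few lines of Python. The eight values below are hypothetical stand-ins (the 53 mercury readings are not listed here), and `bootstrap_sample` is an illustrative helper name; `random.choices` draws with replacement, and `k=len(sample)` keeps the bootstrap sample the same size as the original.

```python
import random
import statistics


def bootstrap_sample(sample, rng):
    """Step 1: sample with replacement, using the original sample size."""
    return rng.choices(sample, k=len(sample))


# Hypothetical mercury readings (ppm), for illustration only
original = [0.04, 0.19, 0.27, 0.44, 0.49, 0.53, 0.77, 1.10]

rng = random.Random(42)
boot = bootstrap_sample(original, rng)
boot_mean = statistics.mean(boot)  # Step 2: calculate the statistic
print(boot, round(boot_mean, 3))
```

Some values typically appear more than once and others not at all; step 3 repeats this many times.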

We need technology!

StatKey: www.lock5stat.com/statkey

• Free
• Easy to use
• Online (or offline as a Chrome app)

SE = 0.047

statistic ± 2 × SE: $0.527 \pm 2 \times 0.047 = (0.433, 0.621)$

Mercury Levels in Fish

We are 95% confident that the average mercury level in fish in Florida lakes is between 0.433 and 0.621 ppm.

Same process for every parameter!

Estimate the margin of error and/or a confidence interval for...
• proportion ($p$)
• difference in means ($\mu_1 - \mu_2$)
• difference in proportions ($p_1 - p_2$)
• standard deviation ($\sigma$)
• correlation ($\rho$)
• ...

Generate samples with replacement → Calculate sample statistic → Repeat...
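For a difference in means, one common way to "generate samples with replacement" is to resample each group separately at its own sample size. The sketch below assumes that scheme and uses made-up data; `bootstrap_se_diff_means` is a hypothetical helper, not a StatKey function.

```python
import random
import statistics


def bootstrap_se_diff_means(group1, group2, reps=5000, seed=0):
    """Bootstrap SE of mean(group1) - mean(group2).

    Each group is resampled separately, with replacement, at its own size."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        b1 = rng.choices(group1, k=len(group1))
        b2 = rng.choices(group2, k=len(group2))
        diffs.append(statistics.mean(b1) - statistics.mean(b2))
    return statistics.stdev(diffs)


# Hypothetical data, for illustration only
g1 = [12, 15, 9, 20, 17, 14, 11, 18]
g2 = [8, 10, 7, 13, 9, 12, 6]

est = statistics.mean(g1) - statistics.mean(g2)
se = bootstrap_se_diff_means(g1, g2)
print(f"difference in means: {est:.2f} ± {2 * se:.2f}")
```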

Mercury and pH in Lakes

Lange, Royals, and Connor, Transactions of the American Fisheries Society (1993)

• For Florida lakes, what is the correlation between average mercury level (ppm) in fish taken from a lake and acidity (pH) of the lake?

Give a 95% CI for $\rho$

$r = -0.575$

Mercury and pH in Lakes

$r = -0.575$

$-0.575 \pm 2 \times 0.085 = (-0.745, -0.405)$

We are 95% confident that the true correlation between mercury and pH level in Florida lakes is between -0.745 and -0.405.
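Bootstrapping a correlation follows the same steps; the one detail to get right is that each lake's (pH, mercury) pair is resampled as a unit. The sketch below assumes hypothetical lake data (not the Lange, Royals, and Connor measurements), so its numbers will not match $r = -0.575$; `pearson_r` and `bootstrap_se_correlation` are illustrative helper names.

```python
import random
import statistics


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_se_correlation(x, y, reps=5000, seed=0):
    """Bootstrap SE of r, resampling whole cases so each lake keeps
    its own (pH, mercury) pair."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    for _ in range(reps):
        idx = rng.choices(range(n), k=n)  # resample case indices with replacement
        rs.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    return statistics.stdev(rs)


# Hypothetical lakes (NOT the published data): pH and average mercury (ppm)
ph = [4.4, 5.0, 5.6, 6.1, 6.5, 6.9, 7.2, 7.6, 8.0, 8.4, 8.9, 9.1]
hg = [0.95, 1.20, 0.60, 0.85, 0.45, 0.70, 0.30, 0.55, 0.25, 0.40, 0.15, 0.35]

r = pearson_r(ph, hg)
se = bootstrap_se_correlation(ph, hg)
print(f"r = {r:.3f}, interval: ({r - 2 * se:.3f}, {r + 2 * se:.3f})")
```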

Bootstrap Interval: Summary

• Sample with replacement from data
• Calculate statistic
• Repeat many times
• SE = standard deviation of statistics
• Margin of error = 2 × SE
• Interval: statistic ± 2 × SE
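The summary steps fit in one short function. This is a minimal sketch, assuming hypothetical data and an illustrative helper named `bootstrap_interval`; passing a different `stat` function (here `statistics.stdev`) applies the same procedure to a different parameter.

```python
import random
import statistics


def bootstrap_interval(sample, stat=statistics.mean, reps=5000, seed=0):
    """Return (lower, upper, se) for a 'statistic ± 2 x SE' bootstrap interval."""
    rng = random.Random(seed)
    boot_stats = [
        stat(rng.choices(sample, k=len(sample)))  # resample with replacement, same n
        for _ in range(reps)                      # repeat many times
    ]
    se = statistics.stdev(boot_stats)             # SE = SD of the bootstrap statistics
    estimate = stat(sample)
    return estimate - 2 * se, estimate + 2 * se, se


# Hypothetical data, for illustration only
data = [0.04, 0.19, 0.27, 0.44, 0.49, 0.53, 0.63, 0.77, 0.84, 1.10]
print(bootstrap_interval(data))                        # interval for the mean
print(bootstrap_interval(data, stat=statistics.stdev))  # interval for the SD
```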







Question #2

Does drinking tea boost your immune system?

Tea and Immune Response

Antigens in tea-Beverage Prime Human Vγ2Vδ2 T Cells in vitro and in vivo for Memory and Non-memory Antibacterial Cytokine Responses, Kamath et al., Proceedings of the National Academy of Sciences, May 13, 2003.

• Participants were randomized to drink five or six cups of either tea (black) or coffee every day for two weeks (both drinks have caffeine but only tea has L-theanine)

• After two weeks, blood samples were exposed to an antigen, and production of interferon gamma (immune system response) was measured

• Explanatory variable: tea or coffee

• Response variable: measure of interferon gamma

Tea and Immune System

$\bar{x}_T - \bar{x}_C = 34.82 - 17.70 = 17.12$

1. Check conditions

2. Compute statistic: choose formula, plug and chug

3. Use theoretical distribution (which one? df?)

4. 0.025 < p-value < 0.05

Getting the p-value: Option 1

$$t = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}}} = 2.07, \qquad n_T = 11,\ n_C = 10$$
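As a check on the arithmetic, the statistic can be recomputed from the raw responses. The values below are read off the Actual Experiment diagram that follows (11 tea drinkers, 10 coffee drinkers); their means reproduce 34.82 and 17.70. `statistics.variance` returns the sample variance $s^2$ (divisor $n - 1$).

```python
import math
import statistics

# Interferon gamma responses, as read from the experiment diagram
tea = [5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58]
coffee = [0, 0, 3, 11, 15, 16, 21, 21, 38, 52]

diff = statistics.mean(tea) - statistics.mean(coffee)  # x̄_T − x̄_C
se = math.sqrt(statistics.variance(tea) / len(tea)
               + statistics.variance(coffee) / len(coffee))
t = diff / se
print(f"difference = {diff:.2f}, t = {t:.2f}")  # difference = 17.12, t = 2.07
```

Turning t into a p-value still requires choosing a theoretical distribution and its degrees of freedom, which is step 3 in the list above.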

p-value: The chance of obtaining a statistic as extreme as that observed, just by random chance, if the null hypothesis is true

Say what??? We need a better way…

Actual Experiment

[Figure: participants (R) randomly assigned to tea or coffee, each labeled with their interferon gamma response.
Tea: 5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58
Coffee: 0, 0, 3, 11, 15, 16, 21, 21, 38, 52]

$\bar{x}_T - \bar{x}_C = 17.12$

• Two plausible explanations:
  - Tea boosts immunity
  - Random chance

What might happen just by random chance???

Simulation

[Figure: keeping every response fixed, the 21 responses are randomly re-dealt into tea and coffee groups of the original sizes (11 and 10).]

$\bar{x}_T - \bar{x}_C = -12.3$

Repeat Many Times!

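Randomization Test

[Figure: the distribution of the statistic if H0 is true, built from many simulated re-randomizations of the data, with the observed statistic marked.]

p-value: proportion of simulated statistics as extreme as the observed statistic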

If there were no difference between tea and coffee regarding immune system response, we would see results this extreme about 2.6% of the time

p-value: The chance of obtaining a statistic as extreme as that observed, just by random chance, if the null hypothesis is true

• Simulate as if the null hypothesis were true

• Calculate the statistic of interest

• Repeat

• p-value = proportion of these simulated statistics that are as extreme as your observed statistic

• Same procedure for every parameter!
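The bullet steps can be sketched as a short Python function applied to the tea and coffee responses read off the Actual Experiment diagram. The test is one-sided because the question is whether tea boosts the response; `randomization_test` and the 10,000 repetitions are illustrative choices, so the estimated p-value should be close to, but not exactly, the 2.6% quoted above.

```python
import random

tea = [5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58]
coffee = [0, 0, 3, 11, 15, 16, 21, 21, 38, 52]


def randomization_test(group1, group2, reps=10000, seed=0):
    """One-sided randomization test for mean(group1) - mean(group2).

    Simulates H0 (the drink makes no difference) by shuffling the pooled
    responses and re-dealing them into groups of the original sizes.
    """
    rng = random.Random(seed)
    n1 = len(group1)
    observed = sum(group1) / n1 - sum(group2) / len(group2)
    pooled = list(group1) + list(group2)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        sim = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / (len(pooled) - n1)
        if sim >= observed:  # as extreme as observed, in the direction of Ha
            count += 1
    return observed, count / reps


observed, p_value = randomization_test(tea, coffee)
print(f"observed = {observed:.2f}, p-value ≈ {p_value:.3f}")
```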

Randomization Test

Question #3

Does hormone replacement therapy cause increased risk of breast cancer?

Hormone Replacement Therapy

• Until 2002, hormone replacement therapy (HRT) was commonly prescribed to post-menopausal women. This changed in 2002, when the results of a large clinical trial were published.

• 8506 women were randomized to take HRT and 8102 were randomized to placebo. 166 HRT and 124 placebo women developed invasive breast cancer.

• Does hormone replacement therapy cause increased risk of breast cancer?

• How unlikely would this be, just by random chance, if there were no difference between HRT and placebo regarding invasive breast cancer?

HRT and Invasive Breast Cancer

• If there were no difference between HRT and placebo regarding invasive breast cancer, we would see results this extreme only about 2 out of 100 times.

• We have evidence that HRT increases risk of invasive breast cancer.

Hormone Replacement Therapy

• Same trial, different variable of interest.

• 8506 women were randomized to take HRT and 8102 were randomized to placebo. 502 HRT and 458 placebo women developed any kind of cancer.

• Does hormone replacement therapy cause increased risk of cancer in general?

• How unlikely would this be, just by random chance, if there were no difference between HRT and placebo regarding cancer?

HRT and All Cancer

• If there were no difference between HRT and placebo regarding cancer, we would see results this extreme about 24 out of 100 times, or about a quarter of the time.

• We do not have evidence that HRT increases risk of cancer in general.
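Both HRT questions are randomization tests for a difference in proportions. Under H0 each woman's outcome is fixed and only the treatment labels are random, so re-randomizing the labels is equivalent to scattering the cancer cases at random among all 16,608 women and counting how many land in the HRT group. The sketch below takes that shortcut rather than shuffling 16,608 labels each time; the counts come from the slides, while the helper name, the one-sided direction, and the 3,000 repetitions are assumptions. Its estimates should land near, not exactly on, the 2 in 100 and 24 in 100 quoted above.

```python
import random


def two_proportion_randomization_test(x1, n1, x2, n2, reps=3000, seed=0):
    """One-sided randomization test for p1 - p2 (Ha: p1 > p2).

    Re-randomizing is simulated by placing the x1 + x2 cases in random
    slots among n1 + n2 people; slots 0..n1-1 form group 1.
    """
    rng = random.Random(seed)
    total_x = x1 + x2
    observed = x1 / n1 - x2 / n2
    count = 0
    for _ in range(reps):
        cases = rng.sample(range(n1 + n2), total_x)
        sim_x1 = sum(1 for slot in cases if slot < n1)  # cases landing in group 1
        sim = sim_x1 / n1 - (total_x - sim_x1) / n2
        if sim >= observed:
            count += 1
    return observed, count / reps


# Counts from the trial described above (HRT vs. placebo)
_, p_breast = two_proportion_randomization_test(166, 8506, 124, 8102)
_, p_all = two_proportion_randomization_test(502, 8506, 458, 8102)
print(f"invasive breast cancer: p ≈ {p_breast:.3f}")
print(f"any cancer:             p ≈ {p_all:.3f}")
```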







Malevolent Uniforms

Sample Correlation = 0.43

Do teams with more malevolent uniforms commit/get more penalties, or is the relationship just due to random chance?
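The same randomization logic answers this question: shuffling one variable breaks any link between uniforms and penalties while keeping each variable's values, which simulates H0. The uniform data are not reproduced in this talk, so the twelve "teams" below are hypothetical, and `pearson_r` and `correlation_randomization_test` are illustrative helper names. The test is one-sided (Ha: positive correlation), matching the "more penalties" question.

```python
import random


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_randomization_test(x, y, reps=5000, seed=0):
    """One-sided test of H0: no association vs. Ha: positive correlation."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    y_shuffled = list(y)
    count = 0
    for _ in range(reps):
        rng.shuffle(y_shuffled)  # break the pairing, keep the values
        if pearson_r(x, y_shuffled) >= observed:
            count += 1
    return observed, count / reps


# Hypothetical teams: uniform malevolence rating and standardized penalties
uniform = [3.4, 4.1, 4.9, 3.9, 5.1, 3.2, 4.6, 4.4, 5.4, 3.7, 4.2, 4.8]
penalty = [-0.9, 0.3, 0.6, -0.4, 1.2, -0.2, -0.1, 0.5, 0.8, 0.2, -0.7, 0.4]

r_obs, p_value = correlation_randomization_test(uniform, penalty)
print(f"r = {r_obs:.2f}, p-value ≈ {p_value:.3f}")
```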

Good Technology Options

• StatKey: lock5stat.com/statkey
• Rossman/Chance: rossmanchance.com/applets
• iNZight: stat.auckland.ac.nz/~wild/iNZight
• R: cran.r-project.org
• RStudio: rstudio.com
• Fathom: fathom.concord.org
• TinkerPlots: tinkerplots.com
• JMP: jmp.com
• StatCrunch: statcrunch.com


Simulation Methods

• Intrinsically connected to concepts

• Minimal background knowledge needed

• More visual, less mathematical

• Same procedure applies to all statistics

• More generalizable

• Fewer conditions; conditions transparent

Scores on a National Assessment

[Figure: post-test scores (0 to 100) for Fall '14, Spring '15, and Fall '15.]

Averages: 43%, 60%, 63% (National: 47%)

p-value: 0.00002

National Assessment Results

Most improvement on p-value questions!

T-test    Simulation    National

Able to reason that a smaller p-value provides stronger evidence against the null hypothesis than a larger p-value.

29%

Able to reason about a conclusion based on a statistically significant p-value in the context of a research study that compares two groups.

43%    45%    48%    68%    80%

Student Behavior

• Students were given data on the second midterm and asked to compute a confidence interval for the mean

• How they created the interval:
  - Bootstrapping: 94 (84%)
  - t.test in R: 9 (8%)
  - Formula: 9 (8%)

It is the way of the past…

"Actually, the statistician does not carry out this very simple and very tedious process [the randomization test], but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method."

-- Sir R. A. Fisher, 1936

…and the way of the future

“...the consensus curriculum is still an unwitting prisoner of history. What we teach is largely the technical machinery of numerical approximations based on the normal distribution and its many subsidiary cogs. This machinery was once necessary, because the conceptually simpler alternative based on permutations was computationally beyond our reach. Before computers statisticians had no choice. These days we have no excuse. Randomization-based inference makes a direct connection between data production and the logic of inference that deserves to be at the core of every introductory course.”

-- Professor George Cobb, 2007

Want More?

• Sessions, panel, and discussion on teaching with simulation-based inference during the Electronic Conference on Teaching Statistics (eCOTS), May 16–20, 2016: causeweb.org/cause/ecots/ecots16

• Contact me ([email protected]) if interested in a potential NSF-sponsored experience for high school teachers

• Simulation-based inference blog: causeweb.org/sbi

