Teaching Simulation-Based Inference
Dr. Kari Lock Morgan, Department of Statistics, Penn State University
Washington Statistical Society, 4/1/16
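Central Limit Theorem
• $X_j$ independent random variables with mean $\mu_j$ and variance $\sigma_j^2$.
• Lyapunov's Central Limit Theorem: If $\exists\, \delta > 0$ such that
$$\frac{\sum_{j=1}^{n} E\,|X_j - \mu_j|^{2+\delta}}{\left(\sum_{j=1}^{n} \sigma_j^2\right)^{(2+\delta)/2}} \xrightarrow[n \to \infty]{} 0,$$
then
$$\frac{\sum_{j=1}^{n} (X_j - \mu_j)}{\sqrt{\sum_{j=1}^{n} \sigma_j^2}} \xrightarrow{d} N(0,1).$$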
APRIL FOOL'S!
Use data from a sample survey to estimate a population mean or proportion; develop a margin of
error through the use of simulation methods for random sampling
Use data from a randomized experiment to compare two
treatments; use simulation to decide if differences between
parameters are significant
Question #1
What is the average mercury level of fish (Large Mouth Bass) in Florida lakes?
Mercury Levels in Fish
$n = 53$, $\bar{x} = 0.527$ ppm
Lange, T., Royals, H. and Connor, L. (2004). Mercury accumulation in largemouth bass (Micropterus salmoides) in a Florida Lake. Archives of Environmental Contamination and Toxicology, 27(4), 466-471.
$\mu$ = ???
Mercury in Fish
• The sample mean is 0.527 ppm
• In the US, the FDA action level is 1 ppm
  ◦ Is this safely below the US limit?
• In Canada, the safety limit is 0.5 ppm
  ◦ Is this clearly above the Canadian limit?
• We need a margin of error…
Getting a Margin of Error
[Diagram: Population → many samples (Sample, Sample, Sample, ...) → calculate statistic for each sample → Sampling Distribution]
Standard Error (SE): standard deviation of sampling distribution
Margin of Error (ME) (95% CI: ME = 2 × SE)
statistic ± ME
Assessing Uncertainty
• Key idea: how much do statistics vary from sample to sample?
• Wouldn't it be nice if we could take lots of samples and directly see the variability?!?
• Problem?
• We can't take lots of samples from the population!
Getting a Margin of Error
[Diagram: Population (???) → Sample → Best Guess at Population → many samples (Sample, Sample, Sample, ...) → calculate statistic for each sample → Distribution of the statistic]
Standard Error (SE): standard deviation of the statistic
Margin of Error (ME) (95% CI: ME = 2 × SE)
statistic ± ME
Simulating Samples
• What is our best guess at the population, given sample data?
  ◦ The sample itself!
• Draw samples repeatedly from the sample data (of size n = 53)…
  ◦ … with replacement!
• This is known as bootstrapping
  ◦ Simulate many bootstrap samples
  ◦ Calculate statistic for each
  ◦ Find SE as standard deviation of these statistics
1. Bootstrap Sample: Sample with replacement from the original sample, using the same sample size.
   [Figure: Original Sample and Bootstrap Sample, side by side]
   Remember: sample size matters!
2. Calculate statistic
3. Repeat…
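The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bootstrap_se` and the values in `toy_sample` are hypothetical placeholders, not the 53 Florida lake measurements, which are not listed in these slides.

```python
import random
import statistics

def bootstrap_se(sample, n_boot=5000, seed=0):
    """Estimate the standard error of the sample mean by bootstrapping."""
    rng = random.Random(seed)
    n = len(sample)
    boot_means = []
    for _ in range(n_boot):  # Step 3: repeat many times
        # Step 1: sample with replacement, same size as the original sample
        resample = rng.choices(sample, k=n)
        # Step 2: calculate the statistic for this bootstrap sample
        boot_means.append(statistics.mean(resample))
    # SE = standard deviation of the bootstrap statistics
    return statistics.stdev(boot_means)

# Hypothetical placeholder values (NOT the real mercury data)
toy_sample = [0.12, 0.25, 0.31, 0.44, 0.48, 0.52, 0.63, 0.71, 0.86, 1.10]

xbar = statistics.mean(toy_sample)
se = bootstrap_se(toy_sample)
print(f"mean = {xbar:.3f}, bootstrap SE = {se:.3f}")
print(f"statistic ± 2 x SE: ({xbar - 2 * se:.3f}, {xbar + 2 * se:.3f})")
```

In a classroom setting StatKey does this work with no code; the sketch is just one way to see what the tool is doing.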
We need technology!
StatKey: www.lock5stat.com/statkey
• Free
• Easy to use
• Online (or offline as Chrome app)
Mercury Levels in Fish
SE = 0.047
statistic ± 2 × SE
0.527 ± 2 × 0.047
(0.433, 0.621)
We are 95% confident that average mercury level in fish in Florida lakes is between 0.433 and 0.621 ppm.
Same process for every parameter!
Estimate the margin of error and/or a confidence interval for...
• proportion ($p$)
• difference in means ($\mu_1 - \mu_2$)
• difference in proportions ($p_1 - p_2$)
• standard deviation ($\sigma$)
• correlation ($\rho$)
• ...
Generate samples with replacement
Calculate sample statistic
Repeat...
Mercury and pH in Lakes
Lange, Royals, and Connor, Transactions of the American Fisheries Society (1993)
• For Florida lakes, what is the correlation between average mercury level (ppm) in fish taken from a lake and acidity (pH) of the lake?
Give a 95% CI for ρ
r = -0.575
Mercury and pH in Lakes
r = -0.575
-0.575 ± 2 × 0.085
(-0.745, -0.405)
We are 95% confident that the true correlation between mercury and pH level in Florida lakes is between -0.745 and -0.405.
Bootstrap Interval: Summary
• Sample with replacement from data
• Calculate statistic
• Repeat many times
• SE = standard deviation of statistics
• Margin of error = 2 x SE
• Interval: statistic ± 2 x SE
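Because the summary is the same for every statistic, it can be written once and reused. The sketch below applies it to a correlation by resampling whole (pH, mercury) pairs. The helper names `pearson_r` and `bootstrap_interval` and the `toy_lakes` values are assumptions made up for illustration; they are not the lake data behind r = -0.575.

```python
import random
import statistics

def pearson_r(pairs):
    """Sample correlation for a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def bootstrap_interval(data, statistic, n_boot=2000, seed=1):
    """Return (observed statistic, bootstrap SE, statistic ± 2 x SE)."""
    rng = random.Random(seed)
    n = len(data)
    # Sample with replacement, calculate the statistic, repeat many times
    boot_stats = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    se = statistics.stdev(boot_stats)  # SE = standard deviation of statistics
    observed = statistic(data)
    return observed, se, (observed - 2 * se, observed + 2 * se)

# Hypothetical (pH, mercury ppm) pairs, invented for this sketch only
toy_lakes = [(4.4, 0.9), (5.1, 1.3), (5.5, 0.9), (5.8, 0.5), (6.2, 0.8),
             (6.6, 0.3), (6.9, 0.7), (7.2, 0.2), (7.6, 0.6), (7.9, 0.15),
             (8.1, 0.4), (8.7, 0.1)]

r, se, (lo, hi) = bootstrap_interval(toy_lakes, pearson_r)
print(f"r = {r:.3f}, SE = {se:.3f}, interval = ({lo:.3f}, {hi:.3f})")
```

Passing a different `statistic` function (a mean, a proportion, a standard deviation) reuses the same procedure unchanged, which is the point of the summary.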
Use data from a sample survey to estimate a population mean or proportion; develop a margin of
error through the use of simulation methods for random sampling
Use data from a randomized experiment to compare two
treatments; use simulation to decide if differences between
parameters are significant
✔
Question #2
Does drinking tea boost your immune system?
Tea and Immune Response
Antigens in tea-Beverage Prime Human Vγ2Vδ2 T Cells in vitro and in vivo for Memory and Non-memory Antibacterial Cytokine Responses, Kamath et al., Proceedings of the National Academy of Sciences, May 13, 2003.
• Participants were randomized to drink five or six cups of either tea (black) or coffee every day for two weeks (both drinks have caffeine but only tea has L-theanine)
• After two weeks, blood samples were exposed to an antigen, and production of interferon gamma (immune system response) was measured
• Explanatory variable: tea or coffee
• Response variable: measure of interferon gamma
Tea and Immune System
$\bar{x}_T - \bar{x}_C = 34.82 - 17.70 = 17.12$
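Getting the p-value: Option 1
1. Check conditions
2. Compute statistic: choose formula, plug and chug
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = 2.07, \qquad n_1 = 11,\ n_2 = 10$$
   (group 1 = tea, group 2 = coffee)
3. Use theoretical distribution (which one? df?)
4. 0.025 < p-value < 0.05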
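As a rough check of the plug-and-chug step, the statistic can be recomputed from the raw responses. The values below are read from the Actual Experiment figure in this deck (11 tea, 10 coffee); treat them as an assumption, though their means do reproduce 34.82 and 17.70. The function name `two_sample_t` is made up for this sketch.

```python
import math
import statistics

# Interferon gamma responses as read from the deck's Actual Experiment figure
tea = [5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58]
coffee = [0, 0, 3, 11, 15, 16, 21, 21, 38, 52]

def two_sample_t(a, b):
    """Unpooled two-sample t statistic: difference in means over its SE."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

t = two_sample_t(tea, coffee)
print(round(t, 2))  # 2.07
```

The sketch stops at the statistic; choosing the reference distribution and its degrees of freedom is the part the slide flags as the hard step for students.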
p-value: The chance of obtaining a statistic as extreme as that observed, just by random chance, if the null hypothesis is true
Say What???
We need a better way…
Actual Experiment
[Figure: participants (each marked R) are randomly assigned to the Tea or Coffee group]
Actual Experiment
[Figure: observed interferon gamma response for each participant, by group]
Tea: 5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58
Coffee: 0, 0, 3, 11, 15, 16, 21, 21, 38, 52
$\bar{x}_T - \bar{x}_C = 17.12$
• Two plausible explanations:
  ◦ Tea boosts immunity
  ◦ Random chance
What might happen just by random chance???
Simulation
[Animation: the observed responses are pooled and randomly reassigned to the Tea and Coffee groups]
$\bar{x}_T - \bar{x}_C = -12.3$
Repeat Many Times!
Randomization Test
[Figure: distribution of the statistic if H0 is true, built from the simulated data, with the observed statistic marked]
p-value: proportion as extreme as the observed statistic
If there were no difference between tea and coffee regarding immune system response, we would see results this extreme about 2.6% of the time
p-value: The chance of obtaining a statistic as extreme as that observed, just by random chance, if the null hypothesis is true
Randomization Test
• Simulate as if the null hypothesis were true
• Calculate the statistic of interest
• Repeat
• p-value = proportion of these simulated statistics that are as extreme as your observed statistic
• Same procedure for every parameter!
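The procedure can be sketched for the tea and coffee experiment as follows. The responses are the values read from this deck's Actual Experiment figure (an assumption if your copy differs); the function name `randomization_test`, the number of simulations, and the one-sided definition of "as extreme" (simulated difference at least as large as observed) are choices made for this sketch.

```python
import random

# Responses as read from the deck's Actual Experiment figure
tea = [5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58]
coffee = [0, 0, 3, 11, 15, 16, 21, 21, 38, 52]

def mean(values):
    return sum(values) / len(values)

def randomization_test(group_a, group_b, n_sim=10000, seed=0):
    """One-sided randomization test for a difference in means."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = group_a + group_b  # under H0 the group labels do not matter
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_sim):
        # Simulate as if H0 were true: re-deal the responses to the groups
        rng.shuffle(pooled)
        # Calculate the statistic of interest for this re-randomization
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            extreme += 1
    # p-value = proportion of simulated statistics as extreme as observed
    return observed, extreme / n_sim

observed, p_value = randomization_test(tea, coffee)
print(f"observed difference = {observed:.2f}, p-value = {p_value:.3f}")
```

The simulated p-value varies slightly with the seed and the number of simulations, but it should land near the 2.6% reported on the slide.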
Question #3
Does hormone replacement therapy cause increased risk of breast cancer?
Hormone Replacement Therapy
• Until 2002, hormone replacement therapy (HRT) was commonly prescribed to post-menopausal women. This changed in 2002, when the results of a large clinical trial were published.
• 8506 women were randomized to take HRT, 8102 were randomized to placebo. 166 HRT and 124 placebo women developed invasive breast cancer.
• Does hormone replacement therapy cause increased risk of breast cancer?
• How unlikely would this be, just by random chance, if there were no difference between HRT and placebo regarding invasive breast cancer?
HRT and Invasive Breast Cancer
• If there were no difference between HRT and placebo regarding invasive breast cancer, we would only see results this extreme 2 out of 100 times.
• We have evidence that HRT increases risk of invasive breast cancer.
Hormone Replacement Therapy
• Same trial, different variable of interest.
• 8506 women were randomized to take HRT, 8102 were randomized to placebo. 502 HRT and 458 placebo women developed any kind of cancer.
• Does hormone replacement therapy cause increased risk of cancer in general?
• How unlikely would this be, just by random chance, if there were no difference between HRT and placebo regarding cancer?
HRT and All Cancer
� If there were no difference between HRT and placebo regarding cancer, we would see results this extreme about 24 out of 100 times, or about a quarter of the time.
� We do not have evidence that HRT increases risk of cancer in general.
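Both HRT comparisons can be simulated from the counts on the slides alone. The sketch below re-deals the observed cancer cases at random across all trial participants and counts how many land in the HRT group. The function name `simulate_p_value`, the number of simulations, and the one-sided comparison are assumptions of this sketch, not the method used to produce the slide's figures.

```python
import random

def simulate_p_value(cases_a, n_a, cases_b, n_b, n_sim=4000, seed=0):
    """One-sided randomization p-value for group A having more cases.

    With both group sizes and the total number of cases fixed, the
    difference in proportions is at least as large as observed exactly
    when group A receives at least `cases_a` of the cases.
    """
    rng = random.Random(seed)
    total_cases = cases_a + cases_b
    extreme = 0
    for _ in range(n_sim):
        slots_a = n_a            # unassigned participants left in group A
        slots_total = n_a + n_b  # unassigned participants left overall
        sim_cases_a = 0
        for _case in range(total_cases):
            # Each case falls on a randomly chosen remaining participant
            if rng.random() < slots_a / slots_total:
                sim_cases_a += 1
                slots_a -= 1
            slots_total -= 1
        if sim_cases_a >= cases_a:
            extreme += 1
    return extreme / n_sim

# Counts from the slides: HRT (8506 women) vs placebo (8102 women)
p_breast = simulate_p_value(166, 8506, 124, 8102)
p_all = simulate_p_value(502, 8506, 458, 8102)
print(f"invasive breast cancer: p-value = {p_breast:.3f}")
print(f"any cancer:             p-value = {p_all:.3f}")
```

The two results should come out close to the "2 out of 100" and "24 out of 100" statements on the slides, with some simulation noise.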
Use data from a sample survey to estimate a population mean or proportion; develop a margin of
error through the use of simulation methods for random sampling
Use data from a randomized experiment to compare two
treatments; use simulation to decide if differences between
parameters are significant
✔
✔
Malevolent Uniforms
Sample Correlation = 0.43
Do teams with more malevolent uniforms commit/get more penalties, or is the relationship just due to random chance?
Good Technology Options
• StatKey: lock5stat.com/statkey
• Rossman/Chance: rossmanchance.com/applets
• iNZight: stat.auckland.ac.nz/~wild/iNZight
• R: cran.r-project.org
• RStudio: rstudio.com
• Fathom: fathom.concord.org
• Tinkerplots: tinkerplots.com
• JMP: jmp.com
• StatCrunch: statcrunch.com
Red = Free
Simulation Methods
• Intrinsically connected to concepts
• Minimal background knowledge needed
• More visual, less mathematical
• Same procedure applies to all statistics
• More generalizable
• Fewer conditions; conditions transparent
Scores on a National Assessment
[Figure: post-test scores by semester (Fall '14, Spring '15, Fall '15); y-axis: Post Test Score]
Averages: Fall '14: 43%, Spring '15: 60%, Fall '15: 63% (National: 47%)
p-value: 0.00002
National Assessment Results
Most improvement on p-value questions!
T-test Simulation National
Able to reason that a smaller p-value provides stronger evidence against the null hypothesis than a larger p-value.
29%
Able to reason about a conclusion based on a statistically significant p-value in the context of a research study that compares two groups.
43%
45%
48%
68%
80%
Student Behavior
• Students were given data on the second midterm and asked to compute a confidence interval for the mean
• How they created the interval:
  ◦ Bootstrapping: 94 (84%)
  ◦ t.test in R: 9 (8%)
  ◦ Formula: 9 (8%)
It is the way of the past…
"Actually, the statistician does not carry out this very simple and very tedious process [the randomization test], but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method."
-- Sir R. A. Fisher, 1936
…and the way of the future
“... the consensus curriculum is still an unwitting prisoner of history. What we teach is largely the technical machinery of numerical approximations based on the normal distribution and its many subsidiary cogs. This machinery was once necessary, because the conceptually simpler alternative based on permutations was computationally beyond our reach. Before computers statisticians had no choice. These days we have no excuse. Randomization-based inference makes a direct connection between data production and the logic of inference that deserves to be at the core of every introductory course.”
-- Professor George Cobb, 2007
Want More?
• Sessions, panel, and discussion on teaching with simulation-based inference during the Electronic Conference on Teaching Statistics (eCOTS), May 16-20, 2016: causeweb.org/cause/ecots/ecots16
• Contact me ([email protected]) if interested in a potential NSF-sponsored experience for high school teachers
• Simulation-based inference blog: causeweb.org/sbi