More descriptive statistics: Percentiles, boxplots,...
-
More descriptive statistics:
Percentiles, boxplots, and z-scores
-
Outline for today
Better know a player: Wade Boggs
Review:
• Central tendency and measures of variation
-
Better know a player
Wade Boggs
Any questions?
-
Descriptive statistics
What is a statistic?
A statistic is a numerical summary (function) of data.
-
The mean
$$\text{Mean} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Sample mean (x̄) vs. population mean (μ)
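Here is a minimal R sketch that checks the formula by summing the values and dividing by n. It uses the David Ortiz home run totals that appear later in these slides, stored in a vector called `hr`. The name `hr` is only an illustrative choice.

```r
# David Ortiz home run totals from the slides' example data
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# Mean = (x1 + x2 + ... + xn) / n, written out directly
n <- length(hr)
mean_by_formula <- sum(hr) / n

# R's built-in mean() gives the same value
mean_builtin <- mean(hr)

print(mean_by_formula)                          # 364 / 11, about 33.09
print(all.equal(mean_by_formula, mean_builtin)) # TRUE
```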
-
The median
The median is the value in the middle of your data
• ½ of the values are greater than the median and ½ are less
The median is resistant to outliers, while the mean is not
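This small R sketch shows the resistance claim in action. It reuses the Ortiz home run data (as `hr`) and replaces the 54 with a hypothetical, far more extreme 100. Only the mean changes.

```r
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# Hypothetical change: make the largest value far more extreme (54 -> 100)
hr_extreme <- replace(hr, which.max(hr), 100)

# The median depends only on the middle value, so it does not move
print(median(hr))          # 32
print(median(hr_extreme))  # still 32

# The mean uses every value, so it shifts noticeably
print(mean(hr))            # about 33.09
print(mean(hr_extreme))    # about 37.27
```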
-
The standard deviation
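The slide gives no formula. As an assumption, this sketch uses the usual sample standard deviation with an n − 1 divisor, which is what R's `sd()` computes. It works the value out by hand for the Ortiz home runs (`hr`) and compares the result with `sd()`.

```r
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# Sample standard deviation:
#   s = sqrt( sum((x_i - mean)^2) / (n - 1) )
# The n - 1 divisor matches what R's sd() uses.
n <- length(hr)
deviations <- hr - mean(hr)
s_by_formula <- sqrt(sum(deviations^2) / (n - 1))

print(s_by_formula)                     # about 8.61
print(all.equal(s_by_formula, sd(hr)))  # TRUE
```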
-
Mean
-
Mean ± stdev
-
Large vs. small standard deviations
Same mean, different standard deviations
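Here is a tiny illustration with two made-up vectors. This is hypothetical data, not from the slides. Both vectors have a mean of 10, but their spreads are very different.

```r
# Two small hypothetical data sets built to share a mean of 10
tight  <- c(9, 10, 11)
spread <- c(0, 10, 20)

print(c(mean(tight), mean(spread)))  # 10 10
print(c(sd(tight), sd(spread)))      # 1 10
```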
-
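The 95% rule (of thumb)
If a distribution is approximately symmetric and bell-shaped, about 95% of the data should fall within two standard deviations of the mean.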
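This rough simulation sketch checks the rule. It draws a large normal (bell-shaped) sample using hypothetical, batting-average-like parameters. It then counts how many values fall within two standard deviations of the mean. For an exactly normal distribution the share is about 95.4%, which is where the rounded 95% comes from.

```r
# Simulate a large bell-shaped (normal) sample; the seed makes it reproducible
set.seed(1)
x <- rnorm(100000, mean = 0.260, sd = 0.030)  # hypothetical parameters

# Fraction of values within 2 standard deviations of the mean
within_2sd <- abs(x - mean(x)) <= 2 * sd(x)
print(mean(within_2sd))  # close to 0.95
```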
-
Percentiles
The pth percentile is the value of a quantitative variable that is greater than p percent of the data.
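You can also turn the definition around and ask what percent of the data a given value exceeds. This minimal R sketch uses a hypothetical helper, `percent_below()`, and applies it to the Ortiz home runs (`hr`). Textbooks and software differ slightly in how they treat ties and the value itself.

```r
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# Hypothetical helper: the percent of observations strictly below `value`,
# mirroring "greater than p percent of the data"
percent_below <- function(data, value) {
  100 * mean(data < value)
}

print(percent_below(hr, 38))  # 9 of 11 seasons are below 38, about 81.8
```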
-
Percentiles/quantiles
https://emeyers.shinyapps.io/baseball_stat_percen
-
What is a good statistic for…?
Use the website to determine what “good” values are for the following statistics…
-
Putting statistics in context
90th percentile…
-
Calculating percentiles
The pth percentile is the value of a quantitative variable that is greater than p percent of the data.
Order         1   2   3   4   5   6   7   8   9  10  11
Sorted data  23  23  28  29  30  32  35  35  37  38  54
Percentile    0  10  20  30  40  50  60  70  80  90 100
Typically we ask for a value that is at the pth percentile…
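In the table, the sorted values are assigned percentiles 0, 10, ..., 100. A percentile that falls between two of them is found by linear interpolation. This rule matches R's default `quantile()` (`type = 7`), which places the pth percentile at sorted position 1 + (n − 1)p. The sketch below writes the rule out directly. `interp_percentile()` is a hypothetical helper written only for illustration.

```r
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# Linear interpolation between sorted values (the rule behind the table):
# the pth percentile sits at position h = 1 + (n - 1) * p in the sorted data.
interp_percentile <- function(data, p) {
  x <- sort(data)
  h <- 1 + (length(x) - 1) * p
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

print(interp_percentile(hr, 0.25))  # position 3.5: halfway between 28 and 29, so 28.5
print(quantile(hr, 0.25))           # R's default (type = 7) also gives 28.5
```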
-
Five Number Summary
Five Number Summary = (min, Q1, median, Q3, max)
Q1 = 25th percentile
Q3 = 75th percentile
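In R, `quantile()` can pull out all five numbers at once. `fivenum()` is an alternative that uses Tukey's hinges. Here is a sketch on the Ortiz home runs (`hr`):

```r
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# (min, Q1, median, Q3, max) via quantile() with R's default interpolation
five_num <- quantile(hr, probs = c(0, 0.25, 0.5, 0.75, 1))
print(five_num)  # 23.0 28.5 32.0 36.0 54.0

# fivenum() uses Tukey's "hinges"; for these 11 values it gives the same answer,
# but the two methods can differ slightly for other sample sizes
print(fivenum(hr))
```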
-
Range and Interquartile Range
Range = maximum – minimum
Interquartile range (IQR) = Q3 – Q1
R: IQR(x)
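This short sketch computes both measures of spread in R for the Ortiz home runs (`hr`). One detail to watch: `range()` returns the minimum and the maximum, not their difference. So `diff(range(x))` is what gives the range as defined above.

```r
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# Range = maximum - minimum (range() itself returns c(min, max))
hr_range <- diff(range(hr))  # 54 - 23 = 31

# IQR = Q3 - Q1; IQR() uses the same default quantile rule as quantile()
hr_iqr <- IQR(hr)            # 36 - 28.5 = 7.5
hr_iqr_manual <- unname(quantile(hr, 0.75) - quantile(hr, 0.25))

print(c(range = hr_range, IQR = hr_iqr, IQR_manual = hr_iqr_manual))
```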
-
Compute: 5 number summary, range, and IQR for David Ortiz home runs
1. Five Number Summary = (min, Q1, median, Q3, max)
2. Range = maximum – minimum
3. Interquartile range (IQR) = Q3 – Q1
Also use the percentile…
-
5 number summary, range, and IQR for David Ortiz home runs
1. Five Number Summary: (23, 28.5, 32, 36, 54)
2. Range: 31
3. Interquartile range (IQR) = 7.5
The 5 number summary for HRs for all player-seasons with over 500 PA is: (0, 4, 10, 20, 73)
David Ortiz home runs: 54 35 23 28 32 29 23 30 35 37 38
-
Detecting outliers
As a rule of thumb, we call a data value an outlier if it is:
• Smaller than Q1 - 1.5*IQR, or
• Larger than Q3 + 1.5*IQR
Are there any outliers in David Ortiz home run numbers?
1. Five Number Summary: (23, 28.5, 32, 36, 54)
2. Range: 31
3. Interquartile range (IQR) = 7.5
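Here is one way to work through the question, sketched in R with the default `quantile()` rule. That rule reproduces Q1 = 28.5 and Q3 = 36 from the summary. The variable names are illustrative.

```r
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

q1  <- unname(quantile(hr, 0.25))  # 28.5
q3  <- unname(quantile(hr, 0.75))  # 36
iqr <- q3 - q1                     # 7.5

# 1.5 * IQR rule of thumb
lower_fence <- q1 - 1.5 * iqr      # 28.5 - 11.25 = 17.25
upper_fence <- q3 + 1.5 * iqr      # 36 + 11.25 = 47.25

outliers <- hr[hr < lower_fence | hr > upper_fence]
print(outliers)  # 54
```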
-
Boxplots
A boxplot is a graphical display of the 5 number summary and consists of:
1. Drawing a box from Q1 to Q3
2. Dividing the box with a line drawn at the median
3. Drawing a line from each quartile to the most extreme value that is not an outlier
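R's `boxplot.stats()` returns the numbers that a boxplot is drawn from, so the construction steps are easy to check. This sketch runs it on the Ortiz home runs (`hr`). Its hinges come from `fivenum()`, and for these 11 values they match Q1 and Q3.

```r
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# boxplot.stats() returns the numbers boxplot() draws:
#   $stats = lower whisker end, lower hinge, median, upper hinge, upper whisker end
#   $out   = points beyond 1.5 * IQR from the hinges, drawn individually
bs <- boxplot.stats(hr)
print(bs$stats)  # 23.0 28.5 32.0 36.0 38.0
print(bs$out)    # 54

boxplot(hr, ylab = "Home runs")
```

The upper whisker stops at 38, the largest value that is not an outlier. The 54 is drawn as a separate point.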
-
Box plot of David Ortiz home runs
R: boxplot(x)
[Figure: box plot; y-axis: Home runs]
-
Box plot quiz
[Figure: box plot with parts labeled A through F; y-axis: Home runs]
What is:
• Q1?
• Q3?
• The median?
• Most extreme values that are not outliers
• Outliers
-
Two current players: who is best?
Miguel Cabrera: HR in 2014 = 25
David Ortiz…
-
Comparing players with side-by-side box plots
How would you describe the differences between these two players in terms of HRs? Who is better?
[Figure: side-by-side box plots of HRs for players A and B]
-
Let’s compare two more players
Career-best seasons:
Wade Boggs (1985): BA = .368
Ted Williams (1941): BA = .406
Who is better?
-
Who is best here?
Is Ted Williams better than Wade Boggs?
-
Ted Williams hit .406 in 1941
Plenty of people hit over .400 before him, but no one has since…
-
Have the best players gotten worse at hitting over the past 140 years?
[Figure: maximum batting average by year; x-axis: Year, y-axis: Max batting average]
-
Comparing players across +me periods
Problem: baseball has changed from 1871 to now
We can’t simply compare statistics…
-
Histograms of batting average 1941 vs. 1985
Do the batting averages look similar in these years?
-
Density of batting average 1941 vs. 1985
Do the batting averages look similar in these years?
-
z-scores
The z-score tells how many standard deviations a value is from the mean.
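This is a minimal sketch of the z-score, z = (x − x̄)/s. It uses a hypothetical helper, `z_score()`, with the Ortiz home runs (`hr`) as stand-in data, because the batting-average data behind the next slides are not reproduced here.

```r
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# z = (x - mean) / sd: how many standard deviations a value is from the mean
z_score <- function(x, center, spread) (x - center) / spread

z_54 <- z_score(54, mean(hr), sd(hr))
print(round(z_54, 2))  # about 2.43: the 54-HR season is ~2.4 sds above this mean

# Standardizing every value gives z-scores with mean 0 and sd 1
z_all <- z_score(hr, mean(hr), sd(hr))
print(round(c(mean = mean(z_all), sd = sd(z_all)), 10))
```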
-
z-scores for comparing players across eras
When comparing players across eras, we will use the mean (x̄) and standard deviation…
-
Comparing Ted and Wade to their peers
In 1941:
• Mean batting average was .276
• Standard deviation…
-
Comparing Ted and Wade to their peers
Wade’s batting average z-score: 3.82
Ted’s batting average z-score: 3.97
Who is the better hitter?
-
Career z-scored batting averages
-
What about Home Runs…
-
Next class: correlation!
Q and R: Big Data Baseball chapter 4