More descriptive statistics: Percentiles, boxplots, and z-scores

  • More descriptive statistics:

    Percentiles, boxplots, and z-scores

  • Outline for today

    Better know a player: Wade Boggs

    Review:
    •  Central tendency and measures of varia…

  • Better know a player

    Wade Boggs

    Any ques…

  • Descriptive statistics

    What is a statistic? A statistic is a numerical summary (func…

  • The mean

    $\text{Mean} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x_i}{n}$

    Sample mean (x̅) vs. popula…

  • The median

    The median is the value in the middle of your data
    •  ½ of the values are greater than the median and ½ are less

    The median is resistant to outliers, whereas the mean is not
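
    As a rough illustration of that resistance, the sketch below uses the David Ortiz home run totals that appear later in these slides and checks how far the mean and the median each move when the largest season is dropped. The variable names are illustrative, not from the slides.

```r
# David Ortiz season home run totals, as listed later in these slides
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

mean(hr)     # arithmetic mean, pulled upward by the 54-HR season
median(hr)   # middle value of the sorted data

# Drop the largest season to see how much each summary moves
hr_trimmed <- hr[hr != max(hr)]
mean_shift   <- mean(hr) - mean(hr_trimmed)
median_shift <- median(hr) - median(hr_trimmed)

# The mean moves more than the median when the extreme value is removed
mean_shift > median_shift
```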

  • The standard deviation

  • Mean

  • Mean ± stdev

  • Large vs small standard deviations

    Same mean, different standard devia…
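
    A small sketch of the same idea in R, using two made-up samples (illustrative numbers only, not baseball data) that share a mean but differ in spread:

```r
# Two made-up samples (illustrative only) with the same mean
tight <- c(9, 10, 11)
wide  <- c(0, 10, 20)

mean(tight)   # 10
mean(wide)    # 10

# sd() computes the sample standard deviation (divides by n - 1)
sd(tight)     # 1
sd(wide)      # 10
```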

  • The 95% rule (of thumb)

    If a distribu…

  • Percentiles

    The pth percentile is the value of a quan…

  • Percentiles/quantiles

    https://emeyers.shinyapps.io/baseball_stat_percen…

  • What is a good statistic for…?

    Use the website to determine what “good” values are for the following sta…

  • Putting statistics in context

    90th percen…

  • Calculating percentiles

    The pth percentile is the value of a quan…

  • Calculating percentiles

    Order        1   2   3   4   5   6   7   8   9   10  11
    Sorted data  23  23  28  29  30  32  35  35  37  38  54
    Percentile   0   10  20  30  40  50  60  70  80  90  100

    Typically we ask for a value that is at the pth percen…
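
    The table can be reproduced with a few lines of R. This is a sketch under one assumption: that the slide assigns percentile 0 to the smallest value and 100 to the largest, spacing the rest evenly, which matches the default convention (`type = 7`) of R's `quantile()`. The variable names are illustrative.

```r
# David Ortiz season home run totals (the sorted data in the table above)
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

sorted_hr <- sort(hr)
n <- length(sorted_hr)

# Percentile attached to each sorted position: 0 for the min, 100 for the max
pct <- 100 * (seq_len(n) - 1) / (n - 1)

data.frame(order = seq_len(n), sorted = sorted_hr, percentile = pct)

# Going the other way: ask for the value at a given percentile
quantile(hr, probs = c(0.10, 0.50, 0.90))
```

    Other `type` settings of `quantile()` interpolate differently, so values for percentiles that fall between data points can differ slightly from one convention to another.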

  • Five Number Summary

    Five Number Summary = (min, Q1, median, Q3, max)

    Q1 = 25th percen…

  • Range and Interquartile Range

    Range = maximum - minimum

    Interquartile range (IQR) = Q3 - Q1

    R: IQR(x)

  • Compute: 5 number summary, range, and IQR for David Ortiz home runs

    1. Five Number Summary = (min, Q1, median, Q3, max)
    2. Range = maximum - minimum
    3. Interquartile range (IQR) = Q3 - Q1

    Also use the percen…

  • 5 number summary, range, and IQR for David Ortiz home runs

    1. Five Number Summary: (23, 28.5, 32, 36, 54)
    2. Range: 31
    3. Interquartile range (IQR) = 7.5

    The 5 number summary for HRs for all player-seasons with over 500 PA is: (0, 4, 10, 20, 73)

    54 35 23 28 32 29 23 30 35 37 38
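
    One way to check these answers in R is sketched below. It assumes the default `quantile()` convention for Q1 and Q3; the variable names are illustrative.

```r
# David Ortiz season home run totals from the slide
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# 1. Five number summary: min, Q1, median, Q3, max
five <- quantile(hr, probs = c(0, 0.25, 0.5, 0.75, 1))
five

# 2. Range as a single number
#    (note: range(hr) returns c(min, max), not the difference)
hr_range <- max(hr) - min(hr)
hr_range

# 3. Interquartile range, Q3 - Q1
hr_iqr <- IQR(hr)
hr_iqr
```

    For this data set the results agree with the slide: (23, 28.5, 32, 36, 54), a range of 31, and an IQR of 7.5.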

  • Detecting outliers

    As a rule of thumb, we call a data value an outlier if it is:

    Smaller than: Q1 - 1.5 * IQR

    Larger than: Q3 + 1.5 * IQR

    Are there any outliers in David Ortiz home run numbers?

    1. Five Number Summary: (23, 28.5, 32, 36, 54)
    2. Range: 31
    3. Interquar…
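
    The rule of thumb can be applied directly in R. This is a minimal sketch, assuming Q1 and Q3 come from the default `quantile()` convention; `lower_fence` and `upper_fence` are illustrative names, not terms from the slides.

```r
# David Ortiz season home run totals
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

q1  <- quantile(hr, 0.25, names = FALSE)   # 28.5
q3  <- quantile(hr, 0.75, names = FALSE)   # 36
iqr <- q3 - q1                             # 7.5

# Cutoffs from the rule of thumb
lower_fence <- q1 - 1.5 * iqr   # 17.25
upper_fence <- q3 + 1.5 * iqr   # 47.25

# Values flagged as outliers
outliers <- hr[hr < lower_fence | hr > upper_fence]
outliers   # 54
```

    Only the 54 home run season falls outside the cutoffs, so it is the single outlier under this rule.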

  • Boxplots

    A boxplot is a graphical display of the 5 number summary and consists of:

    1. Drawing a box from Q1 to Q3

    2. Dividing the box with a line drawn at the median

    3. Draw a line from each quar…

  • Box plot of David Ortiz home runs

    R: boxplot(x)

    [Figure: boxplot; axis label: Home runs]
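
    A short sketch of the same plot in R, with `boxplot.stats()` used to inspect the numbers behind the picture. The axis label argument is an illustrative choice. Note that R's boxplot uses Tukey's hinges for the box edges; for this data set they coincide with Q1 and Q3 from the earlier slides, but that need not hold for every data set.

```r
# David Ortiz season home run totals
hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# Draw the boxplot with a labeled axis
boxplot(hr, ylab = "Home runs")

# boxplot.stats() returns the values used to draw it, without plotting
bp <- boxplot.stats(hr)
bp$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out     # points drawn individually as outliers
```

    Here the upper whisker stops at 38, the most extreme value that is not an outlier, and 54 is drawn as a separate point.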

  • Box plot quiz

    What is:
    •  Q1?
    •  Q3?
    •  The median?
    •  Most extreme values that are not outliers
    •  Outliers

    [Figure: boxplot with parts labeled A, B, C, D, E, F; axis label: Home runs]

  • Two current players: who is best?

    Miguel Cabrera: HR in 2014 = 25

    David Ortiz…

  • Comparing players with side-by-side box plots

    How would you describe the differences between these two players in terms of HRs? Who is better?

    [Figure: side-by-side boxplots labeled A and B]

  • Let’s compare two more players

    Career best seasons

    Wade Boggs (1985): BA = .368
    Ted Williams (1941): BA = .406

    Who is better?

  • Who is best here?

    Is Ted Williams better than Wade Boggs?

  • Ted Williams hit .406 in 1941; plenty of people hit over .400 before him but no one has since…

  • Have the best players gotten worse at hitting over the past 140 years?

    [Figure: max batting average by year]

  • Comparing players across time periods

    Problem: baseball has changed from 1871 to now

    We can’t simply compare sta…

  • Histograms of batting average 1941 vs. 1985

    Do the batting averages look similar in these years?

  • Density of batting average 1941 vs. 1985

    Do the batting averages look similar in these years?

  • z-scores

    The z-score tells how many standard devia…

  • z-scores for comparing players across eras

    When comparing players across eras, we will use the mean (x̄), and standard devia…
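
    A minimal sketch of a z-score calculation in R, assuming the usual definition (value minus mean, divided by standard deviation). The function name `z_score` and the batting averages below are made up for illustration; they are not the 1941 or 1985 league data.

```r
# z-score: how many standard deviations a value lies from the mean
z_score <- function(x, center, spread) (x - center) / spread

# Hypothetical batting averages for one season (made-up numbers, not real data)
league_ba <- c(.240, .255, .262, .270, .281, .290, .334)

# Each player's z-score relative to that season's mean and standard deviation
z <- z_score(league_ba, mean(league_ba), sd(league_ba))
round(z, 2)

# The built-in scale() performs the same centering and scaling
same_as_scale <- isTRUE(all.equal(z, as.numeric(scale(league_ba))))
same_as_scale
```

    Because each season is standardized against its own mean and standard deviation, z-scores from different years can be compared even when the raw averages cannot.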

  • Comparing Ted and Wade to their peers

    In 1941:
    •  Mean batting average was: .276
    •  Standard devia…

  • Comparing Ted and Wade to their peers

    Wade’s batting average z-score: 3.82
    Ted’s batting average z-score: 3.97

    Who is the better hitter?

  • Career z-scored batting averages

  • What about Home Runs…

  • Next class: correlation!

    Q and R: Big Data Baseball chapter 4