Measures of Central Tendency& Variability. Di FaraGino’s Di Fara 1 2 4 3 4 3 4 1 3 2 5 3 2...

Measures of Central Tendency & Variability



Di Fara: 1, 2, 4, 3, 4, 4, 3, 4, 1, 3, 3, 2, 5, 3, 2 (Σx = 44; mean = 44/15 = 2.93)

Gino’s: 2, 3, 2, 3, 2, 2, 4, 4, 4, 3, 2, 4, 3, 2, 4 (Σx = 44; mean = 44/15 = 2.93)


What differences can we see in the distributions of these ratings?


Measures of Central Tendency and Variability

So far, we have used very basic characterizations of distributions:
• Number of modes (unimodal, bimodal, multimodal)
• Skew (positive or negative) & symmetry

We need a way to characterize these same distributions quantitatively (using numbers). This allows us to compare distributions.

We can describe distributions using two categories of measures:

Measures of Central Tendency
• mean, median, mode

Measures of Variability
• range, standard deviation, variance


Measures of Central Tendency (where all the action is)

Mean – The average of all the scores: the sum of all the scores divided by the number of scores.

Example: x = {1, 3, 4, 8}

The mean is denoted differently depending on the type of data from which it comes:

Population mean = μ (pronounced “myou”)

$$\mu = \frac{\sum x}{N} = \frac{1 + 3 + 4 + 8}{4} = \frac{16}{4} = 4$$

Sample mean = x̄ (spoken as “x-bar”)

$$\bar{x} = \frac{\sum x}{N}$$
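The mean formula can be sketched in a few lines of Python. This is an illustrative sketch only; the function name `mean` is our own choice, and the same arithmetic produces μ for a population or x̄ for a sample (only the label differs).

```python
def mean(scores):
    """Sum of all the scores divided by the number of scores."""
    if not scores:
        raise ValueError("the mean of an empty list is undefined")
    return sum(scores) / len(scores)


x = [1, 3, 4, 8]
print(mean(x))  # 4.0, i.e. 16 / 4
```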


Median – The score that falls in the exact middle of the distribution. (Half the scores are lower and half are higher than the median.)

x = {5, 6, 2, 3, 1, 9, 8, 0, 2, 4, 5}

First, arrange the numbers in ascending order:

x = {0, 1, 2, 2, 3, 4, 5, 5, 6, 8, 9}

Find the number that falls in the middle. With 11 scores, the middle one is the 6th, so the median is 4.

For an even number of scores, average the two middle numbers. For x = {0, 1, 1, 2, 2, 3, 4, 5, 5, 6, 8, 9}, the two middle (6th and 7th) scores are 3 and 4, so the median is (3 + 4)/2 = 3.5.

The median is the “middle score.”
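The two median rules above (odd count: take the middle score; even count: average the two middle scores) can be sketched as follows. The function name is hypothetical; Python's standard `statistics.median` applies the same rules.

```python
def median(scores):
    """Middle score; for an even count, the average of the two middle scores."""
    if not scores:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(scores)          # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one score sits in the middle
        return ordered[mid]
    # even count: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 6, 2, 3, 1, 9, 8, 0, 2, 4, 5]))     # 4
print(median([0, 1, 1, 2, 2, 3, 4, 5, 5, 6, 8, 9]))  # 3.5
```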


Mode – The score that occurs most frequently. The score with the highest FREQUENCY.

Example: 1, 3, 1, 5, 2, 1, 1, 8, 2, 3, 1, 1, 1, 0, 1, 3, 2, 1, 1, 1

The score 1 occurs 11 times out of 20, more often than any other score, so the mode is 1.
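Finding the mode amounts to counting frequencies and picking the highest. A minimal sketch using Python's `collections.Counter`; the function name `modes` is hypothetical, and it returns a list so that bimodal or multimodal data report every tied score.

```python
from collections import Counter


def modes(scores):
    """Return every score tied for the highest frequency, plus that frequency."""
    if not scores:
        raise ValueError("the mode of an empty list is undefined")
    counts = Counter(scores)              # score -> how many times it occurs
    top = max(counts.values())            # the highest FREQUENCY
    return sorted(s for s, c in counts.items() if c == top), top


data = [1, 3, 1, 5, 2, 1, 1, 8, 2, 3, 1, 1, 1, 0, 1, 3, 2, 1, 1, 1]
print(modes(data))  # ([1], 11)
```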



Relations between the measures of central tendency describe the shape of a score distribution: skewness

When the mean, median, and mode agree, you have symmetry.

Positive skew: mean > median
Negative skew: mean < median
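The mean-versus-median rule of thumb can be sketched as below. This is only the heuristic from the slide, not a formal skewness statistic, and the two example data sets are made up for illustration: a few large scores pull the mean above the median, and a few small scores pull it below.

```python
from statistics import mean, median


def skew_direction(scores):
    """Rule of thumb: compare the mean with the median."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"
    if m < md:
        return "negative"
    return "symmetric (mean = median)"


print(skew_direction([1, 2, 2, 3, 10]))  # positive: mean 3.6 > median 2
print(skew_direction([1, 8, 9, 9, 10]))  # negative: mean 7.4 < median 9
```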


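Review of Summation

x = {1, 0, 3}, y = {2, 5, 1} (paired in order: x = 1 with y = 2, x = 0 with y = 5, x = 3 with y = 1)

$$\sum x = 1 + 0 + 3 = 4$$

$$\sum x^2 = 1 + 0 + 9 = 10$$

$$\left(\sum x\right)^2 = (1 + 0 + 3)^2 = 4^2 = 16$$

$$3\sum x = 3(1) + 3(0) + 3(3) = 3 + 0 + 9 = 12$$

$$\sum xy = 1(2) + 0(5) + 3(1) = 2 + 0 + 3 = 5$$

$$\left(\sum x\right)\left(\sum y\right) = (1 + 0 + 3)(2 + 5 + 1) = (4)(8) = 32$$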
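Each summation expression maps directly onto a one-line Python computation, which makes the order of operations explicit (for example, Σx² squares first and then adds, while (Σx)² adds first and then squares). The variable names are our own.

```python
x = [1, 0, 3]
y = [2, 5, 1]

sum_x = sum(x)                              # Σx
sum_x_sq = sum(v ** 2 for v in x)           # Σx²: square each score, then add
sq_of_sum_x = sum(x) ** 2                   # (Σx)²: add the scores, then square
three_sum_x = sum(3 * v for v in x)         # Σ3x, which equals 3Σx
sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy: multiply each pair, then add
sum_x_times_sum_y = sum(x) * sum(y)         # (Σx)(Σy)

print(sum_x, sum_x_sq, sq_of_sum_x, three_sum_x, sum_xy, sum_x_times_sum_y)
# 4 10 16 12 5 32
```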


Measures of Variability (how clustered or spread out the distribution is)

[Figure: The Normal Distribution (x-axis: X; y-axis: Relative Frequency)]

Range – The maximum difference in the data (maximum score − minimum score).

Standard Deviation – Roughly, the average amount by which the scores deviate from the mean.

Variance – The square of the standard deviation; it has special mathematical properties of its own.


The Range

[Figure: Cannoli Eating Contest histogram (x-axis: # Cannoli Eaten; y-axis: # Contestants)]

Cannoli eaten by each contestant (contestants 1 through 23, in order): 4, 5, 6, 6, 7, 8, 8, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 14, 14, 14, 16, 16, 21

Minimum = 4, Maximum = 21

Range = Maximum − Minimum = 21 − 4 = 17
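The range calculation for the cannoli data, as a short Python sketch. The function is named `score_range` (a hypothetical name) so it does not shadow Python's built-in `range`.

```python
cannoli_eaten = [4, 5, 6, 6, 7, 8, 8, 9, 10, 10, 10, 10,
                 11, 11, 11, 12, 12, 14, 14, 14, 16, 16, 21]


def score_range(scores):
    """Range = maximum score - minimum score."""
    return max(scores) - min(scores)


print(min(cannoli_eaten), max(cannoli_eaten), score_range(cannoli_eaten))  # 4 21 17
```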


Standard Deviation: example

How much does each score in the sample differ from the average score?

The amount by which each score differs from the mean is called its deviation.

[Figure: Cannoli Eating Contest histogram (x-axis: # Cannoli Eaten; y-axis: # Contestants)]


Standard Deviation (population)

x = {1, 2, 3, 2}

Raw vs. Deviation Scores

x: 1, 2, 3, 2
μ: 2, 2, 2, 2
x − μ: −1, 0, 1, 0
(x − μ)²: 1, 0, 1, 0

How do you suppose we would go about finding the AVERAGE amount by which each score DEVIATES from the mean?

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{SS}{N}} = \sqrt{\frac{2}{4}} = .7071 \approx .71$$

where $\sum (x - \mu)^2$ is the sum of squares (SS).

σ = .71 (“deviation method”)
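The deviation method for a population can be sketched directly from the formula: find μ, sum the squared deviations to get SS, divide by N, and take the square root. The function name is hypothetical; Python's `statistics.pstdev` computes the same population value.

```python
import math


def population_sd(scores):
    """Deviation method: sigma = sqrt(SS / N), with SS = sum of (x - mu)^2."""
    n = len(scores)
    mu = sum(scores) / n
    ss = sum((x - mu) ** 2 for x in scores)  # sum of squares (SS)
    return math.sqrt(ss / n)


print(round(population_sd([1, 2, 3, 2]), 4))  # 0.7071
```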


Standard Deviation (sample)

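x = {1, 2, 3, 2}

x: 1, 2, 3, 2
x̄: 2, 2, 2, 2
x − x̄: −1, 0, 1, 0
(x − x̄)²: 1, 0, 1, 0

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}} = \sqrt{\frac{SS}{N - 1}} = \sqrt{\frac{2}{3}} = .8165 \approx .82$$

where $\sum (x - \bar{x})^2$ is the sum of squares (SS).

s = .82 (“deviation method”)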
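The sample version differs only in the divisor, N − 1 instead of N. A sketch under the same assumptions (hypothetical function name); Python's `statistics.stdev` computes the same sample value.

```python
import math


def sample_sd(scores):
    """Deviation method for a sample: s = sqrt(SS / (N - 1))."""
    n = len(scores)
    if n < 2:
        raise ValueError("a sample standard deviation needs at least two scores")
    x_bar = sum(scores) / n
    ss = sum((x - x_bar) ** 2 for x in scores)  # sum of squares (SS)
    return math.sqrt(ss / (n - 1))


print(round(sample_sd([1, 2, 3, 2]), 4))  # 0.8165
```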


The “raw scores method” is an easier way to calculate the Sum of Squares (SS)

Remember, $\sigma = \sqrt{SS/N}$ (population) and $s = \sqrt{SS/(N - 1)}$ (sample).

$$SS = \sum x^2 - \frac{\left(\sum x\right)^2}{N}$$

(“raw scores method”)


Finding the standard deviation using the “raw scores method” for finding the Sum of Squares (SS)

$$SS = \sum x^2 - \frac{\left(\sum x\right)^2}{N}$$

x: 1, 2, 3, 2 (Σx = 8)
x²: 1, 4, 9, 4 (Σx² = 18)

$$SS = 18 - \frac{(8)^2}{4} = 18 - 16 = 2$$
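A quick way to convince yourself that the raw scores method and the deviation method give the same SS is to compute both. The two function names are hypothetical; the raw scores version only needs the running totals Σx and Σx², which is why it is easier by hand.

```python
def ss_raw(scores):
    """Raw scores method: SS = sum(x^2) - (sum(x))^2 / N."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


def ss_deviation(scores):
    """Deviation method: SS = sum of (x - mean)^2."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)


x = [1, 2, 3, 2]
print(ss_raw(x), ss_deviation(x))  # 2.0 2.0
```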


Remember:

POPULATION: $\sigma = \sqrt{SS/N} = \sqrt{2/4} = .71$

SAMPLE: $s = \sqrt{SS/(N - 1)} = \sqrt{2/3} = .82$


Summary Slide for Standard Deviation

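POPULATION: $\sigma = \sqrt{\dfrac{SS}{N}}$, where $SS = \sum x^2 - \dfrac{\left(\sum x\right)^2}{N} = \sum (x - \mu)^2$

SAMPLE: $s = \sqrt{\dfrac{SS}{N - 1}}$, where $SS = \sum x^2 - \dfrac{\left(\sum x\right)^2}{N} = \sum (x - \bar{x})^2$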


A family of statistics to describe populations and samples

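Central Tendency
- mean: population μ = (Σx)/N; sample x̄ = (Σx)/N
- median: same for populations and samples
- mode: same for populations and samples

Variability
- range: same for populations and samples
- standard deviation: population σ = √(SS/N); sample s = √(SS/(N − 1))
- variance: population σ²; sample s²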


Revisiting Pizza…

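Di Fara: s = 1.16    Gino’s: s = .88

Both pizzerias have the same mean rating (2.93), but Di Fara’s ratings are more spread out.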
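Pulling the pieces together, the pizza comparison can be reproduced with Python's standard `statistics` module. Note that `statistics.stdev` is the sample standard deviation (divisor N − 1), which is the version that reproduces the values of 1.16 and .88 above.

```python
from statistics import mean, stdev

di_fara = [1, 2, 4, 3, 4, 4, 3, 4, 1, 3, 3, 2, 5, 3, 2]
ginos = [2, 3, 2, 3, 2, 2, 4, 4, 4, 3, 2, 4, 3, 2, 4]

for name, ratings in [("Di Fara", di_fara), ("Gino's", ginos)]:
    # same center, different spread
    print(name, round(mean(ratings), 2), round(stdev(ratings), 2))
# Di Fara 2.93 1.16
# Gino's 2.93 0.88
```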


The Normal Distribution and Z-scores


Are all unimodal, symmetrical distributions normal? NO.

Kurtosis


The Normal Distribution and Z-scores

What did you get on your SATs?
• Prior to 2005, the highest possible score was 1600.
• In 2005, an additional section was added to the SAT, making the highest possible score 2400.

If my score (I took the SATs in 2002) was a 1400, and my friend’s score (2006) was an 1800, did my friend do better than I did or not?

We need to find a way to compare scores from different distributions. We cannot compare the raw scores directly.


If we know that the particular variable on which our score was measured is NORMALLY distributed:

• we can specify HOW MANY standard deviations our score is above or below the mean.

• For example: We read on Princeton Review’s website that SAT scores are normally distributed. Using the old scale of measurement, the population mean SAT score was 1000, with a standard deviation of 150 points.

[Figure: normal curve of SAT scores, axis marked at 600, 800, 1000, 1200, 1400]

μ = 1000, σ = 150

How many standard deviations away from the mean is a score of 1300?


The Z-score

Measures how extreme or unusual a score is within a population, in units of standard deviation (this means it tells us exactly HOW MANY standard deviations a score is from the mean).

[Figure: normal curve of SAT scores, axis marked at 600, 800, 1000, 1200, 1400]

$$z = \frac{x - \mu}{\sigma} \quad \text{for a population}$$

$$z = \frac{x - \bar{x}}{s} \quad \text{for a sample}$$

What about a score of 1325? How many standard deviations is it from the mean?

$$z = \frac{1325 - 1000}{150} = \frac{325}{150} = 2.1667 \approx 2.17$$
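The z-score formula as a one-line Python function (the name `z_score` is hypothetical). It answers both questions on these slides: a score of 1300 lies exactly 2 standard deviations above the mean, and 1325 lies about 2.17 above it.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


print(z_score(1300, 1000, 150))            # 2.0
print(round(z_score(1325, 1000, 150), 2))  # 2.17
```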


The Z-score

Example: MY SAT score (1400)

Population of SAT scores (old grading system): μ = 1000 pts, σ = 150 pts

$$z = \frac{1400 - 1000}{150} = \frac{400}{150} = 2.6667 \approx 2.67$$

z = 2.67 standard deviations above the mean

MY friend’s SAT score (1800)

Population of SAT scores (new grading system): μ = 1500 pts, σ = 200 pts

$$z = \frac{1800 - 1500}{200} = \frac{300}{200} = 1.50$$

z = 1.50 standard deviations above the mean
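The SAT comparison can be sketched by converting each raw score to a z-score within its own population and then comparing the z-scores. The variable names are our own. Because 2.67 > 1.50, the 1400 on the old scale sits further above its population mean than the 1800 does on the new scale, even though 1800 is the larger raw number.

```python
def z_score(x, mean, sd):
    """Standard deviations above (+) or below (-) the mean."""
    return (x - mean) / sd


mine = z_score(1400, 1000, 150)    # old scale: mu = 1000, sigma = 150
friend = z_score(1800, 1500, 200)  # new scale: mu = 1500, sigma = 200

print(round(mine, 2), round(friend, 2))  # 2.67 1.5
print("I did relatively better" if mine > friend else "My friend did relatively better")
```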