Statistics 111 - Lecture 11 Introduction to...

Statistics 111 - Lecture 11: Introduction to Inference (stat.wharton.upenn.edu/~stjensen/stat111/lecture11.handout.pdf)

Feb. 23, 2016 Stat 111 - Lecture 11 - Sampling Distributions

Introduction to Inference

Sampling Distributions

Statistics 111 - Lecture 11

Administrative Notes

•  Homework 3 due in recitation on Friday, Feb. 26

•  Homework 4 will be posted sometime this week and covers Chapter 5 material, which is on the midterm

Administrative Notes

•  Midterm is Monday, Feb 29th (6-8pm)
•  Covers Chapters 1-5 in textbook
•  Bring ID cards to midterm!
•  Allowed: calculators, single-sided 8.5 x 11 note sheet
•  List of additional textbook study problems has been posted

Stat 111 Lecture   Last Name   Midterm Exam Room
11am – 12pm        A-D         ANNENBERG HALL 111
11am – 12pm        E-S         ANNENBERG HALL 110
11am – 12pm        T-Z         COHEN HALL 402
2 – 3pm            Everyone    COHEN HALL G17

Inference with a Single Observation

•  Each observation $X_i$ in a random sample is representative of the unobserved values in the population

•  How different would this observation be if we took a different random sample?

[Figure: Sampling draws an observation $X_i$ from the Population (parameter: µ); Inference asks what $X_i$ tells us about µ (?)]

Normal Distribution

•  Model for our overall population

•  Can calculate the probability of getting an observation greater than or less than any value

•  Usually we don't have a single observation, but instead the mean of a set of observations

Inference with Sample Mean

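•  The sample mean $\bar{X}$ is our estimate of the population mean µ
•  How much would the sample mean change if we took a different sample?
•  Key to this question: the sampling distribution of $\bar{X}$

[Figure: Sampling draws a Sample from the Population (parameter: µ); the statistic $\bar{X}$ computed from the sample is used for Estimation and Inference about µ (?)]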

Sampling Distribution

•  We focus on models for continuous data, using the sample mean as our estimate of the population mean
•  Sampling distribution of the sample mean: how does the sample mean change over different samples?

[Figure: from a population with parameter µ, draw Sample 1, Sample 2, ..., Sample 6, ..., each of size n, and compute the sample mean $\bar{X}$ of each one]
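
The repeated-sampling picture can be mimicked by simulation. This is a minimal sketch under stated assumptions: the population is a hypothetical, made-up list of 5000 right-skewed values (not the lecture's data), and `sampling_distribution` is an illustrative helper name. Each sample of size n is drawn without replacement with `random.Random.sample`, and its mean is computed with `statistics.fmean` (Python 3.8+).

```python
import random
import statistics

def sampling_distribution(population, n, num_samples, seed=0):
    """Draw `num_samples` random samples of size n (without replacement)
    from `population` and return the sample mean of each one."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]

# Hypothetical right-skewed population of 5000 values (not the lecture's data).
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1 / 4.0) for _ in range(5000)]
mu = statistics.fmean(population)  # the population parameter

means = sampling_distribution(population, n=10, num_samples=6)
print(f"Population mean mu = {mu:.2f}")
for i, xbar in enumerate(means, start=1):
    print(f"Sample {i} of size n = 10: sample mean = {xbar:.2f}")
```

Each sample gives a different $\bar{X}$; raising `num_samples` to a few thousand and plotting a histogram of `means` approximates the sampling distribution of the sample mean.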

Mean of Sample Mean

•  First, we examine the center of the sampling distribution of the sample mean.

•  The center of the sampling distribution of the sample mean is the unknown population mean:

$$\text{mean}(\bar{X}) = \mu$$

•  Over repeated samples, the sample mean will, on average, be equal to the population mean
   –  no guarantees for any one sample!
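
One way to see why the center is µ, assuming each observation $X_i$ is drawn from the population and therefore has mean µ: the mean of a sum is the sum of the means, and constants factor out, so

```latex
\text{mean}(\bar{X})
  = \text{mean}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n}\sum_{i=1}^{n} \text{mean}(X_i)
  = \frac{1}{n}\cdot n\mu
  = \mu
```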

Variance of Sample Mean

•  Next, we examine the spread of the sampling distribution of the sample mean

•  The spread of the sampling distribution of the sample mean is

$$\text{VAR}(\bar{X}) = \frac{\sigma^2}{n}, \qquad \text{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$

where σ is the population standard deviation and n is the sample size

•  As sample size increases, spread of the sample mean decreases!
•  Averaging over many observations is more accurate than just looking at one or two observations
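
A short derivation, assuming the observations $X_1, \ldots, X_n$ are independent (for example, when the population is much larger than the sample) and each has variance $\sigma^2$. It uses two standard facts: variances of independent variables add, and multiplying a variable by a constant $c$ multiplies its variance by $c^2$.

```latex
\text{VAR}(\bar{X})
  = \text{VAR}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n} \text{VAR}(X_i)
  = \frac{1}{n^2}\cdot n\sigma^2
  = \frac{\sigma^2}{n},
\qquad
\text{SD}(\bar{X}) = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}
```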

•  Comparing the sampling distribution of the sample mean when n = 1 vs. n = 10

Law of Large Numbers

•  If one draws independent observations from a population with mean µ, then as the number of observations increases, the sample mean $\bar{X}$ gets closer to the population mean µ

•  This is easy to see since we know that

$$\text{mean}(\bar{X}) = \mu \quad \text{and} \quad \text{VAR}(\bar{X}) = \frac{\sigma^2}{n} \to 0 \text{ as } n \text{ gets large}$$
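
A minimal simulation of the Law of Large Numbers, assuming a hypothetical population not used in the lecture: rolls of a fair six-sided die, whose population mean is µ = 3.5. The helper name `sample_mean_of_die_rolls` is illustrative.

```python
import random
import statistics

def sample_mean_of_die_rolls(n, seed=42):
    """Mean of the first n rolls of a fair six-sided die (population mean 3.5).

    Reusing the same seed means a larger n extends the same sequence of rolls,
    like adding more observations to one growing sample.
    """
    rng = random.Random(seed)
    rolls = [rng.randint(1, 6) for _ in range(n)]
    return statistics.fmean(rolls)

MU = 3.5
for n in (10, 100, 1_000, 10_000, 100_000):
    xbar = sample_mean_of_die_rolls(n)
    print(f"n = {n:>6}: sample mean = {xbar:.4f}, distance from mu = {abs(xbar - MU):.4f}")
```

The distance from µ is not guaranteed to shrink at every step, but it tends toward 0 as n grows, matching $\text{VAR}(\bar{X}) = \sigma^2/n \to 0$.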

Example

•  Population: seasonal home-run totals for 7032 baseball players from 1901 to 1996
•  Take different samples from this population and compare the sample mean we get each time
•  In real life, we can't do this because we don't usually have the entire population!

Sample Size                     Mean of sample means   SD of sample means
100 samples of size n = 1       3.69                   6.84
100 samples of size n = 10      4.43                   2.10
100 samples of size n = 100     4.42                   0.66
100 samples of size n = 1000    4.42                   0.24

Population Parameter µ = 4.42
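
The table can be checked against $\text{SD}(\bar{X}) = \sigma/\sqrt{n}$. This sketch assumes that 6.84, the SD for n = 1 (where each sample mean is a single observation), is a usable estimate of the population σ; that is an approximation, since it comes from only 100 draws. `predicted_sd` is an illustrative helper name.

```python
import math

# Assumption: the n = 1 row's SD (6.84) estimates the population SD sigma.
sigma_hat = 6.84

# Observed SD of the sample means from the table, keyed by sample size n.
observed = {10: 2.10, 100: 0.66, 1000: 0.24}

def predicted_sd(sigma, n):
    """Theoretical SD of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n, sd_obs in observed.items():
    print(f"n = {n:>4}: predicted {predicted_sd(sigma_hat, n):.2f}, observed {sd_obs:.2f}")
```

The predictions (about 2.16, 0.68 and 0.22) are within a few hundredths of the simulated values 2.10, 0.66 and 0.24.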

Distribution of Sample Mean

•  We now know the center and spread of the sampling distribution for the sample mean.

•  What about the shape of the distribution?

•  If our data $x_1, x_2, \ldots, x_n$ follow a Normal distribution, then the sample mean $\bar{X}$ will also follow a Normal distribution!

Example

•  Mortality in US cities (deaths/100,000 people)

•  This variable seems to approximately follow a Normal distribution, so the sample mean will also approximately follow a Normal distribution

[Figure: histogram of mortality (deaths/100,000 people) for n = 60 US cities; sample mean $\bar{X} = 940.3$]

Central Limit Theorem

•  What if the original data doesn't follow a Normal distribution?
•  HR/Season for a sample of 100 baseball players:

[Figure: histogram of HR/Season for n = 100 players; sample mean $\bar{X} = 5.35$]

•  If the sample is large enough, it doesn't matter!

Central Limit Theorem

•  If the sample size is large enough, then the sample mean $\bar{X}$ has an approximately Normal distribution

•  This is true no matter what the shape of the distribution of the original data!

Distribution of $\bar{X}$: approximately Normal, with

$$\text{mean}(\bar{X}) = \mu, \qquad \text{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$
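
A sketch of the Central Limit Theorem in action, assuming a hypothetical Exponential population with mean 1 (strongly right-skewed, very different from a Normal shape). For each n it draws 2000 samples and reports the mean, SD and moment-based skewness of the sample means; a skewness near 0 indicates a symmetric, more Normal-looking shape. `sample_means` and `skewness` are illustrative helper names.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Return the sample means of `reps` samples of size n drawn from a
    hypothetical Exponential population with mean 1 (right-skewed)."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

def skewness(values):
    """Moment-based skewness: close to 0 for a symmetric shape such as the Normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(2016)
results = {}
for n in (1, 10, 100):
    means = sample_means(n, reps=2000, rng=rng)
    results[n] = skewness(means)
    print(f"n = {n:>3}: mean = {statistics.fmean(means):.3f}, "
          f"SD = {statistics.pstdev(means):.3f}, skewness = {results[n]:.2f}")
```

For this population the sample means should center near 1, with an SD of roughly $1/\sqrt{n}$ and a skewness that shrinks roughly like $2/\sqrt{n}$, so the shape becomes more nearly Normal as n grows.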

Example: Home Runs per Season

•  Take many different samples from the seasonal HR totals for a population of 7032 players
•  Calculate the sample mean for each sample

•  Distribution of sample means from different samples:

[Figure: histograms of the sample means for n = 1, n = 10, and n = 100]

Example: SAT test scores

•  From 2007 data for the entire population, SAT scores have a population mean of µ = 1150 and population s.d. of σ = 200

•  What is the probability of a single student scoring over 1250 on the SAT?

•  Simple standardization:

$$Z = \frac{X - \mu}{\sigma} = \frac{1250 - 1150}{200} = 0.5$$

•  From Normal table: P(Z > 0.5) = 0.3085

•  So, a single student has a 31% chance of scoring over 1250.
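
The same calculation can be done with the Python standard library's `statistics.NormalDist`, assuming (as the use of the Normal table does) that SAT scores follow a Normal distribution with µ = 1150 and σ = 200. This is a minimal sketch, not part of the course materials.

```python
from statistics import NormalDist

mu, sigma = 1150, 200            # population mean and SD of SAT scores (2007)
sat = NormalDist(mu, sigma)      # assumed Normal model for a single score

z = (1250 - mu) / sigma          # standardized score
p_single = 1 - sat.cdf(1250)     # P(X > 1250) = P(Z > z)
print(f"z = {z}, P(X > 1250) = {p_single:.4f}")
```

This prints z = 0.5 and a probability of 0.3085, matching the Normal table.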

Example: SAT test scores

•  What if, instead of a single student, we have a random sample of 25 students?

•  What is the probability that the sample mean of our 25 students is over 1250 on the SAT?

•  Earlier this class: the sample mean $\bar{X}$ follows a Normal distribution centered at µ with standard deviation

$$\text{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$

Example: SAT test scores

•  So when we standardize the sample mean, we have to use its standard deviation, σ/√n, rather than σ:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{1250 - 1150}{200 / \sqrt{25}} = \frac{100}{40} = 2.5$$

•  From Normal table: P(Z > 2.5) = 0.0062

•  So, there is only a 0.62% chance of the sample mean of 25 students being higher than 1250
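
For the sample mean, only the standard deviation changes. A minimal sketch with `statistics.NormalDist`, under the same assumption that the sample mean follows a Normal distribution with mean µ = 1150 and SD σ/√n:

```python
import math
from statistics import NormalDist

mu, sigma, n = 1150, 200, 25
sd_xbar = sigma / math.sqrt(n)                   # SD of the sample mean
z = (1250 - mu) / sd_xbar                        # standardized sample mean
p_mean = 1 - NormalDist(mu, sd_xbar).cdf(1250)   # P(Xbar > 1250)
print(f"SD(Xbar) = {sd_xbar}, z = {z}, P(Xbar > 1250) = {p_mean:.4f}")
```

This gives SD($\bar{X}$) = 40, z = 2.5 and a probability of 0.0062, as in the Normal table calculation.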

Example: SAT test scores

•  Probability of a single student having an SAT score greater than 1250 was 31%

•  Probability of the sample mean of 25 SAT scores being greater than 1250 was 0.62%

•  Why is the probability for the sample mean so much lower? Remember that 1250 is substantially higher than the population mean of 1150. For a single score, 1250 is only 0.5 standard deviations (σ = 200) above µ, but the sample mean has SD 200/√25 = 40, so 1250 is 2.5 of its standard deviations above µ

•  Think about the Law of Large Numbers: as the sample size grows, the sample mean becomes closer to the population mean
   –  Less chance of the sample mean reaching a value as high as 1250 with a larger sample!
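
To see this effect directly, the sketch below computes P($\bar{X}$ > 1250) for several sample sizes, using the SAT example's µ = 1150 and σ = 200 and the same Normal assumption. `prob_mean_above` is a hypothetical helper name, and the sample sizes 1, 4, 9, 16, 25 are chosen only for illustration.

```python
import math
from statistics import NormalDist

def prob_mean_above(threshold, mu, sigma, n):
    """P(Xbar > threshold) when Xbar is Normal with mean mu and SD sigma / sqrt(n)."""
    return 1 - NormalDist(mu, sigma / math.sqrt(n)).cdf(threshold)

probs = {n: prob_mean_above(1250, 1150, 200, n) for n in (1, 4, 9, 16, 25)}
for n, p in probs.items():
    print(f"n = {n:>2}: P(sample mean > 1250) = {p:.4f}")
```

The probability drops from about 0.31 at n = 1 to about 0.006 at n = 25, because the SD of the sample mean shrinks like 200/√n.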

Next Class - Lecture 12

•  Discrete data: sampling distribution for sample proportions

•  Moore and McCabe: Section 5.1