Statistics 111 - Lecture 11 Introduction to...
Feb. 23, 2016 Stat 111 - Lecture 11 - Sampling Distributions
Introduction to Inference
Sampling Distributions
Statistics 111 - Lecture 11
Administrative Notes
• Homework 3 due in recitation on Friday, Feb. 26
• Homework 4 will be posted sometime this week and covers Chapter 5 material, which is on the midterm
Administrative Notes
• Midterm is Monday, Feb 29th (6-8pm)
• Covers Chapters 1-5 in textbook
• Bring ID cards to midterm!
• Allowed: Calculators, single-sided 8.5 x 11 note sheet
• List of additional textbook study problems has been posted
Stat 111 Lecture | Last Name | Midterm Exam Room
11am – 12pm | A-D | ANNENBERG HALL 111
11am – 12pm | E-S | ANNENBERG HALL 110
11am – 12pm | T-Z | COHEN HALL 402
2 – 3pm | Everyone | COHEN HALL G17
Inference with a Single Observation
• Each observation Xi in a random sample represents the unobserved values in the population
• How different would this observation be if we took a different random sample?
[Diagram: Population (Parameter: µ) and Observation Xi, connected by "Sampling" and "Inference"]
Normal Distribution
• Model for our overall population
• Can calculate the probability of getting an observation greater than or less than any value
• Usually don’t have a single observation, but instead the mean of a set of observations
Inference with Sample Mean
• Sample mean is our estimate of population mean
• How much would the sample mean change if we took a different sample?
• Key to this question: Sampling Distribution of $\bar{X}$
[Diagram: Population (Parameter: µ) and Sample (Statistic: $\bar{X}$), connected by "Sampling", "Inference" and "Estimation"]
Sampling Distribution
• Focused on models for continuous data: using the sample mean as our estimate of population mean
• Sampling Distribution of the Sample Mean
  – How does the sample mean change over different samples?
[Diagram: Population (Parameter: µ) yields Sample 1, Sample 2, ..., Sample 6, ..., each of size n, and each sample gives its own sample mean $\bar{X}$]
Mean of Sample Mean
• First, we examine the center of the sampling distribution of the sample mean.
• Center of the sampling distribution of the sample mean is the unknown population mean:
  $\mathrm{mean}(\bar{X}) = \mu$
• Over repeated samples, the sample mean will, on average, be equal to the population mean
  – no guarantees for any one sample!
Variance of Sample Mean
• Next, we examine the spread of the sampling distribution of the sample mean
• The spread of the sampling distribution of the sample mean is
  $\mathrm{Var}(\bar{X}) = \sigma^2 / n$
  $\mathrm{SD}(\bar{X}) = \sigma / \sqrt{n}$
• As sample size increases, spread of the sample mean decreases!
  – Averaging over many observations is more accurate than just looking at one or two observations
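The two formulas for the center and spread of the sample mean can be checked by simulation. The sketch below is only an illustration: the Normal population with µ = 50 and σ = 10 and the helper name `simulate_sample_means` are assumptions, not part of the lecture.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Return the means of num_samples random samples of size n from Normal(mu, sigma)."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Hypothetical population values, chosen only for illustration.
mu, sigma = 50.0, 10.0

results = {}
for n in (1, 10, 100):
    means = simulate_sample_means(mu, sigma, n, num_samples=2000)
    # Center and spread of the simulated sampling distribution.
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n = {n:3d}: mean of sample means = {results[n][0]:.2f}, "
          f"SD of sample means = {results[n][1]:.2f}, "
          f"sigma / sqrt(n) = {sigma / n ** 0.5:.2f}")
```

The simulated center should stay near µ for every n, while the simulated spread should track σ/√n.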
• Comparing the sampling distribution of the sample mean when n = 1 vs. n = 10
Law of Large Numbers
• If one draws independent observations from a population with mean µ, then as the number of observations increases, the sample mean $\bar{X}$ gets closer to the population mean µ
• This is easy to see since we know that
  $\mathrm{mean}(\bar{X}) = \mu$
  $\mathrm{variance}(\bar{X}) = \sigma^2 / n \to 0$ as n gets large
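A running average makes the Law of Large Numbers visible. This is a minimal sketch under an assumed population: rolls of a fair six-sided die (so µ = 3.5), which is not an example from the lecture. The helper `running_means` is likewise hypothetical.

```python
import random

def running_means(draws):
    """Return the sample mean after each successive observation."""
    total = 0.0
    means = []
    for count, value in enumerate(draws, start=1):
        total += value
        means.append(total / count)
    return means

# Hypothetical population: a fair six-sided die, so mu = 3.5.
rng = random.Random(1)
draws = [rng.randint(1, 6) for _ in range(100_000)]
means = running_means(draws)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:6d}: sample mean = {means[n - 1]:.3f}")
```

The early sample means can wander noticeably, but the later ones should settle close to 3.5.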
Example
• Population: seasonal home-run totals for 7032 baseball players from 1901 to 1996
• Take different samples from this population and compare the sample mean we get each time
• In real life, we can’t do this because we don’t usually have the entire population!
Sample Size | mean($\bar{X}$) | SD($\bar{X}$)
100 samples of size n = 1 | 3.69 | 6.84
100 samples of size n = 10 | 4.43 | 2.10
100 samples of size n = 100 | 4.42 | 0.66
100 samples of size n = 1000 | 4.42 | 0.24
Population Parameter: µ = 4.42
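The spreads in the home-run table can be compared with the σ/√n formula. The population σ is not given here, so this sketch assumes the SD of the n = 1 sample means (6.84) is a reasonable stand-in for it. That substitution is an assumption, not a value from the lecture.

```python
import math

# Assumption: the SD of the n = 1 sample means stands in for the unknown sigma.
sigma_estimate = 6.84

# Observed SD of the sample means, copied from the table.
observed_sd = {10: 2.10, 100: 0.66, 1000: 0.24}

# Spread predicted by SD(X-bar) = sigma / sqrt(n).
predicted_sd = {n: sigma_estimate / math.sqrt(n) for n in observed_sd}

for n, observed in observed_sd.items():
    print(f"n = {n:4d}: predicted {predicted_sd[n]:.2f}, observed {observed:.2f}")
```

The predicted spreads (about 2.16, 0.68 and 0.22) are close to the observed 2.10, 0.66 and 0.24, as expected when each row is based on only 100 samples.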
Distribution of Sample Mean
• We now know the center and spread of the sampling distribution for the sample mean.
• What about the shape of the distribution?
• If our data x1,x2,…, xn follow a Normal distribution, then the sample mean $\bar{X}$ will also follow a Normal distribution!
Example
• Mortality in US cities (deaths/100,000 people)
• This variable seems to approximately follow a Normal distribution, so the sample mean will also approximately follow a Normal distribution
[Histogram of mortality in US cities: $\bar{X} = 940.3$, n = 60]
Central Limit Theorem
• What if the original data doesn’t follow a Normal distribution?
• HR/Season for sample of 100 baseball players
  [Histogram of HR/season: $\bar{X} = 5.35$, n = 100]
• If the sample is large enough, it doesn’t matter!
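Central Limit Theorem
• If the sample size is large enough, then the sample mean $\bar{X}$ has an approximately Normal distribution
• This is true no matter what the shape of the distribution of the original data (as long as the population standard deviation σ is finite)!
• Distribution of $\bar{X}$: approximately Normal, with
  $\mathrm{mean}(\bar{X}) = \mu$
  $\mathrm{SD}(\bar{X}) = \sigma / \sqrt{n}$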
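One way to watch the Central Limit Theorem at work is to start from a clearly non-Normal population and track how lopsided the sample means are. The sketch below assumes an exponential population with mean 1, a stand-in chosen for illustration rather than the home-run data. It uses skewness as a rough measure of asymmetry, where 0 means symmetric, as a Normal curve is. The helpers `skewness` and `sample_means` are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: 0 for a symmetric distribution, positive for a long right tail."""
    center = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - center) / spread) ** 3 for v in values])

def sample_means(n, num_samples, rng):
    """Means of num_samples samples of size n from an exponential population with mean 1."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

rng = random.Random(2)
skew_by_n = {}
for n in (1, 10, 100):
    means = sample_means(n, 5000, rng)
    skew_by_n[n] = skewness(means)
    print(f"n = {n:3d}: skewness of sample means = {skew_by_n[n]:.2f}")
```

The skewness should shrink toward 0 as n grows, which matches the sample means looking more and more Normal.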
Example: Home Runs per Season
• Take many different samples from the seasonal HR totals for a population of 7032 players
• Calculate sample mean for each sample
• Distribution of Sample Means from different samples:
  [Histograms of the sample means for n = 1, n = 10 and n = 100]
Example: SAT test scores
• From 2007 data for the entire population, SAT scores have a population mean of µ = 1150 and population s.d. of σ = 200
• What is the probability of a single student scoring over 1250 on SAT?
• Simple standardization:
  $Z = \frac{X - \mu}{\sigma} = \frac{1250 - 1150}{200} = 0.5$
• From Normal table: P(Z > 0.5) = 0.3085
• So, a single student has a 31% chance of scoring over 1250.
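The Normal table lookup for a single student can be reproduced with Python's standard library. This sketch assumes, as the table lookup itself does, that individual SAT scores follow a Normal distribution. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 1150, 200  # population mean and s.d. from the slide
threshold = 1250

# Standardize, then take the upper-tail area of the standard Normal.
z = (threshold - mu) / sigma
p_single = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P(X > {threshold}) = {p_single:.4f}")
# prints: z = 0.50, P(X > 1250) = 0.3085
```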
Example: SAT test scores
• What if, instead of a single student, we have a random sample of 25 students?
• What is the probability that the sample mean of our 25 students is over 1250 on SAT?
• Earlier this class: sample mean $\bar{X}$ follows a Normal distribution centered at $\mu$ with standard deviation:
  $\mathrm{SD}(\bar{X}) = \sigma / \sqrt{n}$
Example: SAT test scores
• So when we do standardization for sample mean, we have to pay attention to standard deviation:
  $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{1250 - 1150}{200 / \sqrt{25}} = 2.5$
• From Normal table: P(Z > 2.5) = 0.0062
• So, there is only a 0.62% chance of the sample mean of 25 students being higher than 1250
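The same calculation for the mean of 25 students only changes the standard deviation used in the standardization. This is a minimal sketch that assumes the sample mean is Normal with mean µ and standard deviation σ/√n. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1150, 200, 25
threshold = 1250

# The sample mean has a smaller spread than a single score: sigma / sqrt(n) = 40.
sd_mean = sigma / math.sqrt(n)
z = (threshold - mu) / sd_mean
p_mean = 1 - NormalDist().cdf(z)

print(f"SD of sample mean = {sd_mean:.0f}, z = {z:.2f}, P(mean > {threshold}) = {p_mean:.4f}")
# prints: SD of sample mean = 40, z = 2.50, P(mean > 1250) = 0.0062
```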
Example: SAT test scores
• Probability of a single student having an SAT score greater than 1250 was 31%
• Probability of the sample mean of 25 SAT scores being greater than 1250 was 0.62%
• Why is the probability for the sample mean so much lower? Well, remember that 1250 is substantially higher than population mean of 1150
• Think about Law of Large Numbers: as the sample size grows, the sample mean becomes closer to population mean
  – Less chance of getting a substantially higher SAT score of 1250 with a larger sample!
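To see how quickly this probability drops as the sample grows, the calculation can be repeated for several sample sizes. The sizes 4 and 100 and the helper `prob_mean_exceeds` are assumptions added for illustration; only n = 1 and n = 25 appear in the lecture.

```python
import math
from statistics import NormalDist

def prob_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean > threshold), assuming the mean is Normal(mu, sigma / sqrt(n))."""
    sd_mean = sigma / math.sqrt(n)
    return 1 - NormalDist(mu, sd_mean).cdf(threshold)

probabilities = {n: prob_mean_exceeds(1250, 1150, 200, n) for n in (1, 4, 25, 100)}
for n, p in probabilities.items():
    print(f"n = {n:3d}: P(sample mean > 1250) = {p:.7f}")
```

Each increase in n shrinks σ/√n, so 1250 sits more standard deviations above µ and the tail probability falls.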
Next Class - Lecture 12
• Discrete data: sampling distribution for sample proportions
• Moore and McCabe: Section 5.1