
  • Statistics and Hypothesis Testing

    Michael Ash

    Lecture 5

  • But first, let’s finish some material from last time.

  • Summary of Main Points

    - We will never know the population parameters $\mu_Y$ or $\sigma_Y^2$, but we will use data from samples to compute the estimates $\bar{Y}$ and $s_Y^2$, and statistical theory to judge the quality of the estimates for addressing real-world questions.

    - Statistical knowledge is couched in hypothesis testing: will the sample data permit us to accept a maintained hypothesis or to reject it in favor of an alternative?

    - We will use the sample mean to test hypotheses about the population mean. What are some consequences of the focus on the mean?

  • What is an estimator?

    An estimator is a method of guessing a population parameter, for example the population mean $\mu_Y$, using a sample of data. The estimate is the specific numerical guess that an estimator yields. Estimates of a parameter are indicated by inserting a hat: for example, guesses about the population mean are labeled $\hat{\mu}_Y$.

  • What would be a good way to guess $\mu_Y$?

    One method of guessing the population mean is to take a sample and compute the sample mean: $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Another way would be to take whatever observation happens to come first on the list, $Y_1$ (or last, $Y_n$).
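    As a minimal sketch (in Python with NumPy, using invented wage numbers for illustration), the two guesses look like this:

```python
import numpy as np

# Hypothetical sample of hourly earnings (values invented for illustration)
Y = np.array([18.50, 22.00, 19.75, 31.00, 17.25, 24.10])
n = len(Y)

mu_hat_mean = Y.sum() / n   # the sample mean: (1/n) * sum_i Y_i
mu_hat_first = Y[0]         # the "first observation on the list" estimator

print(mu_hat_mean, mu_hat_first)   # two different guesses at mu_Y
```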

  • Why is $\bar{Y}$ a good estimator of $\mu_Y$?

    Unbiased We showed last time that $E(\bar{Y}) = \mu_Y$ ("on average, the sample mean is equal to the population mean"), which is the definition of unbiasedness. (Note, however, that unbiasedness also holds for using the first observation $Y_1$ as the estimator of $\mu_Y$: "On average, the first observation is equal to the population mean.")

    Consistent $\bar{Y}$ becomes closer and closer to (a better and better estimate of) $\mu_Y$ as the sample size grows. (Note, by the way, that $Y_1$ does not become a better estimate of $\mu_Y$ as the sample size grows.)

    Most efficient $\bar{Y}$ has variance $\sigma_Y^2/n$, which turns out to be the lowest possible variance among unbiased estimators of $\mu_Y$. (Note that $Y_1$ has variance $\sigma_Y^2$, which is terrible by comparison.)

    The book demonstrates that $\bar{Y}$, which equally weights each observation in $\frac{1}{n}\sum_{i=1}^{n} Y_i$, has lower variance than alternative weightings of the data. Alternative weighted averages of all the data are better (lower variance) than $Y_1$ but not as good as $\bar{Y}$.
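    A simulation can illustrate these comparisons. This sketch assumes normally distributed data with invented parameters $\mu_Y = 20$ and $\sigma_Y = 5$; it checks that both estimators average out to $\mu_Y$ (unbiasedness), but that $\bar{Y}$ has variance near $\sigma_Y^2/n$ while $Y_1$ has variance near $\sigma_Y^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_Y, sigma_Y, n, reps = 20.0, 5.0, 100, 50_000  # invented parameters

# Draw many samples; record the sample mean and the first observation
samples = rng.normal(mu_Y, sigma_Y, size=(reps, n))
Y_bar = samples.mean(axis=1)
Y_1 = samples[:, 0]

print(Y_bar.mean(), Y_1.mean())      # both near 20: both unbiased
print(Y_bar.var(), sigma_Y**2 / n)   # near 0.25: variance sigma^2/n
print(Y_1.var(), sigma_Y**2)         # near 25: terrible by comparison
```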

  • $\bar{Y}$ is the least squares estimator of $\mu_Y$

    Suppose that the data $Y_1, Y_2, \ldots, Y_n$ are spread along the number line, and you can make one guess, $m$, about where to put an estimate of $\mu_Y$.

    [Figure: the observations $Y_i$ spread along the number line.]

    The criterion for judging the guess will be to make

    $\sum_{i=1}^{n} (Y_i - m)^2$

    as small as possible. (Translation: square the gap between each observation $Y_i$ and the guess, and add up the sum of squared gaps.) If the guess $m$ is too high, then the small values of $Y_i$ will make the sum of squared gaps get big. If the guess $m$ is too low, then the big values of $Y_i$ will make the sum of squared gaps get big. If $m$ is just right, then the sum of squared gaps will be as small as possible.

  • $\bar{Y}$ is the least squares estimator of $\mu_Y$

    It turns out that $\bar{Y}$, the sample mean, is the best guess (the guess that makes the sum of squared gaps as small as possible). Choosing $m = \bar{Y}$ makes the following expression as small as possible:

    $\sum_{i=1}^{n} (Y_i - m)^2$

    We will use this method (keeping the sum of squared gaps as low as possible) for defining the best guess again soon.
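    The claim is easy to check numerically. A sketch (reusing the invented wage data from above) evaluates the sum-of-squared-gaps criterion over a grid of candidate guesses and confirms the minimizer coincides with the sample mean:

```python
import numpy as np

Y = np.array([18.50, 22.00, 19.75, 31.00, 17.25, 24.10])  # invented data

def sum_of_squared_gaps(m, Y):
    """Square the gap between each observation and the guess m; add them up."""
    return ((Y - m) ** 2).sum()

# Try a fine grid of guesses and keep the one with the smallest criterion
grid = np.linspace(Y.min(), Y.max(), 10_001)
losses = [sum_of_squared_gaps(m, Y) for m in grid]
best_m = grid[np.argmin(losses)]

print(best_m, Y.mean())  # the best guess matches the sample mean (up to grid resolution)
```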

  • Random Sampling

    The consequences of non-random sampling:

    1. Non-random samples
       - Convenience samples and how hard it is to avoid them
       - The Literary Digest, Landon, and Roosevelt
       - Nonresponse bias
       - Attrition bias
       - Purposive sampling (e.g., for qualitative research)

    2. High-quality surveys
       - Current Population Survey

  • Hypothesis Testing

    With statistical methods, we can test hypotheses about population parameters, e.g., the population mean. For example: does the population mean of hourly earnings equal $20 per hour?

    - Define the null hypothesis:

      $H_0: E(Y) = \mu_{Y,0}$
      $H_0: E(\text{hourly earnings}) = \$20$ per hour

    - Define the alternative hypothesis:

      $H_1: E(Y) \neq \mu_{Y,0}$
      $H_1: E(\text{hourly earnings}) \neq \$20$ per hour

    - Gather a sample of data and compute the actual sample mean.

  • Hypothesis Testing

    1. Gather a sample of data and compute the actual sample mean

    2. If the null hypothesis were true, would the random variable (the sample mean) be likely to be as big (or small) as the actual sample mean? That is, compute

       $\Pr_{H_0}\left[\, \left|\bar{Y} - \mu_{Y,0}\right| > \left|\bar{Y}^{act} - \mu_{Y,0}\right| \,\right]$

       (A simulation sketch of this step follows the list.)

    (There is only one random variable in the preceding mathematical phrase. Can you find it?)

    2.1 If so (the probability is large), “accept the null hypothesis” (which does not mean that the null hypothesis is true, simply that the data do not reject it).

    2.2 If not (the probability is small), “reject the null hypothesis” in favor of the alternative.
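    A Monte Carlo sketch of step 2, with invented values for $\sigma_Y$, $n$, and the actual sample mean: simulate the sampling distribution of $\bar{Y}$ under the null and ask how often it deviates farther from $\mu_{Y,0}$ than the actual mean did.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_0, sigma_Y, n = 20.0, 5.0, 100   # null value; sigma_Y and n are invented
Y_bar_act = 21.2                    # invented actual sample mean

# Simulate the r.v. "sample mean" many times under the null hypothesis
Y_bar = rng.normal(mu_0, sigma_Y, size=(50_000, n)).mean(axis=1)

# Pr_H0[ |Y_bar - mu_0| > |Y_bar_act - mu_0| ], estimated by simulation
prob = np.mean(np.abs(Y_bar - mu_0) > np.abs(Y_bar_act - mu_0))
print(prob)  # small (about 0.016): these data would reject this null
```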

  • Important Notes on Hypothesis Testing

    - Summary of the hypothesis-testing approach:

      1. The null hypothesis is a hypothesis about the population mean.
      2. The null hypothesis (and the size of the sample) implies a distribution of the sample mean.
      3. An actual sample of real-world data gives an actual value of the sample mean.
      4. The test of the null hypothesis asks if the actual value of the sample mean is likely under the implied distribution of the sample mean (likely if the null hypothesis is true).

    - We learn about the population mean. For example, if we learn that $E(\text{hourly earnings}) > \$20$ per hour, this does not mean that every worker earns more than $20 per hour! Do not confuse the mean with the entire distribution.

    - Do not confuse statistical significance with practical significance. With a large enough sample, you can distinguish a hypothesized mean of $20 per hour from a hypothesized mean of $20.07 per hour. Does anyone care? More on this later.
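    A quick calculation shows how sample size drives this. Assuming an invented $\sigma_Y = 5$, the standard error $\sigma_Y/\sqrt{n}$ shrinks as $n$ grows, so even a 7-cent gap eventually exceeds the 1.96 threshold:

```python
from math import sqrt

sigma_Y = 5.0         # invented population standard deviation
gap = 20.07 - 20.00   # the 7-cent difference between hypothesized means

for n in (100, 10_000, 10_000_000):
    se = sigma_Y / sqrt(n)   # standard error of the sample mean
    print(n, gap / se)       # z-statistic: tiny, then eventually huge
```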

  • The p-Value

    $\text{p-value} \equiv \Pr_{H_0}\left[\, \left|\bar{Y} - \mu_{Y,0}\right| > \left|\bar{Y}^{act} - \mu_{Y,0}\right| \,\right]$

    This phrase expresses how likely the observed, actual sample mean $\bar{Y}^{act}$ would be to deviate from the null-hypothesized population mean $\mu_{Y,0}$ if the null hypothesis were true. Why can it deviate at all (if the null hypothesis is true)? Sampling variation. But if the actual sample mean deviates "too much" from the null-hypothesized population mean, then sampling variation becomes an unlikely reason for the difference.

  • Defining “too much.”

    We know that under the null hypothesis, the sample mean is a random variable distributed in a particular way: $N(\mu_{Y,0}, \sigma_{\bar{Y}}^2)$, where $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$. Because this is a normal distribution, we know exactly the probability that the sample mean is more than any specified distance away from the hypothesized mean (if the hypothesized mean is accurate). For example, it is less than 5 percent likely that the sample mean will be more than 2 (really 1.96) standard deviations away from the hypothesized mean (if the hypothesized mean is accurate).
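    The 1.96 figure comes straight from the standard normal distribution; a two-line check using scipy.stats:

```python
from scipy.stats import norm

# Probability a normal r.v. lands more than 1.96 SDs from its mean
print(2 * norm.cdf(-1.96))   # about 0.05
print(norm.ppf(0.975))       # about 1.96, the 97.5th percentile
```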

  • How likely is the observed value?

    In words, the p-value is how likely the random variable $\bar{Y}$ is to exceed the observed actual $\bar{Y}^{act}$ (in absolute deviation from $\mu_{Y,0}$) if the null hypothesis is true. As $p$ falls, we become increasingly sure that the null hypothesis is not true. (It's really unlikely that we could have a sample mean this big (small) if the null were true. We do have a sample mean this big (small). Ergo, the null hypothesis is not true.)

  • Convert to a standard normal problem

    $$
    \begin{aligned}
    \text{p-value} &= \Pr_{H_0}\left[\left|\frac{\bar{Y}-\mu_{Y,0}}{\sigma_{\bar{Y}}}\right| > \left|\frac{\bar{Y}^{act}-\mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right] \\
    &= \Pr_{H_0}\left[\left|Z\right| > \left|\frac{\bar{Y}^{act}-\mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right] \\
    &= \Pr_{H_0}\left[Z < -\left|\frac{\bar{Y}^{act}-\mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right] + \Pr_{H_0}\left[Z > \left|\frac{\bar{Y}^{act}-\mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right] \\
    &= \Phi\left(-\left|\frac{\bar{Y}^{act}-\mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right) + \Phi\left(-\left|\frac{\bar{Y}^{act}-\mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right) \\
    &= 2\,\Phi\left(-\left|\frac{\bar{Y}^{act}-\mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right)
    \end{aligned}
    $$

    where $Z = (\bar{Y} - \mu_{Y,0})/\sigma_{\bar{Y}}$ is a standard normal random variable under $H_0$ and $\Phi$ is the standard normal CDF.
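    Putting the last line to work: a small helper (a sketch with invented example numbers, assuming $\sigma_Y$ is known) computes the two-sided p-value as $2\Phi(-|z|)$:

```python
from math import sqrt
from scipy.stats import norm

def p_value(y_bar_act, mu_0, sigma_Y, n):
    """Two-sided p-value 2 * Phi(-|z|), assuming sigma_Y is known."""
    se = sigma_Y / sqrt(n)        # sigma of the sample mean
    z = (y_bar_act - mu_0) / se   # standardized deviation under H0
    return 2 * norm.cdf(-abs(z))

# Invented numbers: actual mean $21.20, null $20, sigma_Y = 5, n = 100
print(p_value(21.2, 20.0, 5.0, 100))  # about 0.016
```

    This matches the Monte Carlo estimate from the hypothesis-testing sketch earlier.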

  • Sample Variance, Sample Standard Deviation, Standard Error

    Why?

    - The sample variance and sample standard deviation are interesting in their own right as a description of the spread in the data. Is income equally or unequally distributed? Do winters vary from year to year?

    - As we can estimate the population mean using the sample mean, we can estimate the population variance and standard deviation using the sample variance and sample standard deviation.
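    A short sketch of these quantities on the invented wage data used earlier (note the $n-1$ divisor in the sample variance):

```python
import numpy as np

Y = np.array([18.50, 22.00, 19.75, 31.00, 17.25, 24.10])  # invented data
n = len(Y)

s2_Y = ((Y - Y.mean()) ** 2).sum() / (n - 1)  # sample variance
s_Y = np.sqrt(s2_Y)                           # sample standard deviation
se_Y_bar = s_Y / np.sqrt(n)                   # standard error of the mean

print(s2_Y, s_Y, se_Y_bar)
```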