Chi-Square & F Distributions - University of Illinois at Urbana

Chi-Square & F Distributions Carolyn J. Anderson EdPsych 580 Fall 2005 Chi-Square & F Distributions – p. 1/55



EdPsych 580
Fall 2005
Chi-Square & F Distributions . . . and Inferences about Variances

• The Chi-square Distribution
  • Definition, properties, tables, density calculator
• Testing hypotheses about the variance of a single population (i.e., Ho: σ² = K). Example.
• The F Distribution
  • Definition, important properties, tables
• Testing the equality of variances of two independent populations (i.e., Ho: σ₁² = σ₂²). Example.
• Comments regarding testing the homogeneity of variance assumption of the two independent groups t–test (and ANOVA).
• Relationship among the Normal, t, χ², and F distributions.
Chi-Square & F Distributions

• Motivation: The normal and t distributions are useful for tests of population means, but often we may want to make inferences about population variances.
• Examples:
  • Does the variance equal a particular value?
  • Does the variance in one population equal the variance in another population?
  • Are individual differences greater in one population than in another population?
  • Are the variances in J populations all the same?
  • Is the assumption of homogeneous variances reasonable when doing a t–test (or ANOVA) of two (or more) groups?
Chi-Square & F Distributions

• To make statistical inferences about population variance(s), we need
  • χ² −→ the Chi-square distribution (Greek “chi”).
  • F −→ named after Sir Ronald Fisher, who developed the main applications of F.
• The χ² and F distributions are used for many problems in addition to the ones listed above.
• They provide good approximations to a large class of sampling distributions that are not easily determined.
The Big Five Theoretical Distributions

• The Big Five are the Normal, Student’s t, χ², F, and the Binomial (π, n).
• Plan:
  • Introduce the χ² and then the F distributions.
  • Illustrate their uses for testing variances.
  • Summarize and describe the relationships among the Normal, Student’s t, χ², and F.
The Chi-Square Distributions

• Suppose we have a population with scores Y that are normally distributed with mean E(Y) = µ and variance var(Y) = σ² (i.e., Y ∼ N(µ, σ²)).
• If we repeatedly take samples of size n = 1 and for each “sample” compute
  z² = (Y − µ)²/σ²,
  what will the distribution of the z² values look like?
The Chi-Square Distribution, ν = 1

• Values of χ²₁ are non-negative real numbers.
• Since 68% of values from N(0, 1) fall between −1 and 1, 68% of values from the χ²₁ distribution must be between 0 and 1.
• The chi-square distribution with ν = 1 is very skewed.
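The 68% claim is easy to verify numerically. A Python sketch (not part of the original slides): square draws from N(0, 1) and count the share that lands in [0, 1].

```python
import random

# If Z ~ N(0,1), then Z^2 has a chi-square distribution with nu = 1, so
# P(Z^2 <= 1) should equal P(-1 <= Z <= 1), i.e. about .68.
random.seed(0)
n = 100_000
draws = [random.gauss(0, 1) ** 2 for _ in range(n)]
prop_below_1 = sum(d <= 1 for d in draws) / n
```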
The Chi-Square Distribution, ν = 2

• Repeatedly draw independent (random) samples of n = 2 from N(µ, σ²).
• Compute Z₁² = (X₁ − µ)²/σ² and Z₂² = (X₂ − µ)²/σ².
• χ²₂ = Z₁² + Z₂².
• All values are non-negative.
• A little less skewed than χ²₁.
• The probability that χ²₂ falls in the range 0 to 1 is smaller relative to that for χ²₁:
  P(χ²₁ ≤ 1) = .68
  P(χ²₂ ≤ 1) = .39
• Note that mean ≈ ν = 2.
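These two facts are also checkable by simulation; a Python sketch (not from the slides) summing two squared standard normals:

```python
import random

# chi-square with nu = 2 as Z1^2 + Z2^2 for independent Z1, Z2 ~ N(0,1);
# check P(chi2_2 <= 1) ~ .39 and mean ~ nu = 2.
random.seed(1)
n = 100_000
chisq2 = [random.gauss(0, 1) ** 2 + random.gauss(0, 1) ** 2 for _ in range(n)]
prop_below_1 = sum(v <= 1 for v in chisq2) / n
mean = sum(chisq2) / n
```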
Chi-Square Distributions

• Generalize: For n independent observations from a N(µ, σ²), the sum of the squared standard scores has a Chi-square distribution with n degrees of freedom.
• The Chi-square distribution only depends on its degrees of freedom, which in turn depend on the sample size n.
• The standard scores are computed using the population µ and σ²; however, we usually don’t know what µ and σ² equal. When µ and σ² are estimated from the sampled data, the degrees of freedom are less than n.
Chi-Square Dist: Varying ν (density plots omitted)
Properties of the Family of χ² Distributions

• They are all positively skewed.
• As ν gets larger, the degree of skew decreases.
• As ν gets very large, χ²ν approaches the normal distribution.
  Why? The Central Limit Theorem (for sums): Consider a random sample of size n from a population distribution having mean µ and variance σ². If n is sufficiently large, then the distribution of the sum Y₁ + Y₂ + · · · + Yn is approximately normal with mean nµ and variance nσ².
Properties of the Family of χ² Distributions

• E(χ²ν) = mean = ν = degrees of freedom.
• Variance: E[(χ²ν − E(χ²ν))²] = 2ν.
• The mode of χ²ν is at the value ν − 2 (for ν ≥ 2).
• The median is approximately (3ν − 2)/3 (for ν ≥ 2).
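These moments can be checked numerically. A Python sketch (not from the slides), using the fact that a χ²ν variable is a Gamma(ν/2, scale 2) variable, so it can be drawn with the standard library:

```python
import random

# chi-square(nu) is Gamma(shape = nu/2, scale = 2);
# random.gammavariate takes (shape, scale).
random.seed(2)
nu, n = 10, 200_000
draws = [random.gammavariate(nu / 2, 2) for _ in range(n)]
mean = sum(draws) / n                          # should be near nu
var = sum((d - mean) ** 2 for d in draws) / n  # should be near 2*nu
median = sorted(draws)[n // 2]                 # near (3*nu - 2)/3
```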
Properties of the Family of χ² Distributions

IF
• a random variable χ²ν₁ has a chi-squared distribution with ν₁ degrees of freedom, and
• a second, independent random variable χ²ν₂ has a chi-squared distribution with ν₂ degrees of freedom,
THEN their sum χ²ν₁ + χ²ν₂ has a chi-squared distribution with (ν₁ + ν₂) degrees of freedom.
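The additivity property can also be checked by simulation (a Python sketch, not from the slides): the sum of independent χ²₃ and χ²₅ draws should have the mean (8) and variance (16) of a χ²₈.

```python
import random

# Draw independent chi-square(3) and chi-square(5) values via the Gamma
# representation chi-square(nu) = Gamma(nu/2, scale 2), then sum them.
random.seed(3)
nu1, nu2, n = 3, 5, 200_000
total = [random.gammavariate(nu1 / 2, 2) + random.gammavariate(nu2 / 2, 2)
         for _ in range(n)]
mean = sum(total) / n                          # near nu1 + nu2 = 8
var = sum((t - mean) ** 2 for t in total) / n  # near 2*(nu1 + nu2) = 16
```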
Percentiles of χ² Distributions

Note: .95χ²₁ = 3.84 = 1.96² = z².95
• pvalue.exe, on the course web-site.
• SAS: PROBCHI(x, df<,nc>) where
  • x = number
  • df = degrees of freedom
• If p = PROBCHI(x, df), then p = Prob(χ²df ≤ x).
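For ν = 1 the chi-square CDF can be obtained from the normal CDF alone, which gives a stdlib-Python analog of the PROBCHI call above (the function name probchi1 is just for illustration):

```python
from math import sqrt
from statistics import NormalDist

def probchi1(x):
    # Since chi-square(1) is the square of a standard normal,
    # P(chi2_1 <= x) = P(-sqrt(x) <= Z <= sqrt(x)) = 2*Phi(sqrt(x)) - 1.
    return 2 * NormalDist().cdf(sqrt(x)) - 1

p = probchi1(3.84)  # analogous to PROBCHI(3.84, 1); about .95
```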
SAS Examples & Computations

p-values:
  DATA probval;
    pz = PROBNORM(1.96);
    pzsq = PROBCHI(3.84, 1);
    output;
  RUN;
(Output listing omitted.)

SAS Examples & Computations

. . . To get probability density values . . .
  data chisq3;
    do x = 0 to 10 by .005;
      pdfxsq = pdf('CHISQUARE', x, 3);
      output;
    end;
  run;
Inferences about a Population Variance

. . . or the sampling distribution of the sample variance from a normal population.
• Statistical Hypotheses:
  Ho: σ² = σ²o versus Ha: σ² ≠ σ²o
• Assumptions: Yi ∼ N(µ, σ²), i.i.d.
Inferences about σ² (continued)

• z² = (Y − µ)²/σ² ∼ χ²₁ if z ∼ N(0, 1).
• For n i.i.d. observations from a normal population, (n − 1)s²/σ² has a chi-square distribution with ν = n − 1 degrees of freedom:
  (n − 1)s²/σ² ∼ χ²ν, with ν = n − 1.

Test Statistic for Ho: σ² = σ²o

• So s² ∼ σ² χ²ν/(n − 1), and the test statistic is
  X²ν = (n − 1)s²/σ²o, with ν = n − 1.
• Sampling distribution of the Test Statistic: If Ho is true, which means that σ² = σ²o, then
  X²ν = (n − 1)s²/σ²o ∼ χ²ν.
Decision and Conclusion, Ho: σ² = σ²o

• Decision: Compare the obtained test statistic to the chi-squared distribution with ν = n − 1 degrees of freedom, or find the p-value of the test statistic and compare it to α.
• Interpretation/Conclusion: What does the decision mean in terms of what you’re investigating?
Example of Ho: σ² = σ²o

• High School and Beyond: Is the variance of math scores of students from private schools equal to 100?
• Statistical Hypotheses:
  Ho: σ² = 100 versus Ha: σ² ≠ 100
• Assumptions: Math scores are normally distributed in the population of high school seniors who attend private schools, and the observations are independent.
Example of Ho: σ² = σ²o (continued)

• Test Statistic: n = 94, s² = 67.16, and set α = .10.
  X² = (n − 1)s²/σ²o = (94 − 1)(67.16)/100 = 62.46
  with ν = (94 − 1) = 93.
• Sampling Distribution of the Test Statistic: Chi-square with ν = 93.
Example of Ho: σ² = σ²o (continued)

• Critical values: .05χ²₉₃ = 71.76 & .95χ²₉₃ = 116.51.
• Decision: Since the obtained test statistic X² = 62.46 is less than .05χ²₉₃ = 71.76, reject Ho at α = .10.
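The arithmetic of the decision can be written out directly (a Python sketch of the slide's numbers, not part of the original):

```python
# H_o: sigma^2 = 100, n = 94, s^2 = 67.16, alpha = .10 (two-tailed).
n, s2, sigma2_o = 94, 67.16, 100
X2 = (n - 1) * s2 / sigma2_o             # test statistic, about 62.46
lower_crit, upper_crit = 71.76, 116.51   # .05 and .95 percentiles of chi-square(93)
reject = X2 < lower_crit or X2 > upper_crit   # True, since 62.46 < 71.76
```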
Confidence Interval Estimate of σ²

• The (1 − α)100% confidence interval is
  (n − 1)s²/(1−α/2)χ²ν ≤ σ² ≤ (n − 1)s²/(α/2)χ²ν, with ν = n − 1,
  which for the example does not include 100 (the null hypothesized value).
• s² = 67.16 isn’t in the center of the interval.
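Using the critical values from the example, the 90% interval works out as follows (a Python sketch, not from the slides; the slide's own numeric interval was not captured in this transcript):

```python
# (n-1)s^2 / chi2_.95 <= sigma^2 <= (n-1)s^2 / chi2_.05, with nu = 93.
n, s2 = 94, 67.16
chi2_05, chi2_95 = 71.76, 116.51
lower = (n - 1) * s2 / chi2_95        # about 53.6
upper = (n - 1) * s2 / chi2_05        # about 87.0
contains_100 = lower <= 100 <= upper  # False: consistent with rejecting H_o
```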
The F Distribution

• Comparing two variances: Are they equal?
• Start with two independent populations, each normal and with equal variances. . . .
• Draw one independent random sample from each population:
  n₁ from population 1
  n₂ from population 2
The F Distribution (continued)

• Using the data from each of the two samples, estimate σ²: s₁² and s₂².
• Both S₁² and S₂² are random variables, and their ratio is a random variable,
  F = (estimate of σ² from sample 1)/(estimate of σ² from sample 2).
• The random variable F has an F distribution.
Testing for Equal Variances

• F gives us a way to test Ho: σ₁² = σ₂² (= σ²).
  F = (χ²ν₁/ν₁)/(χ²ν₂/ν₂)
• A random variable formed from the ratio of two independent chi-squared variables, each divided by its degrees of freedom, is an “F–ratio” and has an F distribution.
Conditions for an F Distribution

• IF
  • Both parent populations are normal.
  • Both parent populations have the same variance.
  • The samples (and populations) are independent.
• THEN the ratio s₁²/s₂² has an F distribution with
  • ν₁ = n₁ − 1 = numerator degrees of freedom
  • ν₂ = n₂ − 1 = denominator degrees of freedom
Eg of F Distributions: F₂,ν₂, F₅,ν₂, F₅₀,ν₂ . . . (density plots omitted)
Important Properties of F Distributions

• The range of F–values is the non-negative real numbers (i.e., 0 to +∞).
• They depend on 2 parameters: the numerator degrees of freedom (ν₁) and the denominator degrees of freedom (ν₂).
• The expected value (i.e., the mean) of a random variable with an F distribution with ν₂ > 2 is
  E(Fν₁,ν₂) = µFν₁,ν₂ = ν₂/(ν₂ − 2).
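A Monte Carlo sketch of this mean (Python, not from the slides): build Fν₁,ν₂ as the ratio of two independent chi-squares over their degrees of freedom and compare the average to ν₂/(ν₂ − 2).

```python
import random

# F_{nu1,nu2} = (chi2_{nu1}/nu1) / (chi2_{nu2}/nu2), with chi-square draws
# taken from the Gamma representation chi-square(nu) = Gamma(nu/2, scale 2).
random.seed(4)
nu1, nu2, n = 5, 12, 200_000
f_draws = [(random.gammavariate(nu1 / 2, 2) / nu1) /
           (random.gammavariate(nu2 / 2, 2) / nu2) for _ in range(n)]
mean_f = sum(f_draws) / n   # should be near nu2/(nu2 - 2) = 1.2
```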
Properties of F Distributions

• For any fixed ν₁ and ν₂, the F distribution is non-symmetric.
• The particular shape of the F distribution varies considerably with changes in ν₁ and ν₂.
• In most applications of the F distribution (at least in this class), ν₁ < ν₂, which means that F is positively skewed.
• When ν₂ > 2, the F distribution is uni-modal.
Percentiles of the F Dist.

• http://calculators.stat.ucla.edu/cdf
• p-value program
• SAS: probf
• Tables in textbooks usually give the upper 25th, 10th, 5th, 2.5th, and 1st percentiles.
  • Columns correspond to ν₁, the numerator df.
  • Rows correspond to ν₂, the denominator df.
• Getting lower percentiles from the tables requires taking reciprocals.
Selected F values from Table V (all values are for upper α = .05)

  ν₁    ν₂      Fν₁,ν₂    which is also . . .
  1     1       161.00    t²₁
  1     20      4.35      t²₂₀
  1     1000    3.85      t²₁₀₀₀
  1     ∞       3.84      t²∞
Test Equality of Two Variances

Are students from private high schools more homogeneous with respect to their math test scores than students from public high schools?
• Statistical Hypotheses:
  Ho: σ²public/σ²private = 1 versus Ha: σ²private < σ²public (1-tailed test).
• Assumptions: Math scores of students from private schools and public schools are normally distributed and are independent both between and within school type.
Test Equality of Two Variances

• Test Statistic:
  F = s₁²/s₂² = 91.74/67.16 = 1.366
  with ν₁ = (n₁ − 1) = (506 − 1) = 505 and ν₂ = (n₂ − 1) = (94 − 1) = 93.
  Since the sample variance for public schools, s₁² = 91.74, is larger than the sample variance for private schools, s₂² = 67.16, put s₁² in the numerator.
• Sampling Distribution of the Test Statistic: the F distribution with ν₁ = 505 and ν₂ = 93.
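The test statistic is plain arithmetic on the two sample variances (a Python sketch of the slide's numbers, not part of the original):

```python
# Larger sample variance (public schools) goes in the numerator.
s2_public, s2_private = 91.74, 67.16
n_public, n_private = 506, 94
F = s2_public / s2_private               # about 1.366
nu1, nu2 = n_public - 1, n_private - 1   # 505 and 93
```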
Test Equality of Two Variances

• Decision: Our observed test statistic, F505,93 = 1.366, has a p–value = .032. Since the p–value < α = .05, reject Ho.
• Or, we could compare the observed test statistic, F505,93 = 1.366, with the critical value F505,93(α = .05) = 1.320. Since the observed value of the test statistic is larger than the critical value, reject Ho.
• Conclusion: The data support the conclusion that students from private schools are more homogeneous with respect to math test scores than students from public schools.
Example Continued

• Alternative question: “Are the individual differences of students in public high schools and private high schools the same with respect to their math test scores?”
• Statistical Hypotheses: The null is the same, but the alternative hypothesis would be
  Ha: σ²public ≠ σ²private (a 2–tailed alternative)
• Given α = .05, retain Ho, because our obtained p–value (the probability of getting a test statistic as large or larger than what we got) is larger than α/2 = .025.
Example Continued

• Or, the rejection region (critical value) would be any F–statistic greater than F505,93(α = .025) = 1.393.
• Point: This is a case where the choice between a 1– and a 2–tailed test leads to different decisions regarding the null hypothesis.
Test for Homogeneity of Variances

• There are tests of Ho: σ₁² = σ₂² = . . . = σJ².
• These include:
  • Hartley’s Fmax test
  • Bartlett’s test
  • One regarding the variances of paired comparisons.
• You should know that they exist; we won’t go over them in this class. Such tests are not as important as they were once thought to be.
Test for Homogeneity of Variances

• Old View: Testing the equality of variances should be a preliminary to doing independent t-tests (or ANOVA).
• Newer View: Homogeneity of variance is required for small samples, which is when tests of homogeneous variances do not work well. With large samples, we don’t have to assume σ₁² = σ₂².
Test for Homogeneity of Variances

• For small or moderate samples where there’s concern about possible heterogeneity −→ perform a Quasi-t test.
• In experimental settings where you have control over the number of subjects and their assignment to groups/conditions/etc. −→ use equal sample sizes.
• In non-experimental settings where you have similar numbers of participants per group, the t test is pretty robust.
Relationship between z, tν, χ²ν, and Fν₁,ν₂

. . . and the central importance of the normal distribution.
• The Normal, Student’s tν, χ²ν, and Fν₁,ν₂ are all theoretical distributions.
• We don’t ever actually take vast (infinite) numbers of samples from populations.
• The distributions are derived based on mathematical logic: statements of the form
  IF . . . . . . . . . THEN . . . . . . . . .
Derivation of Distributions

• Example:
  • IF we draw independent random samples of size (large) n from a population, compute the mean Ȳ, and repeat this process many, many, many, many times,
  • THEN Ȳ is approximately normal.
• Assumptions are part of the “IF” part, the conditions used to deduce the sampling distribution of statistics.
• The t, χ², and F distributions all depend on a normal “parent” population.
Chi-Square Distribution

• χ²ν = the sum of ν independent squared normal random variables with mean µ = 0 and variance σ² = 1 (i.e., “standard normal” random variables):
  χ²ν = z₁² + z₂² + · · · + zν², where each zi ∼ N(0, 1).
• Based on the Central Limit Theorem, the “limit” of the χ²ν distribution (i.e., as ν → ∞) is normal.
The F Distribution

• Fν₁,ν₂ = the ratio of two independent chi-squared random variables, each divided by its respective degrees of freedom:
  Fν₁,ν₂ = (χ²ν₁/ν₁)/(χ²ν₂/ν₂)
• The “limiting” distribution of Fν₁,ν₂ as ν₂ → ∞ is χ²ν₁/ν₁, because χ²ν₂/ν₂ → 1.
Student’s t Distribution

Note that
  t² = (Ȳ − µ)²n / s²
     = [(Ȳ − µ)²n (1/σ²)] / [s² (1/σ²)]
     = z² / [(n − 1)s² / (σ²(n − 1))]
     = z² / (χ²ν/ν), with ν = n − 1.

Student’s t Distribution (continued)

  t²ν = z² / (χ²ν/ν)
• A squared t random variable equals a squared standard normal divided by a chi-squared divided by its degrees of freedom. So . . .
Student’s t Distribution (continued)

• As ν → ∞, tν → N(0, 1), because χ²ν/ν → 1.
• Since z² = χ²₁,
  t²ν = χ²₁/(χ²ν/ν) = F₁,ν.
• Why are the assumptions of normality, homogeneity of variance, and independence required for
  • t tests for mean(s)?
  • testing homogeneity of variance(s)?
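The t²ν = F₁,ν identity can be checked by simulation (Python, not from the slides): construct tν as z/√(χ²ν/ν) and compare the mean of t² to E(F₁,ν) = ν/(ν − 2).

```python
import random
from math import sqrt

# t_nu = Z / sqrt(chi2_nu / nu), with chi-square draws from the Gamma
# representation chi-square(nu) = Gamma(nu/2, scale 2).
random.seed(5)
nu, n = 12, 200_000
t_draws = [random.gauss(0, 1) / sqrt(random.gammavariate(nu / 2, 2) / nu)
           for _ in range(n)]
mean_t2 = sum(t * t for t in t_draws) / n   # should be near nu/(nu - 2) = 1.2
```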
Summary of Relationships

  Distribution   Definition              Parent    Limiting
  χ²ν            z₁² + · · · + zν²       Normal    Normal (as ν → ∞)
  Fν₁,ν₂         (χ²ν₁/ν₁)/(χ²ν₂/ν₂)     Normal    χ²ν₁/ν₁ (as ν₂ → ∞)
  tν             z/√(χ²ν/ν)              Normal    N(0, 1) (as ν → ∞)

  and t²ν = F₁,ν; z² = χ²₁.