Statistical Hypothesis Testing with SAS and R (Taeger/Statistical Hypothesis Testing with SAS and R)...

Statistical Hypothesis Testing with SAS and R, First Edition. Dirk Taeger and Sonja Kuhnt. © 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd.

Glossary

$\alpha$ The significance level of a statistical test.

$\Phi(x)$ Distribution function of the standard normal distribution:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.$$
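The integral defining $\Phi(x)$ has no closed form, but it can be written in terms of the error function as $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$. The following is a minimal Python sketch of that identity; the function name `phi` is an illustrative choice and does not come from the book.

```python
import math


def phi(x: float) -> float:
    """Standard normal distribution function Phi(x), via the error function."""
    # Identity: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    for x in (-1.96, 0.0, 1.96):
        print(f"Phi({x}) = {phi(x):.4f}")
```

By symmetry of the standard normal density, `phi(-x) + phi(x)` equals 1 up to rounding.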

$\chi^2_{\alpha;n}$ The $\alpha$-quantile of the $\chi^2$-distribution with $n$ degrees of freedom (Table B.3 and Table B.4).

$t_{\alpha;n}$ The $\alpha$-quantile of the $t$-distribution with $n$ degrees of freedom (Table B.2).

$z_\alpha$ The $\alpha$-quantile of the standard normal distribution (Table B.1): $z_\alpha = \Phi^{-1}(\alpha)$.

$|x|$ The absolute value of $x$.
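As a rough illustration of the quantile $z_\alpha = \Phi^{-1}(\alpha)$ defined above, the sketch below uses `statistics.NormalDist` from the Python standard library instead of a printed table. The helper name `z_quantile` is an assumption made for this example. The $\chi^2$- and $t$-quantiles would need a statistics package and are not covered here.

```python
from statistics import NormalDist


def z_quantile(alpha: float) -> float:
    """alpha-quantile of the standard normal distribution, for 0 < alpha < 1."""
    # NormalDist() defaults to mean 0 and standard deviation 1.
    return NormalDist().inv_cdf(alpha)


if __name__ == "__main__":
    for alpha in (0.90, 0.95, 0.975):
        print(f"z_{alpha} = {z_quantile(alpha):.4f}")
```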

$\bar{x}$ The sample mean of a sample $x_1, \ldots, x_n$: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.

Continuity correction A continuity correction is often applied when approximating the cumulative probability function $P(X \le x)$ of a discrete random variable by the standard normal distribution function. Usually a correction factor of 0.5 is used such that

$$P(X \le x) \approx \Phi\left(\frac{x - E(X) + 0.5}{\sqrt{\operatorname{Var}(X)}}\right).$$
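To see what the correction factor does in practice, the sketch below compares an exact binomial probability $P(X \le x)$ with its normal approximation, with and without the 0.5 correction. The binomial setting ($n = 20$, $p = 0.5$, $x = 8$) and all function names are assumptions chosen for illustration; they do not come from the glossary.

```python
import math


def phi(x: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binom_cdf_exact(x: int, n: int, p: float) -> float:
    """Exact P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def binom_cdf_normal(x: int, n: int, p: float, correction: float = 0.5) -> float:
    """Normal approximation of P(X <= x) with E(X) = n*p and Var(X) = n*p*(1-p)."""
    mean = n * p
    var = n * p * (1 - p)
    return phi((x - mean + correction) / math.sqrt(var))


if __name__ == "__main__":
    n, p, x = 20, 0.5, 8  # hypothetical example values
    print("exact:      ", binom_cdf_exact(x, n, p))
    print("corrected:  ", binom_cdf_normal(x, n, p))
    print("uncorrected:", binom_cdf_normal(x, n, p, correction=0.0))
```

In this example the corrected approximation lies much closer to the exact value than the uncorrected one.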

Empirical distribution function (EDF) Let $x_{(1)}, \ldots, x_{(n)}$ be an ascending ordered sample, then the EDF is defined as:

$$F(x) = \begin{cases} 0 & \text{for all } x < x_{(1)}, \\ k/n & \text{for } x_{(k)} \le x < x_{(k+1)},\ k = 1, \ldots, n-1, \\ 1 & \text{for all } x \ge x_{(n)}. \end{cases}$$
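For a sample without ties, the piecewise definition above amounts to counting the share of sample values that are at most $x$. A minimal Python sketch follows; the function name `edf` and the sample values are illustrative assumptions.

```python
from bisect import bisect_right


def edf(sample, x):
    """Empirical distribution function: share of sample values <= x."""
    ordered = sorted(sample)  # ascending order x_(1), ..., x_(n)
    # bisect_right returns the number of ordered values that are <= x.
    return bisect_right(ordered, x) / len(ordered)


if __name__ == "__main__":
    sample = [4, 5, 2, 9, 3]  # hypothetical sample
    for x in (1, 3, 4.5, 9):
        print(f"F({x}) = {edf(sample, x)}")
```

Sorting once and using binary search keeps each evaluation at logarithmic cost; for repeated evaluations the sorted sample could be cached.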

$F_{n_1;n_2}$ Distribution function of the $F$-distribution with $n_1$ and $n_2$ degrees of freedom.


$f_{\alpha;n_1,n_2}$ The $\alpha$-quantile of the $F$-distribution with $n_1$ and $n_2$ degrees of freedom (Tables B.5–B.7).

$H_0$ The null hypothesis of a test problem.

$H_1$ The alternative hypothesis of a test problem.

$n$ The sample size of a sample $x_1, \ldots, x_n$.

Ranks Let $x_1, \ldots, x_n$ be a sample. The ordered sample (from the lowest to the highest value) is $x_{(1)}, \ldots, x_{(n)}$. The rank of a sample value is its position in the ordered sample, that is, the index $j$ of the corresponding $x_{(j)}$. For example, let 4, 5, 2, 9, 3 be a sample of size 5, then the ordered sample is: 2, 3, 4, 5, 9. The rank of the sample value 2 is 1, the rank of sample value 3 is 2, and the rank of sample value 9 is 5.
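The rank example above can be reproduced with a short Python sketch. It assumes a sample without tied values, and the function name `ranks` is chosen for illustration.

```python
def ranks(sample):
    """Rank of each sample value (1 = lowest), assuming no tied values."""
    # Indices of the sample, ordered from the lowest to the highest value.
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    result = [0] * len(sample)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


if __name__ == "__main__":
    print(ranks([4, 5, 2, 9, 3]))  # ranks in the original sample order
```

For the sample 4, 5, 2, 9, 3 the ranks in sample order are 3, 4, 1, 5, 2, which matches the ranks of the values 2, 3, and 9 given in the example.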

Run Let $n_1$ observations of random variable $X_1$ and $n_2$ observations of random variable $X_2$ be given. Assume that both samples are combined and (if at least ordinal) are arranged in increasing order or in order of their time of occurrence. A run is a group of successive observations generated from the same random variable. The same idea can be applied if the observations are coming from a binary random variable. For example, a coin is tossed 10 times; the results of these tosses are either (H)eads or (T)ails. The observed sequence is: HH T HH TTT H. This sequence has five runs, namely HH, T, HH, TTT, H.
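Splitting a binary sequence into runs, as in the coin example above, can be sketched in Python with `itertools.groupby`, which groups consecutive equal elements. The function name `runs` is an illustrative assumption, and the input is taken to be a string of single-character outcomes.

```python
from itertools import groupby


def runs(sequence: str):
    """Split a sequence of outcomes into its runs of identical successive values."""
    # groupby starts a new group whenever the value changes.
    return ["".join(group) for _, group in groupby(sequence)]


if __name__ == "__main__":
    observed = "HHTHHTTTH"  # the sequence HH T HH TTT H from the example
    found = runs(observed)
    print(found, "number of runs:", len(found))
```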

Mid ranks This is a way of dealing with tied values, which are identical values in an ordered sequence. The same rank is assigned to these values, namely the mean of their ranks. For example, let 4, 2, 4, 4, 5 be a sample. It is unclear which of the observations 1, 3, and 4 (the three values 4) should get which of the ranks 2, 3, and 4. The arithmetic mean of the ranks of the tied values is $(2 + 3 + 4)/3 = 3$, so each value 4 will get the mid rank 3. The rank vector of the ordered sample is (1, 3, 3, 3, 5) while the sum of ranks is still 15.
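The mid rank rule described above can be sketched in Python as follows: tied values occupy a block of consecutive positions in the ordered sample, and each of them receives the mean of the ranks in that block. The function name `mid_ranks` is an assumption made for this example.

```python
def mid_ranks(sample):
    """Ranks of the sample values, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    result = [0.0] * len(sample)
    start = 0
    while start < len(order):
        # Extend the block [start, end] over all values tied with the first one.
        end = start
        while end + 1 < len(order) and sample[order[end + 1]] == sample[order[start]]:
            end += 1
        # The block holds the ranks start+1, ..., end+1; their mean is the mid rank.
        mean_rank = (start + 1 + end + 1) / 2
        for position in range(start, end + 1):
            result[order[position]] = mean_rank
        start = end + 1
    return result


if __name__ == "__main__":
    r = mid_ranks([4, 2, 4, 4, 5])
    print(r, "sum of ranks:", sum(r))
```

The result is given in the original sample order; sorting it yields the rank vector of the ordered sample.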

p-value The probability, computed under the null hypothesis $H_0$, of observing a sample at least as discrepant with $H_0$ as the observed sample.
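As one concrete and deliberately simple instance of this definition, the sketch below computes a one-sided p-value for a coin tossing experiment. The setting is an assumption made for illustration and is not taken from the glossary: $H_0$ states that the probability of heads is 0.5, the alternative is that it is larger, and "at least as discrepant" is read as "at least as many heads as observed".

```python
import math


def binomial_p_value_upper(k: int, n: int, p0: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p0): a one-sided p-value under H0: p = p0."""
    return sum(
        math.comb(n, j) * p0**j * (1 - p0) ** (n - j) for j in range(k, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical observation: 9 heads in 10 tosses.
    print(binomial_p_value_upper(9, 10))
```

With 9 heads in 10 tosses the p-value is $11/1024 \approx 0.0107$, since only the outcomes with 9 or 10 heads are at least as extreme.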

Ties If two or more observations in a sample have the same value they are called tied values.

$\mathbb{1}_A\{x\}$ Characteristic function:

$$\mathbb{1}_A\{x\} = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$$