
Outline: Distributions · Basics of mathematical stats · Confidence intervals

Introductory Econometrics
Session 3: Distributions and confidence intervals

Roland Rathelot

Sciences Po

July 2011


The Normal distribution

- If X is a r.v. normally distributed, X ∼ N(µ, σ²), then

  f_X(x) = (1 / (σ√(2π))) exp[ −(x − µ)² / (2σ²) ]

- The pdf of the standard normal distribution is:

  φ(z) = (1 / √(2π)) exp[ −z² / 2 ]
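As a quick numerical illustration (added here, not part of the original slides), the density formula above can be checked against scipy; the values of µ and σ below are arbitrary.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 2.0          # arbitrary example parameters
x = np.linspace(-5, 7, 5)     # a few evaluation points

# pdf written out exactly as on the slide
manual = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# same density from scipy, for comparison
library = norm.pdf(x, loc=mu, scale=sigma)

print(np.allclose(manual, library))   # True
```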


Some properties of the Normal distribution

- If X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1)
- If X ∼ N(µ, σ²), then aX + b ∼ N(aµ + b, a²σ²)
- Any linear combination of independent, identically distributed normal rv has a normal distribution
- If X and Y are jointly normally distributed and uncorrelated, then they are independent
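A small simulation (added as an illustration) that checks the first two properties; the values of µ, σ, a and b are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a, b = 3.0, 1.5, 2.0, -1.0     # arbitrary example values
x = rng.normal(mu, sigma, size=1_000_000)

z = (x - mu) / sigma                       # approximately N(0, 1)
y = a * x + b                              # approximately N(a*mu + b, a^2 * sigma^2)

print(z.mean(), z.var())                   # close to 0 and 1
print(y.mean(), y.var())                   # close to a*mu + b and (a*sigma)**2
```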


The chi-square distribution

- Let Z_i, i = 1, . . . , Q, be Q independent rv distributed as standard normal
- Then the sum of the squares of the Z_i, denoted X, is known to have a chi-square distribution with Q degrees of freedom (df)

  X = ∑_{i=1}^{Q} Z_i² ∼ χ²_Q

- Expectation and variance:

  E[X] = Q, V[X] = 2Q
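A numerical check (added illustration; the value of Q and the number of draws are arbitrary) that the sum of Q squared standard normals has mean Q and variance 2Q:

```python
import numpy as np

rng = np.random.default_rng(1)
Q, n_draws = 5, 1_000_000

z = rng.standard_normal(size=(n_draws, Q))
x = (z**2).sum(axis=1)        # each row: sum of Q squared standard normals

print(x.mean(), x.var())      # close to Q and 2*Q
```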


The t distribution

- Let Z have a standard normal distribution
- Let X have a chi-square distribution with Q degrees of freedom
- Assume Z and X are independent
- Then T, defined as

  T = Z / √(X/Q),

  has a t distribution with Q df:

  T ∼ t_Q
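The construction can be simulated directly (an added sketch; the degrees of freedom Q and the cutoff used for the tail probability are arbitrary choices):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(2)
Q, n_draws = 4, 1_000_000

z = rng.standard_normal(n_draws)
x = rng.chisquare(Q, n_draws)            # independent chi-square with Q df
T = z / np.sqrt(x / Q)

# compare a tail probability of the simulated T with the t_Q distribution
print((T > 2).mean(), 1 - t.cdf(2, df=Q))
```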


The t and the Normal

- An approximate statement: the t is just a normal with thicker tails
- The pdfs of the t and the normal look close: only the thickness of the tails differs
- The lower the df of the t, the thicker the tails
- When the df tends to infinity, the t tends to a normal
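To see the thicker tails numerically (an added illustration; the cutoff of 2 and the df values are arbitrary):

```python
from scipy.stats import norm, t

cutoff = 2.0
print("normal tail:", 1 - norm.cdf(cutoff))
for df in (1, 3, 10, 100):
    # tail probability beyond the same cutoff; larger for smaller df
    print(f"t with {df} df:", 1 - t.cdf(cutoff, df=df))
```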


The F distribution

- Let X1 have a chi-square distribution with Q1 degrees of freedom
- Let X2 have a chi-square distribution with Q2 degrees of freedom
- Assume X1 and X2 are independent
- Then F, defined as

  F = (X1/Q1) / (X2/Q2),

  has an F distribution with (Q1, Q2) df:

  F ∼ F_{Q1,Q2}
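An added simulation sketch of this construction (Q1, Q2 and the cutoff are arbitrary choices):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)
Q1, Q2, n_draws = 3, 8, 1_000_000

x1 = rng.chisquare(Q1, n_draws)
x2 = rng.chisquare(Q2, n_draws)
F = (x1 / Q1) / (x2 / Q2)

# compare a tail probability with the F(Q1, Q2) distribution
print((F > 2).mean(), 1 - f.cdf(2, Q1, Q2))
```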


Population and sample

- Population: a set of individuals defined by some criteria; it is assumed infinite in the statistical framework: rv Y
- Sample: drawn from the population, of finite size: rv {Y_1, . . . , Y_n}
- Data sample: realizations of the previous rv in one particular dataset: {y_1, . . . , y_n}


Random sample

- In basic statistical applications, the sample is often assumed to have been randomly drawn from the population
- If Y is a rv defined on the population with density f, then {Y_1, . . . , Y_n} is a random sample if these rv are independent and identically distributed with density f
- The random nature of {Y_1, . . . , Y_n} reflects the fact that, before the data are actually drawn, each Y_i can take many different values


Parametric estimation

- Let Y be a rv representing the population, with pdf f(y, θ)
- The pdf of Y is assumed to be known except for the parameter θ
- +: one only has to estimate θ in order to know the pdf
- −: the assumption about the functional form of the pdf might sometimes be wrong


Estimators

- Given a sample {Y_1, . . . , Y_n}, an estimator of θ is a function of the rv {Y_1, . . . , Y_n} that aims at measuring θ
- For instance, a natural estimator of E(Y) is:

  Ȳ = (1/n) ∑_{i=1}^{n} Y_i

- Note that the estimator is a function of rv and is therefore a rv itself
- For actual data outcomes, the estimate will be ȳ = (1/n) ∑_i y_i
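A minimal added illustration of the estimate computed from one particular dataset (the data values below are made up):

```python
import numpy as np

y = np.array([2.1, 3.4, 1.8, 2.9, 3.2])   # made-up data sample
y_bar = y.sum() / len(y)                   # (1/n) * sum of the y_i
print(y_bar, np.mean(y))                   # both give the same estimate
```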


Unbiasedness

- Like any rv, an estimator has an expectation
- If this expectation is equal to the true value of the parameter, then the estimator is unbiased
- For instance, Ȳ is unbiased, as E(Ȳ) = E(Y)
- The bias of an estimator θ̂ of θ is equal to E(θ̂) − θ
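An added simulation sketch of unbiasedness: averaging Ȳ over many repeated samples recovers E(Y) (the population, its mean and the sample size are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_samples = 20, 100_000
# population: exponential with E(Y) = 2 (arbitrary choice)
samples = rng.exponential(scale=2.0, size=(n_samples, n))

y_bar = samples.mean(axis=1)   # one estimate per simulated sample
print(y_bar.mean())            # close to E(Y) = 2
```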


The estimator of the variance

- The usual estimator of the variance of Y is

  S² = (1/(n − 1)) ∑_{i=1}^{n} (Y_i − Ȳ)²

- It can be shown that S² is unbiased
- Why divide by n − 1 and not n? Because Ȳ, the mean, is itself estimated
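A small added check (the population and sample size are arbitrary): dividing by n − 1 gives an unbiased estimator of the variance, while dividing by n is biased downwards.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_samples, sigma2 = 10, 200_000, 4.0
samples = rng.normal(0.0, np.sqrt(sigma2), size=(n_samples, n))

s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n - 1
s2_biased = samples.var(axis=1, ddof=0)     # divide by n

print(s2_unbiased.mean(), s2_biased.mean()) # ~4.0 vs ~4.0 * (n-1)/n = 3.6
```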


The variance of the estimator

- What is a good estimator? Certainly an unbiased one
- Is it enough? An example
- Another criterion could be precision
- One then needs to compute the variance of the estimator
- Exercise: compute the variance of Ȳ (see the sketch below)
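For a random sample, Var(Ȳ) = σ²/n; the added simulation below illustrates this result (the population parameters and n are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_samples, sigma2 = 25, 200_000, 9.0
samples = rng.normal(5.0, np.sqrt(sigma2), size=(n_samples, n))

y_bar = samples.mean(axis=1)
print(y_bar.var(), sigma2 / n)   # both close to 9/25 = 0.36
```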


Efficiency

- If θ̂ and θ̃ are two estimators of θ, it is better to choose the more precise one
- If Var(θ̂) ≤ Var(θ̃), then θ̂ is efficient relative to θ̃
- If you consider biased estimators, comparing variances is not enough
- One criterion (among others) is the mean square error (MSE):

  MSE(θ̂) = E[(θ̂ − θ)²] = Var(θ̂) + [Bias(θ̂)]²
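An added illustration of relative efficiency: for a normal population, both the sample mean and the sample median are unbiased for µ, but the mean has the smaller variance (all parameter values are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_samples = 50, 100_000
samples = rng.normal(0.0, 1.0, size=(n_samples, n))

means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# both roughly unbiased; the mean is more efficient (smaller variance)
print(means.mean(), medians.mean())
print(means.var(), medians.var())    # roughly 1/n vs pi/(2n)
```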


Consistency

- Another issue with an estimator: how does it evolve when the sample size gets larger?
- An estimator is said to be consistent if it gets closer to the true value when the sample size increases
- Formally, θ̂ is a consistent estimator of θ if, for every ε > 0,

  P(|θ̂ − θ| > ε) → 0 as n → ∞

- One says that the probability limit of θ̂ is θ: plim(θ̂) = θ
- Otherwise, the estimator is said to be inconsistent
- If an estimator is unbiased and its variance shrinks to zero as n grows, then it is consistent (see the sketch below)
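An added sketch of consistency in action: the sample mean drifts towards E(Y) as n grows (the population and the sequence of sample sizes are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(8)
true_mean = 2.0
for n in (10, 100, 10_000, 1_000_000):
    y = rng.exponential(scale=true_mean, size=n)
    print(n, abs(y.mean() - true_mean))   # typically shrinks as n grows
```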


Law of large numbers

- Ȳ_n = (1/n) ∑_i Y_i is consistent for E(Y)
- This is the law of large numbers:

  plim(Ȳ_n) = E(Y)


Asymptotic normality

- In words, asymptotic normality occurs when a rv gets closer to the normal distribution as the sample size increases
- Let Z_i be a sequence of rv, i = 1, 2, . . ., such that for all numbers z,

  P(Z_i ≤ z) → Φ(z) as i → ∞

  where Φ(.) is the standard normal cdf
- In this case, Z_i is said to have an asymptotic normal distribution


Central limit theorem

- Let {Y_1, . . . , Y_n} be a random sample with expectation µ and variance σ²
- The central limit theorem states that

  Z_n = (Ȳ_n − µ) / (σ/√n)

  has an asymptotic standard normal distribution
- Z_n is the standardized version of Ȳ_n
- Note: no assumption is made about the distribution of the Y_i
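An added CLT sketch: standardized means of a strongly skewed (exponential) sample behave like a standard normal for moderately large n (the population and n are arbitrary choices).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)
n, n_samples = 200, 100_000
mu = sigma = 2.0                          # exponential(scale=2): mean 2, sd 2

samples = rng.exponential(scale=2.0, size=(n_samples, n))
z_n = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))

# compare P(Z_n <= 1) with the standard normal cdf at 1
print((z_n <= 1).mean(), norm.cdf(1))
```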


Why confidence intervals?

- So far, point estimation
- But a point estimate provides no information about how close it is going to be to the population parameter
- Precision is given by the standard deviation of the estimator
- But the standard deviation alone makes no direct statement about where the population parameter is likely to lie in relation to the point estimate
- This issue is overcome by confidence intervals


Confidence interval

- In words, the confidence interval is a region that has an x% chance of containing the population parameter
- More formally, for a number α ∈ (0, 1), the confidence interval C_{1−α,θ} of θ, with confidence level 1 − α, is such that

  P(C_{1−α,θ} ∋ θ) = 1 − α

- Building confidence intervals requires not only the point estimate and the standard deviation of the estimator, but also knowing the distribution of the estimator


Confidence interval for the sample mean

- Assume Y ∼ N(µ, 1), so that Ȳ has a normal distribution with expectation µ and variance 1/n
- The objective here is to estimate µ
- It will then be true that

  P( −2 < (Ȳ − µ) / (1/√n) < 2 ) ≈ .95

- A 95% interval estimate (the confidence interval) of µ is:

  [ ȳ − 2/√n, ȳ + 2/√n ]
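An added numerical sketch of this interval: the data are simulated, the true µ is an arbitrary choice, and the population variance is set to 1 as on the slide.

```python
import numpy as np

rng = np.random.default_rng(10)
mu_true, n = 4.0, 100
y = rng.normal(mu_true, 1.0, size=n)            # Y ~ N(mu, 1)

y_bar = y.mean()
half_width = 2 / np.sqrt(n)
print(y_bar - half_width, y_bar + half_width)   # ~95% CI for mu
```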


Interpretation

- Warning: the interval estimator [Ȳ − 2/√n, Ȳ + 2/√n] is a random interval, while µ is a fixed, deterministic (yet unknown) number
- The interpretation is thus that, if you drew, say, 1000 samples, the confidence interval would include the true value µ in about 950 cases (see the simulation sketch below)
- If you choose a lower α (the probability that the interval misses the true value), then the confidence interval is going to be wider
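An added simulation of this interpretation: draw 1000 samples and count how often the interval covers the true µ (the value of µ and the sample size are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(11)
mu_true, n, n_samples = 4.0, 100, 1000
half_width = 2 / np.sqrt(n)

covered = 0
for _ in range(n_samples):
    y_bar = rng.normal(mu_true, 1.0, size=n).mean()
    if y_bar - half_width < mu_true < y_bar + half_width:
        covered += 1

print(covered)   # around 950 out of 1000
```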
