Lecture 2. Estimation, bias, and mean squared error


2. Estimation and bias

2.1. Estimators

Suppose that X1, . . . , Xn are iid, each with pdf/pmf f_X(x | θ), with θ unknown.

We aim to estimate θ by a statistic, i.e. by a function T of the data.

If X = x = (x1, . . . , xn), then our estimate is θ̂ = T(x) (which does not involve θ).

Then T(X) is our estimator of θ, and is a random variable since it inherits random fluctuations from those of X.

The distribution of T = T(X) is called its sampling distribution.

Example

Let X1, . . . , Xn be iid N(µ, 1).

A possible estimator for µ is T(X) = (1/n) ∑ Xi.

For any particular observed sample x, our estimate is T(x) = (1/n) ∑ xi.

We have T(X) ∼ N(µ, 1/n).
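As a quick numerical check (a minimal numpy sketch; the values µ = 2.0, n = 25 and the number of replications are illustrative choices, not from the lecture), we can simulate many samples, compute T for each, and compare the empirical mean and variance of T with the theoretical N(µ, 1/n):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, reps = 2.0, 25, 100_000   # illustrative values, not from the lecture

# Draw `reps` independent samples of size n from N(mu, 1) and compute
# the estimator T(X) = (1/n) * sum(X_i) for each sample.
samples = rng.normal(loc=mu, scale=1.0, size=(reps, n))
T = samples.mean(axis=1)

# The sampling distribution of T should be N(mu, 1/n).
print(f"empirical mean of T:     {T.mean():.4f}   (theory: {mu})")
print(f"empirical variance of T: {T.var():.5f}  (theory: {1 / n})")
```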


2.2. Bias

If θ̂ = T(X) is an estimator of θ, then the bias of θ̂ is the difference between its expectation and the 'true' value, i.e.

bias(θ̂) = E_θ(θ̂) − θ.

An estimator T(X) is unbiased for θ if E_θ(T(X)) = θ for all θ; otherwise it is biased.

In the above example, E_µ(T) = µ, so T is unbiased for µ.

[Notation note: when a parameter subscript is used with an expectation or variance, it refers to the parameter value under which the expectation or variance is computed, i.e. the expectation or variance will be a function of the subscript.]
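To illustrate the definition (a minimal numpy sketch; the sample size and the grid of µ values are arbitrary choices), one can estimate bias(T) = E_µ(T) − µ by Monte Carlo at several parameter values; this also shows that the subscripted expectation is a function of the parameter:

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_bias(mu, n=25, reps=50_000):
    """Monte Carlo estimate of bias(T) = E_mu(T) - mu for the sample mean."""
    T = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)
    return T.mean() - mu

# E_mu(.) is computed under the parameter value mu, so in general it is a
# function of mu; for the sample mean the bias is ~0 at every mu.
for mu in (-3.0, 0.0, 1.5, 10.0):
    print(f"mu = {mu:5.1f}: estimated bias = {mc_bias(mu):+.4f}")
```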


2.3. Mean squared error

Recall that an estimator T is a function of the data, and hence is a random quantity. Roughly, we prefer estimators whose sampling distributions "cluster more closely" around the true value of θ, whatever that value might be.

Definition 2.1

The mean squared error (mse) of an estimator θ̂ is E_θ[(θ̂ − θ)²].

For an unbiased estimator, the mse is just the variance.

In general,

E_θ[(θ̂ − θ)²] = E_θ[(θ̂ − E_θ(θ̂) + E_θ(θ̂) − θ)²]
             = E_θ[(θ̂ − E_θ(θ̂))²] + [E_θ(θ̂) − θ]² + 2[E_θ(θ̂) − θ] · E_θ[θ̂ − E_θ(θ̂)]
             = var_θ(θ̂) + bias²(θ̂),

where bias(θ̂) = E_θ(θ̂) − θ; the cross term vanishes because E_θ[θ̂ − E_θ(θ̂)] = 0.

[NB: sometimes it can be preferable to have a biased estimator with a low variance; this is sometimes known as the 'bias-variance tradeoff'.]
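The decomposition is easy to check numerically. Below is a minimal sketch (assuming numpy) using a deliberately biased estimator, the shrunken sample mean c·X̄; the factor c = 0.8 is my own illustrative choice, not from the lecture. Its bias is (c − 1)µ and its variance is c²/n:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, n, reps = 2.0, 25, 200_000   # illustrative values
c = 0.8                          # illustrative shrinkage factor (my choice)

# A deliberately biased estimator: T = c * (sample mean), so E(T) = c*mu,
# bias = (c - 1)*mu, and var(T) = c**2 / n.
T = c * rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)

mse = np.mean((T - mu) ** 2)     # direct Monte Carlo mse
var = T.var()
bias = T.mean() - mu

print(f"mse          = {mse:.5f}")
print(f"var + bias^2 = {var + bias ** 2:.5f}")  # matches mse up to MC error
print(f"theory c^2/n + (c-1)^2 mu^2 = {c ** 2 / n + (c - 1) ** 2 * mu ** 2:.5f}")
```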


2.4. Example: Alternative estimators for the Binomial mean

Suppose X ∼ Binomial(n, θ), and we want to estimate θ.

The standard estimator is T_U = X/n, which is unbiased since E_θ(T_U) = nθ/n = θ.

T_U has variance var_θ(T_U) = var_θ(X)/n² = θ(1 − θ)/n.

Consider an alternative estimator T_B = (X + 1)/(n + 2) = w(X/n) + (1 − w)(1/2), where w = n/(n + 2).

(Note: T_B is a weighted average of X/n and 1/2.)

E.g. if X is 8 successes out of 10 trials, we would estimate the underlying success probability as T_B(8) = 9/12 = 0.75, rather than 0.8.

Then E_θ(T_B) − θ = (nθ + 1)/(n + 2) − θ = (1 − w)(1/2 − θ), and so T_B is biased.

var_θ(T_B) = var_θ(X)/(n + 2)² = w²θ(1 − θ)/n.

Now mse(T_U) = var_θ(T_U) + bias²(T_U) = θ(1 − θ)/n.

mse(T_B) = var_θ(T_B) + bias²(T_B) = w²θ(1 − θ)/n + (1 − w)²(1/2 − θ)².



So the biased estimator has smaller MSE over much of the range of θ (a numerical check follows at the end of this slide).

T_B may be preferable if we do not think θ is near 0 or 1.

So our prior judgement about θ might affect our choice of estimator.

We will see more of this when we come to Bayesian methods.
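The boundary of the region where T_B wins can be found by solving mse(T_U) − mse(T_B) = 0, which is a quadratic in θ. A small sketch (assuming numpy; the chosen values of n are illustrative):

```python
import numpy as np

def tb_wins_interval(n):
    """Theta interval where mse(T_B) < mse(T_U), from the formulas above."""
    w = n / (n + 2)
    a = (1 - w**2) / n            # from the difference of the variance terms
    b = (1 - w)**2                # from the squared bias of T_B
    # mse(T_U) - mse(T_B) = -(a + b)*theta**2 + (a + b)*theta - b/4,
    # so the boundary points are the roots of (a+b)t^2 - (a+b)t + b/4 = 0.
    lo, hi = sorted(np.roots([a + b, -(a + b), b / 4]).real)
    return lo, hi

for n in (10, 50, 200):           # illustrative sample sizes
    lo, hi = tb_wins_interval(n)
    print(f"n = {n:3d}: T_B has smaller mse for theta in ({lo:.3f}, {hi:.3f})")
```

For n = 10 this gives roughly θ ∈ (0.14, 0.86), so T_U wins only when θ is near 0 or 1.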


2.5. Why unbiasedness is not necessarily so great

Suppose X ∼ Poisson(λ), and for some reason (which escapes me for the moment), you want to estimate θ = [P(X = 0)]² = exp(−2λ).

Then any unbiased estimator T(X) must satisfy E_θ(T(X)) = θ, or equivalently

E_λ(T(X)) = exp(−λ) ∑_{x=0}^∞ T(x) λ^x / x! = exp(−2λ).

The only function T that can satisfy this equation is T(X) = (−1)^X: multiplying both sides by exp(λ) gives ∑ T(x) λ^x / x! = exp(−λ) = ∑ (−1)^x λ^x / x!, and the coefficients of the two power series in λ must match.

Thus the only unbiased estimator estimates exp(−2λ) to be 1 if X is even and −1 if X is odd.

This is not sensible: the target exp(−2λ) lies in (0, 1], yet the estimator only ever takes the values +1 and −1.
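A direct Monte Carlo check (a numpy sketch; the λ values and sample size are illustrative choices) confirms that (−1)^X does average out to exp(−2λ), even though every individual estimate is ±1:

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check: T(X) = (-1)**X is indeed unbiased for exp(-2*lambda),
# yet every single estimate it produces is either +1 or -1.
for lam in (0.5, 1.0, 2.0):       # illustrative values of lambda
    X = rng.poisson(lam, size=1_000_000)
    T = (-1.0) ** X
    print(f"lambda = {lam}: mean of T = {T.mean():+.4f}, "
          f"exp(-2*lambda) = {np.exp(-2 * lam):.4f}")
```

So unbiasedness here is achieved only at the price of estimates that are never strictly inside the parameter range.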
