Lecture 2. Estimation, bias, and mean squared error


Transcript of Lecture 2. Estimation, bias, and mean squared error


2. Estimation and bias

2.1. Estimators

Suppose that X1, . . . , Xn are iid, each with pdf/pmf fX(x | θ), where θ is unknown.

We aim to estimate θ by a statistic, i.e. by a function T of the data.

If X = x = (x1, . . . , xn) then our estimate is θ̂ = T(x) (which does not involve θ).

Then T(X) is our estimator of θ, and is a random variable since it inherits random fluctuations from those of X.

The distribution of T = T(X) is called its sampling distribution.

Example

Let X1, . . . ,Xn be iid N(µ, 1).

A possible estimator for µ is T(X) = (1/n) ∑ Xi.

For any particular observed sample x, our estimate is T(x) = (1/n) ∑ xi.

We have T(X) ∼ N(µ, 1/n). □
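As a minimal sketch, the sampling distribution claimed above can be checked by simulation; the values of µ, n, and the number of replications below are illustrative choices, not from the lecture.

```python
import random
import statistics

# Simulate the sampling distribution of T(X) = (1/n) * sum(X_i)
# for X_i iid N(mu, 1); the theory says T(X) ~ N(mu, 1/n).
mu, n, reps = 2.0, 25, 20000  # illustrative values
random.seed(1)

estimates = []
for _ in range(reps):
    sample = [random.gauss(mu, 1.0) for _ in range(n)]
    estimates.append(sum(sample) / n)

print(statistics.mean(estimates))      # close to mu = 2.0
print(statistics.variance(estimates))  # close to 1/n = 0.04
```

The empirical mean and variance of the replicated estimates should match µ and 1/n to within Monte Carlo error.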


2.2. Bias

If θ̂ = T(X) is an estimator of θ, then the bias of θ̂ is the difference between its expectation and the 'true' value, i.e.

bias(θ̂) = E_θ(θ̂) − θ.

An estimator T(X) is unbiased for θ if E_θ T(X) = θ for all θ; otherwise it is biased.

In the above example, E_µ(T) = µ, so T is unbiased for µ.

[Notation note: when a parameter subscript is used with an expectation or variance, it refers to the parameter value being conditioned on, i.e. the expectation or variance will be a function of the subscript.]
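As a hedged illustration of a biased estimator (this example is not from the lecture), the "divide by n" variance estimator (1/n) ∑ (Xi − X̄)² has expectation ((n − 1)/n) σ², so its bias is −σ²/n; a quick simulation with illustrative parameter values shows this.

```python
import random

# The "divide by n" variance estimator has expectation ((n-1)/n) * sigma^2,
# so it is biased downward by sigma^2 / n.  n, sigma^2, reps are illustrative.
random.seed(2)
n, sigma2, reps = 5, 1.0, 100000

total = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    total += sum((x - xbar) ** 2 for x in xs) / n

print(total / reps)  # close to (n-1)/n * sigma2 = 0.8, i.e. bias about -0.2
```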


2.3. Mean squared error

Recall that an estimator T is a function of the data, and hence is a random quantity. Roughly, we prefer estimators whose sampling distributions "cluster more closely" around the true value of θ, whatever that value might be.

Definition 2.1

The mean squared error (mse) of an estimator θ̂ is E_θ[(θ̂ − θ)²].

For an unbiased estimator, the mse is just the variance.

In general,

E_θ[(θ̂ − θ)²] = E_θ[(θ̂ − E_θθ̂ + E_θθ̂ − θ)²]
= E_θ[(θ̂ − E_θθ̂)²] + [E_θ(θ̂) − θ]² + 2[E_θ(θ̂) − θ] E_θ[θ̂ − E_θθ̂]
= var_θ(θ̂) + bias²(θ̂),

where bias(θ̂) = E_θ(θ̂) − θ; the cross term vanishes because E_θ[θ̂ − E_θθ̂] = 0.

[NB: sometimes it can be preferable to have a biased estimator with a low variance; this is sometimes known as the 'bias-variance tradeoff'.]
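The identity mse = variance + bias² can be checked numerically. As a sketch (the shrinkage estimator T = c·X̄ and all parameter values below are illustrative choices, not from the lecture):

```python
import random

# Numerical check of mse = var + bias^2 for the shrinkage estimator
# T = c * Xbar of the N(mu, 1) mean.  Here bias = (c-1)*mu, var = c^2/n.
random.seed(3)
mu, n, c, reps = 1.0, 10, 0.8, 100000  # illustrative values

ests = []
for _ in range(reps):
    xbar = sum(random.gauss(mu, 1.0) for _ in range(n)) / n
    ests.append(c * xbar)

mse_mc = sum((t - mu) ** 2 for t in ests) / reps
mse_theory = c * c / n + ((c - 1.0) * mu) ** 2  # var + bias^2 = 0.104
print(mse_mc, mse_theory)  # the two values agree to Monte Carlo error
```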


2.4. Example: Alternative estimators for the Binomial mean

Suppose X ∼ Binomial(n, θ), and we want to estimate θ.

The standard estimator is TU = X/n, which is unbiased since E_θ(TU) = nθ/n = θ.

TU has variance var_θ(TU) = var_θ(X)/n² = θ(1 − θ)/n.

Consider an alternative estimator TB = (X + 1)/(n + 2) = w(X/n) + (1 − w)(1/2), where w = n/(n + 2).

(Note: TB is a weighted average of X/n and 1/2.)

e.g. if X is 8 successes out of 10 trials, we would estimate the underlying success probability as TB = 9/12 = 0.75, rather than 0.8.

Then E_θ(TB) − θ = (nθ + 1)/(n + 2) − θ = (1 − w)(1/2 − θ), and so TB is biased.

var_θ(TB) = var_θ(X)/(n + 2)² = w²θ(1 − θ)/n.

Now mse(TU) = var_θ(TU) + bias²(TU) = θ(1 − θ)/n.

mse(TB) = var_θ(TB) + bias²(TB) = w²θ(1 − θ)/n + (1 − w)²(1/2 − θ)².
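The two closed-form mse expressions above can be tabulated directly. A minimal sketch, using n = 10 as an illustrative sample size:

```python
# Closed-form mse curves for the two Binomial(n, theta) estimators:
# mse(TU) = theta(1-theta)/n
# mse(TB) = w^2 * theta(1-theta)/n + (1-w)^2 * (1/2 - theta)^2, w = n/(n+2)
n = 10  # illustrative
w = n / (n + 2)

def mse_TU(theta):
    return theta * (1 - theta) / n

def mse_TB(theta):
    return w ** 2 * theta * (1 - theta) / n + (1 - w) ** 2 * (0.5 - theta) ** 2

for theta in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(theta, mse_TU(theta), mse_TB(theta))
# TB has the smaller mse except when theta is close to 0 or 1.
```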



So the biased estimator has smaller mse over much of the range of θ.

TB may be preferable if we do not think θ is near 0 or 1.

So our prior judgement about θ might affect our choice of estimator.

We will see more of this when we come to Bayesian methods.


2.5. Why unbiasedness is not necessarily so great

Suppose X ∼ Poisson(λ), and for some reason (which escapes me for the moment) you want to estimate θ = [P(X = 0)]² = e^{−2λ}.

Then any unbiased estimator T(X) must satisfy E_θ(T(X)) = θ, or equivalently

E_λ(T(X)) = e^{−λ} ∑_{x=0}^{∞} T(x) λ^x / x! = e^{−2λ}.

The only function T that can satisfy this equation is T(X) = (−1)^X [the coefficients of the two power series in λ must match].

Thus the only unbiased estimator estimates e^{−2λ} to be 1 if X is even and −1 if X is odd.

This is not sensible.
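The unbiasedness of T(X) = (−1)^X can be verified directly from the power series: E[(−1)^X] = e^{−λ} ∑ (−λ)^x / x! = e^{−λ} · e^{−λ} = e^{−2λ}. A short numerical check (the λ values and the truncation point are illustrative):

```python
import math

# Check that E[(-1)^X] = e^{-2*lam} for X ~ Poisson(lam), by truncating
# the power series e^{-lam} * sum_x (-lam)^x / x! at 80 terms.
for lam in (0.5, 1.0, 3.0):  # illustrative values
    expectation = math.exp(-lam) * sum((-lam) ** x / math.factorial(x)
                                       for x in range(80))
    print(lam, expectation, math.exp(-2 * lam))  # the two columns agree
```

Even though the expectation is correct, the estimator only ever takes the values 1 and −1, which is what makes it useless in practice.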
