Part II Principles of Statistics

Years: 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005

2019

Paper 4, Section II

28J Principles of Statistics

We consider a statistical model {f(·, θ) : θ ∈ Θ}.

(a) Define the maximum likelihood estimator (MLE) and the Fisher information I(θ).

(b) Let Θ = R and assume there exist a continuous one-to-one function µ : R → R and a real-valued function h such that
\[ E_\theta[h(X)] = \mu(\theta) \quad \forall \theta \in \mathbb{R}. \]

(i) For X1, . . . , Xn i.i.d. from the model for some θ0 ∈ R, give the limit in the almost sure sense of
\[ \mu_n = \frac{1}{n} \sum_{i=1}^{n} h(X_i) . \]
Give a consistent estimator θn of θ0 in terms of µn.

(ii) Assume further that Eθ0[h(X)²] < ∞ and that µ is continuously differentiable and strictly monotone. What is the limit in distribution of √n(θn − θ0)? Assume too that the statistical model satisfies the usual regularity assumptions. Do you necessarily expect Var(θn) ≥ (nI(θ0))⁻¹ for all n? Why?

(iii) Propose an alternative estimator for θ0 with smaller bias than θn if
\[ B_n(\theta_0) = E_{\theta_0}[\theta_n] - \theta_0 = \frac{a}{n} + \frac{b}{n^2} + O\!\left(\frac{1}{n^3}\right) \]
for some a, b ∈ R with a ≠ 0.

(iv) Further to all the assumptions in (ii), assume that the MLE for θ0 is of the form
\[ \theta_{\mathrm{MLE}} = \frac{1}{n} \sum_{i=1}^{n} h(X_i). \]
What is the link between the Fisher information at θ0 and the variance of h(X)? What does this mean in terms of the precision of the estimator, and why?

[You may use results from the course, provided you state them clearly.]


Paper 3, Section II

28J Principles of Statistics

We consider the exponential model {f(·, θ) : θ ∈ (0,∞)}, where
\[ f(x, \theta) = \theta e^{-\theta x} \quad \text{for } x > 0 . \]
We observe an i.i.d. sample X1, . . . , Xn from the model.

(a) Compute the maximum likelihood estimator θMLE for θ. What is the limit in distribution of √n(θMLE − θ)?

(b) Consider the Bayesian setting. Fix α, β > 0 and place a Gamma(α, β) prior for θ with density
\[ \pi(\theta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \theta^{\alpha-1} \exp(-\beta\theta) \quad \text{for } \theta > 0 , \]
where Γ is the Gamma function satisfying Γ(a + 1) = aΓ(a) for all a > 0. What is the posterior distribution for θ? What is the Bayes estimator θπ for the squared loss?

(c) Show that the Bayes estimator is consistent. What is the limiting distribution of √n(θπ − θ)?

[You may use results from the course, provided you state them clearly.]
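The computations in this question are easy to check numerically. The sketch below (assuming numpy is available; sample size and hyperparameters are arbitrary) uses the standard Gamma–exponential conjugacy: with an Exp(θ) likelihood and a Gamma(α, β) prior, the posterior is Gamma(α + n, β + Σxi), so the Bayes estimator under squared loss is the posterior mean, which is close to the MLE 1/x̄ for large n.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, n = 2.0, 500
x = rng.exponential(scale=1.0 / theta_true, size=n)   # Exp(theta) sample

theta_mle = 1.0 / x.mean()                             # MLE: n / sum(x_i)

alpha, beta = 3.0, 1.0                                 # Gamma(alpha, beta) prior (arbitrary values)
alpha_post, beta_post = alpha + n, beta + x.sum()      # conjugate Gamma posterior
theta_bayes = alpha_post / beta_post                   # posterior mean = Bayes estimator under squared loss

print(theta_mle, theta_bayes)                          # both close to theta_true = 2 for large n
```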


Paper 2, Section II

28J Principles of Statistics

(a) We consider the model {Poisson(θ) : θ ∈ (0,∞)} and an i.i.d. sample X1, . . . , Xn from it. Compute the expectation and variance of X1 and check they are equal. Find the maximum likelihood estimator θMLE for θ and, using its form, derive the limit in distribution of √n(θMLE − θ).

(b) In practice, Poisson-looking data show overdispersion, i.e., the sample variance is larger than the sample expectation. For π ∈ [0, 1] and λ ∈ (0,∞), let pπ,λ : N0 → [0, 1],
\[ k \mapsto p_{\pi,\lambda}(k) = \begin{cases} \pi e^{-\lambda}\, \dfrac{\lambda^{k}}{k!} & \text{for } k \geq 1, \\[4pt] (1-\pi) + \pi e^{-\lambda} & \text{for } k = 0. \end{cases} \]
Show that this defines a distribution. Does it model overdispersion? Justify your answer.

(c) Let Y1, . . . , Yn be an i.i.d. sample from pπ,λ. Assume λ is known. Find the maximum likelihood estimator πMLE for π.

(d) Furthermore, assume that, for any π ∈ [0, 1], √n(πMLE − π) converges in distribution to a random variable Z as n → ∞. Suppose we wanted to test the null hypothesis that our data arises from the model in part (a). Before making any further computations, can we necessarily expect Z to follow a normal distribution under the null hypothesis? Explain. Check your answer by computing the appropriate distribution.

[You may use results from the course, provided you state them clearly. In thisquestion, N0 denotes the non-negative integers.]
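A quick simulation makes the overdispersion of part (b) visible and shows how πMLE could be located numerically when λ is known. This is only an illustrative sketch: the parameter values are invented, and the maximiser is found by a grid search rather than by the closed form asked for in part (c).

```python
import numpy as np

rng = np.random.default_rng(1)
pi_true, lam, n = 0.6, 4.0, 20_000

# Zero-inflated Poisson draw: with prob. 1 - pi emit 0, otherwise emit Poisson(lam).
mask = rng.random(n) < pi_true
y = np.where(mask, rng.poisson(lam, size=n), 0)

print(y.mean(), y.var(ddof=1))          # sample variance exceeds sample mean: overdispersion

# Log-likelihood in pi (lambda known); terms not involving pi are dropped.
n0 = np.sum(y == 0)
def loglik(pi):
    p0 = (1 - pi) + pi * np.exp(-lam)              # P(Y = 0)
    return n0 * np.log(p0) + (n - n0) * np.log(pi)

grid = np.linspace(1e-6, 1 - 1e-6, 10_000)
pi_mle = grid[np.argmax(loglik(grid))]
print(pi_mle)                            # close to pi_true = 0.6
```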


Paper 1, Section II

29J Principles of Statistics

In a regression problem, for a given X ∈ Rn×p fixed, we observe Y ∈ Rn such that
\[ Y = X\theta_0 + \varepsilon \]
for an unknown θ0 ∈ Rp and ε random such that ε ∼ N(0, σ²In) for some known σ² > 0.

(a) When p ≤ n and X has rank p, compute the maximum likelihood estimator θMLE for θ0. When p > n, what issue is there with the likelihood maximisation approach and how many maximisers of the likelihood are there (if any)?

(b) For any λ > 0 fixed, we consider θλ minimising
\[ \|Y - X\theta\|_2^2 + \lambda \|\theta\|_2^2 \]
over Rp. Derive an expression for θλ and show it is well defined, i.e., there is a unique minimiser for every X, Y and λ.

Assume p ≤ n and that X has rank p. Let Σ = X⊤X and note that Σ = V ΛV ⊤ for some orthogonal matrix V and some diagonal matrix Λ whose diagonal entries satisfy Λ1,1 ≥ Λ2,2 ≥ . . . ≥ Λp,p. Assume that the columns of X have mean zero.

(c) Denote the columns of U = XV by u1, . . . , up. Show that they are sample principal components, i.e., that their pairwise sample correlations are zero and that they have sample variances Λ1,1, . . . , Λp,p, respectively. [Hint: the sample covariance between ui and uj is n⁻¹ ui⊤uj.]

(d) Let YMLE = XθMLE. Show that
\[ Y_{\mathrm{MLE}} = U\Lambda^{-1}U^{\top}Y . \]
Conclude that YMLE is the closest point to Y within the subspace spanned by the normalised sample principal components of part (c).

(e) Let Yλ = Xθλ. Show that
\[ Y_{\lambda} = U(\Lambda + \lambda I_p)^{-1}U^{\top}Y . \]
Assume Λ1,1, Λ2,2, . . . , Λq,q ≫ λ ≫ Λq+1,q+1, . . . , Λp,p for some 1 ≤ q < p. Conclude that Yλ is approximately the closest point to Y within the subspace spanned by the q normalised sample principal components of part (c) with the greatest variance.
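The identities in parts (d) and (e) can be verified numerically. The sketch below (a toy example with made-up dimensions, assuming numpy) forms Σ = X⊤X, diagonalises it, and checks that the ridge fit Xθλ equals U(Λ + λI)⁻¹U⊤Y with U = XV.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 50, 5, 3.0
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                          # centre the columns, as assumed in the question
Y = X @ rng.standard_normal(p) + rng.standard_normal(n)

# Ridge estimator: minimiser of ||Y - X theta||^2 + lam ||theta||^2.
theta_lam = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Eigendecomposition of Sigma = X^T X = V Lambda V^T, sorted so Lambda_11 >= ... >= Lambda_pp.
evals, V = np.linalg.eigh(X.T @ X)
order = np.argsort(evals)[::-1]
Lam, V = evals[order], V[:, order]
U = X @ V                                    # (unnormalised) sample principal components

fit_direct = X @ theta_lam
fit_spectral = U @ ((U.T @ Y) / (Lam + lam)) # U (Lambda + lam I)^{-1} U^T Y
print(np.allclose(fit_direct, fit_spectral)) # True
```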

2018

Paper 4, Section II

28K Principles of Statistics

Let g : R → R be an unknown function, twice continuously differentiable with |g″(x)| ≤ M for all x ∈ R. For some x0 ∈ R, we know the value g(x0) and we wish to estimate its derivative g′(x0). To do so, we have access to a pseudo-random number generator that gives U∗1, . . . , U∗N i.i.d. uniform over [0, 1], and a machine that takes input x1, . . . , xN ∈ R and returns g(xi) + εi, where the εi are i.i.d. N(0, σ²).

(a) Explain how this setup allows us to generate N independent Xi = x0 + hZi, where the Zi take value 1 or −1 with probability 1/2, for any h > 0.

(b) We denote by Yi the output g(Xi) + εi. Show that, for some independent ξi ∈ R,
\[ Y_i - g(x_0) = hZ_i\, g'(x_0) + \frac{h^2}{2}\, g''(\xi_i) + \varepsilon_i . \]

(c) Using the intuition given by the least-squares estimator, justify the use of the estimator gN given by
\[ g_N = \frac{1}{N} \sum_{i=1}^{N} \frac{Z_i\,(Y_i - g(x_0))}{h} . \]

(d) Show that
\[ E\big[\,|g_N - g'(x_0)|^2\,\big] \leq \frac{h^2 M^2}{4} + \frac{\sigma^2}{N h^2} . \]
Show that for some choice hN of the parameter h, this implies
\[ E\big[\,|g_N - g'(x_0)|^2\,\big] \leq \frac{\sigma M}{\sqrt{N}} . \]
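A short simulation shows the estimator of part (c) in action; this is a sketch only, with an invented target function, σ and N, and with h chosen to balance the two terms of the bound in part (d), i.e. h⁴ of order σ²/(N M²).

```python
import numpy as np

rng = np.random.default_rng(3)
g, dg, M = np.sin, np.cos, 1.0               # test function with |g''| <= 1
x0, sigma, N = 0.7, 0.1, 10_000

h = (4.0 * sigma**2 / (N * M**2)) ** 0.25    # balances h^2 M^2/4 against sigma^2/(N h^2)
Z = rng.choice([-1.0, 1.0], size=N)          # random signs (in practice built from the uniforms)
Y = g(x0 + h * Z) + rng.normal(0.0, sigma, size=N)   # noisy evaluations of g

g_N = np.mean(Z * (Y - g(x0))) / h           # randomised-difference estimator of g'(x0)
print(g_N, dg(x0))                                        # estimate vs. true derivative
print(abs(g_N - dg(x0)), np.sqrt(sigma * M / np.sqrt(N))) # error vs. square root of the MSE bound
```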


Paper 3, Section II

28K Principles of Statistics

In the model {N(θ, Ip), θ ∈ Rp} of a Gaussian distribution in dimension p, with unknown mean θ and known identity covariance matrix Ip, we estimate θ based on a sample of i.i.d. observations X1, . . . , Xn drawn from N(θ0, Ip).

(a) Define the Fisher information I(θ0), and compute it in this model.

(b) We recall that the observed Fisher information in(θ) is given by
\[ i_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \log f(X_i, \theta)\, \nabla_\theta \log f(X_i, \theta)^{\top} . \]
Find the limit of in = in(θMLE), where θMLE is the maximum likelihood estimator of θ in this model.

(c) Define the Wald statistic Wn(θ) and compute it. Give the limiting distribution of Wn(θ0) and explain how it can be used to design a confidence interval for θ0.

[You may use results from the course provided that you state them clearly.]

Paper 2, Section II

28K Principles of Statistics

We consider the model {N(θ, Ip), θ ∈ Rp} of a Gaussian distribution in dimension p ≥ 3, with unknown mean θ and known identity covariance matrix Ip. We estimate θ based on one observation X ∼ N(θ, Ip), under the loss function
\[ \ell(\theta, \delta) = \|\theta - \delta\|_2^2 . \]

(a) Define the risk of an estimator of θ. Compute the maximum likelihood estimator θMLE of θ and its risk for any θ ∈ Rp.

(b) Define what an admissible estimator is. Is θMLE admissible?

(c) For any c > 0, let πc(θ) be the prior N(0, c²Ip). Find a Bayes optimal estimator θc under this prior with the quadratic loss, and compute its Bayes risk.

(d) Show that θMLE is minimax.

[You may use results from the course provided that you state them clearly.]


Paper 1, Section II

29K Principles of Statistics

A scientist wishes to estimate the proportion θ ∈ (0, 1) of presence of a gene in a population of flies of size n. Every fly receives a chromosome from each of its two parents, each carrying the gene A with probability (1 − θ) or the gene B with probability θ, independently. The scientist can observe if each fly has two copies of the gene A (denoted by AA), two copies of the gene B (denoted by BB) or one of each (denoted by AB). We let nAA, nBB, and nAB denote the number of each observation among the n flies.

(a) Give the probability of each observation as a function of θ, denoted by f(X, θ), for all three values X = AA, BB, or AB.

(b) For a vector w = (wAA, wBB, wAB), we let θw denote the estimator defined by
\[ \theta_w = w_{AA}\,\frac{n_{AA}}{n} + w_{BB}\,\frac{n_{BB}}{n} + w_{AB}\,\frac{n_{AB}}{n} . \]
Find the unique vector w∗ such that θw∗ is unbiased. Show that θw∗ is a consistent estimator of θ.

(c) Compute the maximum likelihood estimator of θ in this model, denoted by θMLE. Find the limiting distribution of √n(θMLE − θ). [You may use results from the course, provided that you state them clearly.]
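Under the model of part (a) the three genotype probabilities are f(AA, θ) = (1 − θ)², f(AB, θ) = 2θ(1 − θ) and f(BB, θ) = θ². The sketch below (assuming numpy; the sample size and θ are arbitrary) simulates genotype counts and evaluates the allele-counting estimator (2nBB + nAB)/(2n), which is the natural candidate both for the unbiased θw∗ with weights (0, 1, 1/2) and for θMLE.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n = 0.3, 10_000

# Genotype probabilities under the model, in the order AA, AB, BB.
probs = [(1 - theta)**2, 2 * theta * (1 - theta), theta**2]
n_AA, n_AB, n_BB = rng.multinomial(n, probs)

# Allele-counting estimator: fraction of B alleles among the 2n chromosomes.
# It equals the linear estimator with weights (w_AA, w_BB, w_AB) = (0, 1, 1/2).
theta_hat = (2 * n_BB + n_AB) / (2 * n)
print(theta_hat, theta)

# The B-allele count is Binomial(2n, theta), so the variance is theta(1 - theta)/(2n).
print(theta * (1 - theta) / (2 * n))
```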

2017

Paper 2, Section II

26K Principles of Statistics

We consider the problem of estimating θ in the model {f(x, θ) : θ ∈ (0,∞)}, where
\[ f(x, \theta) = (1-\alpha)(x-\theta)^{-\alpha}\, \mathbf{1}\{x \in [\theta, \theta+1]\} . \]
Here 1{A} is the indicator of the set A, and α ∈ (0, 1) is known. This estimation is based on a sample of n i.i.d. X1, . . . , Xn, and we denote by X(1) < . . . < X(n) the ordered sample.

(a) Compute the mean and the variance of X1. Construct an unbiased estimator of θ taking the form θn = X̄n + c(α), where X̄n = n⁻¹(X1 + · · · + Xn), specifying c(α).

(b) Show that θn is consistent and find the limit in distribution of √n(θn − θ). Justify your answer, citing theorems that you use.

(c) Find the maximum likelihood estimator θn of θ. Compute P(θn − θ > t) for all real t. Is θn unbiased?

(d) For t > 0, show that P(n^β(θn − θ) > t) has a limit in (0, 1) for some β > 0. Give explicitly the value of β and the limit. Why should one favour using the maximum likelihood estimator of part (c) over the estimator of part (a)?


Paper 3, Section II

26K Principles of Statistics

We consider the problem of estimating an unknown θ0 in a statistical model {f(x, θ), θ ∈ Θ} where Θ ⊂ R, based on n i.i.d. observations X1, . . . , Xn whose distribution has p.d.f. f(x, θ0).

In all the parts below you may assume that the model satisfies necessary regularity conditions.

(a) Define the score function Sn of θ. Prove that Sn(θ0) has mean 0.

(b) Define the Fisher Information I(θ). Show that it can also be expressed as
\[ I(\theta) = -E_{\theta}\!\left[ \frac{d^2}{d\theta^2} \log f(X_1, \theta) \right] . \]

(c) Define the maximum likelihood estimator θn of θ. Give without proof the limits of θn and of √n(θn − θ0) (in a manner which you should specify). [Be as precise as possible when describing a distribution.]

(d) Let ψ : Θ → R be a continuously differentiable function, and θ̃n another estimator of θ0 such that |θ̃n − θn| ≤ 1/n with probability 1. Give the limits of ψ(θ̃n) and of √n(ψ(θ̃n) − ψ(θ0)) (in a manner which you should specify).


Paper 4, Section II

27K Principles of Statistics

For the statistical model {Nd(θ, Σ), θ ∈ Rd}, where Σ is a known, positive-definite d × d matrix, we want to estimate θ based on n i.i.d. observations X1, . . . , Xn with distribution Nd(θ, Σ).

(a) Derive the maximum likelihood estimator θn of θ. What is the distribution of θn?

(b) For α ∈ (0, 1), construct a confidence region C_n^α such that Pθ(θ ∈ C_n^α) = 1 − α.

(c) For Σ = Id, compute the maximum likelihood estimator of θ for the following parameter spaces:

(i) Θ = {θ : ‖θ‖₂ = 1}.

(ii) Θ = {θ : v⊤θ = 0} for some unit vector v ∈ Rd.

(d) For Σ = Id, we want to test the null hypothesis Θ0 = {0} (i.e. θ = 0) against the composite alternative Θ1 = Rd \ {0}. Compute the likelihood ratio statistic Λ(Θ1, Θ0) and give its distribution under the null hypothesis. Compare this result with the statement of Wilks' theorem.


Paper 1, Section II

28K Principles of Statistics

For a positive integer n, we want to estimate the parameter p in the binomial statistical model {Bin(n, p), p ∈ [0, 1]}, based on an observation X ∼ Bin(n, p).

(a) Compute the maximum likelihood estimator for p. Show that the posterior distribution for p under a uniform prior on [0, 1] is Beta(a, b), and specify a and b.

[The p.d.f. of Beta(a, b) is given by
\[ f_{a,b}(p) = \frac{(a+b-1)!}{(a-1)!\,(b-1)!}\, p^{a-1}(1-p)^{b-1} . \]
]

(b) (i) For a loss function L, define the risk of an estimator of p, and the Bayes risk under a prior π for p.

(ii) Under the loss function
\[ L(\hat p, p) = \frac{(\hat p - p)^2}{p(1-p)} , \]
find a Bayes optimal estimator for the uniform prior. Give its risk as a function of p.

(iii) Give a minimax optimal estimator for the loss function L given above. Justify your answer.

2016

Paper 3, Section II

25J Principles of Statistics

Let X1, . . . , Xn be i.i.d. random variables from a N(θ, 1) distribution, θ ∈ R, and consider a Bayesian model θ ∼ N(0, v²) for the unknown parameter, where v > 0 is a fixed constant.

(a) Derive the posterior distribution Π(· | X1, . . . , Xn) of θ | X1, . . . , Xn.

(b) Construct a credible set Cn ⊂ R such that

(i) Π(Cn | X1, . . . , Xn) = 0.95 for every n ∈ N, and

(ii) for any θ0 ∈ R,
\[ P^{\mathbb{N}}_{\theta_0}(\theta_0 \in C_n) \to 0.95 \quad \text{as } n \to \infty , \]
where P^N_θ denotes the distribution of the infinite sequence X1, X2, . . . when drawn independently from a fixed N(θ, 1) distribution.

[You may use the central limit theorem.]
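For parts (a) and (b), conjugacy gives a Gaussian posterior: with an N(θ, 1) likelihood and an N(0, v²) prior, the posterior of θ is N(Σxi/(n + 1/v²), 1/(n + 1/v²)). The sketch below (an illustration with made-up values) builds the central 95% credible interval from this posterior and estimates its frequentist coverage by simulation, in the spirit of part (b)(ii).

```python
import numpy as np

rng = np.random.default_rng(5)
v, n, theta0, reps = 2.0, 200, 1.3, 2000
z = 1.959963984540054                      # 97.5% standard normal quantile

hits = 0
for _ in range(reps):
    x = rng.normal(theta0, 1.0, size=n)
    prec = n + 1.0 / v**2                  # posterior precision
    mean = x.sum() / prec                  # posterior mean
    sd = 1.0 / np.sqrt(prec)
    lo, hi = mean - z * sd, mean + z * sd  # central 95% credible interval C_n
    hits += (lo <= theta0 <= hi)

print(hits / reps)                         # close to 0.95 for large n
```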

Paper 2, Section II

26J Principles of Statistics

(a) State and prove the Cramér–Rao inequality in a parametric model {f(θ) : θ ∈ Θ}, where Θ ⊆ R. [Necessary regularity conditions on the model need not be specified.]

(b) Let X1, . . . , Xn be i.i.d. Poisson random variables with unknown parameter EX1 = θ > 0. For
\[ \bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X_n)^2 , \]
define
\[ T_\alpha = \alpha \bar X_n + (1-\alpha) S^2 , \qquad 0 \leq \alpha \leq 1 . \]
Show that Varθ(Tα) ≥ Varθ(X̄n) for all values of α, θ.

Now suppose θ̃ = θ̃(X1, . . . , Xn) is an estimator of θ with possibly nonzero bias B(θ) = Eθ[θ̃] − θ. Suppose the function B is monotone increasing on (0,∞). Prove that the mean-squared errors satisfy
\[ E_\theta(\tilde\theta_n - \theta)^2 \geq E_\theta(\bar X_n - \theta)^2 \quad \text{for all } \theta \in \Theta . \]


Paper 4, Section II

26J Principles of Statistics

Consider a decision problem with parameter space Θ. Define the concepts of a Bayes decision rule δπ and of a least favourable prior.

Suppose π is a prior distribution on Θ such that the Bayes risk of the Bayes rule equals supθ∈Θ R(δπ, θ), where R(δ, θ) is the risk function associated to the decision problem. Prove that δπ is least favourable.

Now consider a random variable X arising from the binomial distribution Bin(n, θ), where θ ∈ Θ = [0, 1]. Construct a least favourable prior for the squared risk R(δ, θ) = Eθ(δ(X) − θ)². [You may use without proof the fact that the Bayes rule for quadratic risk is given by the posterior mean.]

Paper 1, Section II

27J Principles of Statistics

Derive the maximum likelihood estimator θ̂n based on independent observations X1, . . . , Xn that are identically distributed as N(θ, 1), where the unknown parameter θ lies in the parameter space Θ = R. Find the limiting distribution of √n(θ̂n − θ) as n → ∞.

Now define
\[ \tilde\theta_n = \begin{cases} \hat\theta_n & \text{whenever } |\hat\theta_n| > n^{-1/4}, \\ 0 & \text{otherwise,} \end{cases} \]
and find the limiting distribution of √n(θ̃n − θ) as n → ∞.

Calculate
\[ \lim_{n\to\infty}\; \sup_{\theta\in\Theta}\; n\, E_\theta(T_n - \theta)^2 \]
for the choices Tn = θ̂n and Tn = θ̃n. Based on the above findings, which estimator Tn of θ would you prefer? Explain your answer.

[Throughout, you may use standard facts of stochastic convergence, such as thecentral limit theorem, provided they are clearly stated.]
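The phenomenon behind the last part can be probed by a small Monte Carlo experiment (a sketch with arbitrary settings): the thresholded estimator is superefficient at θ = 0, but its normalised risk n Eθ(Tn − θ)² blows up at θ of the same order as the threshold n^{−1/4}.

```python
import numpy as np

rng = np.random.default_rng(6)
reps = 20_000

for n in (100, 1_000, 10_000):
    for theta in (0.0, 0.5 * n**-0.25):           # truth at 0 and near the threshold
        x_bar = rng.normal(theta, 1.0 / np.sqrt(n), size=reps)   # distribution of the MLE
        hodges = np.where(np.abs(x_bar) > n**-0.25, x_bar, 0.0)  # thresholded estimator
        print(n, theta,
              n * np.mean((x_bar - theta)**2),    # roughly 1 for the MLE
              n * np.mean((hodges - theta)**2))   # grows with n near the threshold
```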

2015

Paper 4, Section II

24J Principles of Statistics

Given independent and identically distributed observations X1, . . . , Xn with finite mean E(X1) = µ and variance Var(X1) = σ², explain the notion of a bootstrap sample X^b_1, . . . , X^b_n, and discuss how you can use it to construct a confidence interval Cn for µ.

Suppose you can operate a random number generator that can simulate independent uniform random variables U1, . . . , Un on [0, 1]. How can you use such a random number generator to simulate a bootstrap sample?

Suppose that (Fn : n ∈ N) and F are cumulative probability distribution functions defined on the real line, that Fn(t) → F(t) as n → ∞ for every t ∈ R, and that F is continuous on R. Show that, as n → ∞,
\[ \sup_{t\in\mathbb{R}} |F_n(t) - F(t)| \to 0 . \]

State (without proof) the theorem about the consistency of the bootstrap of the mean, and use it to give an asymptotic justification of the confidence interval Cn. That is, prove that as n → ∞, P^N(µ ∈ Cn) → 1 − α where P^N is the joint distribution of X1, X2, . . . .

[You may use standard facts of stochastic convergence and the Central LimitTheorem without proof.]
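A minimal numerical sketch of the construction discussed here (assuming numpy; the data-generating distribution and the percentile form of the interval are illustrative choices, not the only construction the question allows). Resampling indices via uniform random numbers is exactly the "simulate a bootstrap sample from uniforms" step.

```python
import numpy as np

rng = np.random.default_rng(7)
n, B, alpha = 200, 2_000, 0.05
x = rng.exponential(scale=2.0, size=n)        # sample with mean mu = 2

# Bootstrap: resample the data with replacement using uniform random numbers.
idx = (rng.random((B, n)) * n).astype(int)    # U[0,1) draws mapped to indices 0,...,n-1
boot_means = x[idx].mean(axis=1)              # mean of each bootstrap sample

# Percentile confidence interval for mu.
lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
print(x.mean(), (lo, hi))                     # the interval should usually cover mu = 2
```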


Paper 3, Section II

24J Principles of Statistics

Define what it means for an estimator of an unknown parameter θ to be consistent.

Let Sn be a sequence of random real-valued continuous functions defined on R such that, as n → ∞, Sn(θ) converges to S(θ) in probability for every θ ∈ R, where S : R → R is non-random. Suppose that for some θ0 ∈ R and every ε > 0 we have
\[ S(\theta_0 - \varepsilon) < 0 < S(\theta_0 + \varepsilon) , \]
and that Sn has exactly one zero θn for every n ∈ N. Show that θn →P θ0 as n → ∞, and deduce from this that the maximum likelihood estimator (MLE) based on observations X1, . . . , Xn from a N(θ, 1), θ ∈ R model is consistent.

Now consider independent observations X1, . . . , Xn of bivariate normal random vectors
\[ X_i = (X_{1i}, X_{2i})^{T} \sim N_2\big[(\mu_i, \mu_i)^{T}, \sigma^2 I_2\big], \qquad i = 1, \ldots, n, \]
where µi ∈ R, σ > 0 and I2 is the 2 × 2 identity matrix. Find the MLE µ̂ = (µ̂1, . . . , µ̂n)^T of µ = (µ1, . . . , µn)^T and show that the MLE of σ² equals
\[ \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} s_i^2 , \qquad s_i^2 = \frac{1}{2}\big[(X_{1i} - \hat\mu_i)^2 + (X_{2i} - \hat\mu_i)^2\big] . \]
Show that σ̂² is not consistent for estimating σ². Explain briefly why the MLE fails in this model.

[You may use the Law of Large Numbers without proof.]


Paper 2, Section II

25J Principles of Statistics

Consider a random variable X arising from the binomial distribution Bin(n, θ), θ ∈ Θ = [0, 1]. Find the maximum likelihood estimator θMLE and the Fisher information I(θ) for θ ∈ Θ.

Now consider the following priors on Θ:

(i) a uniform U([0, 1]) prior on [0, 1],

(ii) a prior with density π(θ) proportional to √I(θ),

(iii) a Beta(√n/2, √n/2) prior.

Find the means E[θ | X] and modes mθ|X of the posterior distributions corresponding to the prior distributions (i)–(iii). Which of these posterior decision rules coincide with θMLE? Which one is minimax for quadratic risk? Justify your answers.

[You may use the following properties of the Beta(a, b) (a > 0, b > 0) distribution. Its density f(x; a, b), x ∈ [0, 1], is proportional to x^{a−1}(1 − x)^{b−1}, its mean is equal to a/(a + b), and its mode is equal to
\[ \frac{\max(a-1, 0)}{\max(a, 1) + \max(b, 1) - 2} \]
provided either a > 1 or b > 1.

You may further use the fact that a unique Bayes rule of constant risk is a unique minimax rule for that risk.]


Paper 1, Section II

25J Principles of Statistics

Consider a normally distributed random vector X ∈ Rp modelled as X ∼ N(θ, Ip), where θ ∈ Rp, Ip is the p × p identity matrix, and where p ≥ 3. Define the Stein estimator θSTEIN of θ.

Prove that θSTEIN dominates the estimator θ̂ = X for the risk function induced by quadratic loss
\[ \ell(a, \theta) = \sum_{i=1}^{p} (a_i - \theta_i)^2 , \qquad a \in \mathbb{R}^p . \]

Show however that the worst case risks coincide, that is, show that
\[ \sup_{\theta\in\mathbb{R}^p} E_\theta\, \ell(X, \theta) = \sup_{\theta\in\mathbb{R}^p} E_\theta\, \ell(\theta_{\mathrm{STEIN}}, \theta) . \]

[You may use Stein’s lemma without proof, provided it is clearly stated.]
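The domination claim and the coincidence of worst-case risks can be probed by simulation. The sketch below uses the James–Stein form (1 − (p − 2)/‖X‖²)X, one standard way of writing the Stein estimator; dimensions and parameter values are invented. The quadratic risk of X is exactly p, while the simulated risk of the shrinkage estimator is smaller and approaches p as ‖θ‖ grows.

```python
import numpy as np

rng = np.random.default_rng(8)
p, reps = 10, 50_000

for scale in (0.0, 2.0, 20.0):                     # ||theta|| small, moderate, large
    theta = scale * np.ones(p) / np.sqrt(p)
    X = rng.normal(theta, 1.0, size=(reps, p))     # one observation per replication
    norm2 = np.sum(X**2, axis=1, keepdims=True)
    stein = (1.0 - (p - 2) / norm2) * X            # James-Stein shrinkage towards 0
    risk_mle = np.mean(np.sum((X - theta)**2, axis=1))        # roughly p
    risk_stein = np.mean(np.sum((stein - theta)**2, axis=1))  # smaller, -> p as scale grows
    print(scale, risk_mle, risk_stein)
```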

2014

Paper 4, Section II

27J Principles of Statistics

Suppose you have at hand a pseudo-random number generator that can simulate an i.i.d. sequence of uniform U[0, 1] distributed random variables U∗1, . . . , U∗N for any N ∈ N. Construct an algorithm to simulate an i.i.d. sequence X∗1, . . . , X∗N of standard normal N(0, 1) random variables. [Should your algorithm depend on the inverse of any cumulative probability distribution function, you are required to provide an explicit expression for this inverse function.]

Suppose as a matter of urgency you need to approximately evaluate the integral
\[ I = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} \frac{1}{(\pi + |x|)^{1/4}}\, e^{-x^2/2}\, dx . \]

Find an approximation IN of this integral that requires N simulation steps from your pseudo-random number generator, and which has stochastic accuracy
\[ \Pr\big(|I_N - I| > N^{-1/4}\big) \leq N^{-1/2} , \]

where Pr denotes the joint law of the simulated random variables. Justify your answer.
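One concrete version of the two tasks in this question (a sketch, not the unique answer: the question also permits an inverse-c.d.f. construction, and the accuracy statement itself follows from Chebyshev's inequality): the Box–Muller transform turns pairs of uniforms into standard normals, and I equals E[(π + |X|)^{−1/4}] for X ∼ N(0, 1), so a sample average over the simulated normals estimates it.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 100_000

# Box-Muller: two independent U(0,1] variables give two independent N(0,1) variables.
u1 = 1.0 - rng.random(N // 2)              # shift to (0,1] so log(u1) is finite
u2 = rng.random(N // 2)
r = np.sqrt(-2.0 * np.log(u1))
x = np.concatenate([r * np.cos(2 * np.pi * u2), r * np.sin(2 * np.pi * u2)])

# I = E[(pi + |X|)^(-1/4)] for X ~ N(0,1); Monte Carlo average over the N simulated normals.
I_N = np.mean((np.pi + np.abs(x)) ** -0.25)
print(I_N)
```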

Paper 3, Section II

27J Principles of Statistics

State and prove Wilks' theorem about testing the simple hypothesis H0 : θ = θ0 against the alternative H1 : θ ∈ Θ \ {θ0}, in a one-dimensional regular parametric model {f(·, θ) : θ ∈ Θ}, Θ ⊆ R. [You may use without proof the results from lectures on the consistency and asymptotic distribution of maximum likelihood estimators, as well as on uniform laws of large numbers. Necessary regularity conditions can be assumed without statement.]

Find the maximum likelihood estimator θn based on i.i.d. observations X1, . . . , Xn in a N(0, θ)-model, θ ∈ Θ = (0,∞). Deduce the limit distribution as n → ∞ of the sequence of statistics
\[ -n\Big(\log\big(\overline{X^2}\big) - \big(\overline{X^2} - 1\big)\Big) , \qquad \text{where } \overline{X^2} = \frac{1}{n}\sum_{i=1}^{n} X_i^2 , \]
and X1, . . . , Xn are i.i.d. N(0, 1).


Paper 2, Section II

28J Principles of Statistics

In a general decision problem, define the concepts of a Bayes rule and of admissibility. Show that a unique Bayes rule is admissible.

Consider i.i.d. observations X1, . . . , Xn from a Poisson(θ), θ ∈ Θ = (0,∞), model. Can the maximum likelihood estimator θMLE of θ be a Bayes rule for estimating θ in quadratic risk for any prior distribution on θ that has a continuous probability density on (0,∞)? Justify your answer.

Now model the Xi as i.i.d. copies of X | θ ∼ Poisson(θ), where θ is drawn from a prior that is a Gamma distribution with parameters α > 0 and λ > 0 (given below). Show that the posterior distribution of θ | X1, . . . , Xn is a Gamma distribution and find its parameters. Find the Bayes rule θBAYES for estimating θ in quadratic risk for this prior. [The Gamma probability density function with parameters α > 0, λ > 0 is given by
\[ f(\theta) = \frac{\lambda^{\alpha}\theta^{\alpha-1}e^{-\lambda\theta}}{\Gamma(\alpha)} , \qquad \theta > 0 , \]
where Γ(α) is the usual Gamma function.]

Finally assume that the Xi have actually been generated from a fixed Poisson(θ0) distribution, where θ0 > 0. Show that √n(θBAYES − θMLE) converges to zero in probability and deduce the asymptotic distribution of √n(θBAYES − θ0) under the joint law P^N_{θ0} of the random variables X1, X2, . . . . [You may use standard results from lectures without proof provided they are clearly stated.]

Paper 1, Section II

28J Principles of Statistics

State without proof the inequality known as the Cramér–Rao lower bound in a parametric model {f(·, θ) : θ ∈ Θ}, Θ ⊆ R. Give an example of a maximum likelihood estimator that attains this lower bound, and justify your answer.

Give an example of a parametric model where the maximum likelihood estimator based on observations X1, . . . , Xn is biased. State without proof an analogue of the Cramér–Rao inequality for biased estimators.

Define the concept of a minimax decision rule, and show that the maximum likelihood estimator θMLE based on X1, . . . , Xn in a N(θ, 1) model is minimax for estimating θ ∈ Θ = R in quadratic risk.

2013

Paper 4, Section II

27K Principles of Statistics

Assuming only the existence and properties of the univariate normal distribution, define Np(µ, Σ), the multivariate normal distribution with mean (row-)vector µ and dispersion matrix Σ; and Wp(ν; Σ), the Wishart distribution on integer ν ≥ 1 degrees of freedom and with scale parameter Σ. Show that, if X ∼ Np(µ, Σ), S ∼ Wp(ν; Σ), and b (1 × q), A (p × q) are fixed, then b + XA ∼ Nq(b + µA, Φ) and A^T S A ∼ Wq(ν; Φ), where Φ = A^T Σ A.

The random (n × p) matrix X has rows that are independently distributed as Np(M, Σ), where both parameters M and Σ are unknown. Let X̄ := n⁻¹1^T X, where 1 is the (n × 1) vector of 1s; and S^c := X^T Π X, with Π := In − n⁻¹11^T. State the joint distribution of X̄ and S^c given the parameters.

Now suppose n > p and Σ is positive definite. Hotelling's T² is defined as
\[ T^2 := n\,(\bar X - M)\,\big(\overline{S^c}\big)^{-1}(\bar X - M)^{T} , \qquad \overline{S^c} := S^c/\nu , \quad \nu := n - 1 . \]
Show that, for any values of M and Σ,
\[ \left(\frac{\nu - p + 1}{\nu p}\right) T^2 \sim F^{\,p}_{\,\nu - p + 1} , \]
the F distribution on p and ν − p + 1 degrees of freedom.

[You may assume that:

1. If S ∼ Wp(ν; Σ) and a is a fixed (p × 1) vector, then
\[ \frac{a^{T}\Sigma^{-1}a}{a^{T}S^{-1}a} \sim \chi^2_{\nu - p + 1} . \]

2. If V ∼ χ²_p and W ∼ χ²_λ are independent, then
\[ \frac{V/p}{W/\lambda} \sim F^{\,p}_{\,\lambda} . \]
]


Paper 3, Section II

27K Principles of Statistics

What is meant by a convex decision problem? State and prove a theorem to the effect that, in a convex decision problem, there is no point in randomising. [You may use standard terms without defining them.]

The sample space, parameter space and action space are each the two-point set {1, 2}. The observable X takes value 1 with probability 2/3 when the parameter Θ = 1, and with probability 3/4 when Θ = 2. The loss function L(θ, a) is 0 if a = θ, otherwise 1. Describe all the non-randomised decision rules, compute their risk functions, and plot these as points in the unit square. Identify an inadmissible non-randomised decision rule, and a decision rule that dominates it.

Show that the minimax rule has risk function (8/17, 8/17), and is Bayes against a prior distribution that you should specify. What is its Bayes risk? Would a Bayesian with this prior distribution be bound to use the minimax rule?

Paper 1, Section II

28K Principles of Statistics

When the real parameter Θ takes value θ, variables X1, X2, . . . arise independently from a distribution Pθ having density function pθ(x) with respect to an underlying measure µ. Define the score variable Un(θ) and the information function In(θ) for estimation of Θ based on Xn := (X1, . . . , Xn), and relate In(θ) to i(θ) := I1(θ).

State and prove the Cramér–Rao inequality for the variance of an unbiased estimator of Θ. Under what conditions does this inequality become an equality? What is the form of the estimator in this case? [You may assume Eθ{Un(θ)} = 0, varθ{Un(θ)} = In(θ), and any further required regularity conditions, without comment.]

Let Θ̂n be the maximum likelihood estimator of Θ based on Xn. What is the asymptotic distribution of n^{1/2}(Θ̂n − Θ) when Θ = θ?

Suppose that, for each n, Θ̂n is unbiased for Θ, and the variance of n^{1/2}(Θ̂n − Θ) is exactly equal to its asymptotic variance. By considering the estimator αΘ̂k + (1 − α)Θ̂n, or otherwise, show that, for k < n, covθ(Θ̂k, Θ̂n) = varθ(Θ̂n).


Paper 2, Section II

28K Principles of Statistics

Describe the Weak Sufficiency Principle (WSP) and the Strong Sufficiency Principle (SSP). Show that Bayesian inference with a fixed prior distribution respects WSP.

A parameter Φ has a prior distribution which is normal with mean 0 and precision (inverse variance) hΦ. Given Φ = φ, further parameters Θ := (Θi : i = 1, . . . , I) have independent normal distributions with mean φ and precision hΘ. Finally, given both Φ = φ and Θ = θ := (θ1, . . . , θI), observables X := (Xij : i = 1, . . . , I; j = 1, . . . , J) are independent, Xij being normal with mean θi and precision hX. The precision parameters (hΦ, hΘ, hX) are all fixed and known. Let X̄ := (X̄1, . . . , X̄I), where X̄i := ∑ⱼ Xij/J. Show, directly from the definition of sufficiency, that X̄ is sufficient for (Φ, Θ). [You may assume without proof that, if Y1, . . . , Yn have independent normal distributions with the same variance, and Ȳ := n⁻¹ ∑ᵢ Yi, then the vector (Y1 − Ȳ, . . . , Yn − Ȳ) is independent of Ȳ.]

For data-values x := (xij : i = 1, . . . , I; j = 1, . . . , J), determine the joint distribution, Πφ say, of Θ, given X = x and Φ = φ. What is the distribution of Φ, given Θ = θ and X = x?

Using these results, describe clearly how Gibbs sampling combined with Rao–Blackwellisation could be applied to estimate the posterior joint distribution of Θ, given X = x.
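One way the Gibbs sampler described here could look in code (a sketch: the precisions, I, J and the simulated data are invented, and the two full conditionals used are the standard normal–normal conjugate updates, θi | φ, X ∼ N((hΘφ + hX J X̄i)/(hΘ + J hX), 1/(hΘ + J hX)) and Φ | θ ∼ N(hΘ Σθi/(hΦ + I hΘ), 1/(hΦ + I hΘ))). The Rao–Blackwellised estimate of each θi averages conditional means rather than raw draws.

```python
import numpy as np

rng = np.random.default_rng(10)
I, J = 5, 8
h_phi, h_theta, h_x = 1.0, 2.0, 4.0           # fixed known precisions (arbitrary values)

# Simulate one data set from the model, just to have something to condition on.
phi_true = rng.normal(0.0, 1.0 / np.sqrt(h_phi))
theta_true = rng.normal(phi_true, 1.0 / np.sqrt(h_theta), size=I)
X = rng.normal(theta_true[:, None], 1.0 / np.sqrt(h_x), size=(I, J))
Xbar = X.mean(axis=1)                         # sufficient statistic (X_1 bar, ..., X_I bar)

n_iter, burn = 5_000, 500
phi = 0.0
theta_draws, theta_cond_means = [], []
for t in range(n_iter):
    # theta_i | phi, X : normal with precision h_theta + J h_x.
    prec_t = h_theta + J * h_x
    mean_t = (h_theta * phi + J * h_x * Xbar) / prec_t
    theta = rng.normal(mean_t, 1.0 / np.sqrt(prec_t))
    # phi | theta : normal with precision h_phi + I h_theta (data enter only through theta).
    prec_p = h_phi + I * h_theta
    mean_p = h_theta * theta.sum() / prec_p
    phi = rng.normal(mean_p, 1.0 / np.sqrt(prec_p))
    if t >= burn:
        theta_draws.append(theta)
        theta_cond_means.append(mean_t)       # Rao-Blackwellisation: store E[theta_i | phi, X]

print(np.mean(theta_draws, axis=0))           # plain Gibbs estimate of E[theta_i | X]
print(np.mean(theta_cond_means, axis=0))      # Rao-Blackwellised estimate (lower variance)
print(theta_true)
```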

2012

Paper 4, Section II

27K Principles of Statistics

For i = 1, . . . , n, the pairs (Xi, Yi) have independent bivariate normal distributions, with E(Xi) = µX, E(Yi) = µY, var(Xi) = var(Yi) = φ, and corr(Xi, Yi) = ρ. The means µX, µY are known; the parameters φ > 0 and ρ ∈ (−1, 1) are unknown.

Show that the joint distribution of all the variables belongs to an exponential family, and identify the natural sufficient statistic, natural parameter, and mean-value parameter. Hence or otherwise, find the maximum likelihood estimator ρ̂ of ρ.

Let Ui := Xi + Yi, Vi := Xi − Yi. What is the joint distribution of (Ui, Vi)?

Show that the distribution of
\[ \frac{(1 + \hat\rho)/(1 - \hat\rho)}{(1 + \rho)/(1 - \rho)} \]
is F^n_n. Hence describe a (1 − α)-level confidence interval for ρ. Briefly explain what would change if µX and µY were also unknown.

[Recall that the distribution F^{ν1}_{ν2} is that of (W1/ν1)/(W2/ν2), where, independently for j = 1 and j = 2, Wj has the chi-squared distribution with νj degrees of freedom.]


Paper 3, Section II

27K Principles of Statistics

The parameter vector is Θ ≡ (Θ1, Θ2, Θ3), with Θi > 0, Θ1 + Θ2 + Θ3 = 1. Given Θ = θ ≡ (θ1, θ2, θ3), the integer random vector X = (X1, X2, X3) has a trinomial distribution, with probability mass function
\[ p(x \mid \theta) = \frac{n!}{x_1!\,x_2!\,x_3!}\, \theta_1^{x_1}\theta_2^{x_2}\theta_3^{x_3} , \qquad \Big(x_i \geq 0,\ \sum_{i=1}^{3} x_i = n\Big) . \tag{1} \]
Compute the score vector for the parameter Θ∗ := (Θ1, Θ2), and, quoting any relevant general result, use this to determine E(Xi) (i = 1, 2, 3).

Considering (1) as an exponential family with mean-value parameter Θ∗, what is the corresponding natural parameter Φ ≡ (Φ1, Φ2)?

Compute the information matrix I for Θ∗, which has (i, j)-entry
\[ I_{ij} = -E\!\left(\frac{\partial^2 l}{\partial\theta_i\,\partial\theta_j}\right) \qquad (i, j = 1, 2) , \]
where l denotes the log-likelihood function, based on X, expressed in terms of (θ1, θ2).

Show that the variance of log(X1/X3) is asymptotic to n⁻¹(θ1⁻¹ + θ3⁻¹) as n → ∞. [Hint: The information matrix IΦ for Φ is I⁻¹ and the dispersion matrix of the maximum likelihood estimator Φ̂ behaves, asymptotically (for n → ∞), as IΦ⁻¹.]


Paper 2, Section II

28K Principles of Statistics

Carefully defining all italicised terms, show that, if a sufficiently general method of inference respects both the Weak Sufficiency Principle and the Conditionality Principle, then it respects the Likelihood Principle.

The position Xt of a particle at time t > 0 has the Normal distribution N(0, φt), where φ is the value of an unknown parameter Φ; and the time, Tx, at which the particle first reaches position x ≠ 0 has probability density function
\[ p_x(t) = \frac{|x|}{\sqrt{2\pi\phi t^3}} \exp\!\left(-\frac{x^2}{2\phi t}\right) \qquad (t > 0) . \]

Experimenter E1 observes Xτ, and experimenter E2 observes Tξ, where τ > 0, ξ ≠ 0 are fixed in advance. It turns out that Tξ = τ. What does the Likelihood Principle say about the inferences about Φ to be made by the two experimenters?

E1 bases his inference about Φ on the distribution and observed value of Xτ²/τ, while E2 bases her inference on the distribution and observed value of ξ²/Tξ. Show that these choices respect the Likelihood Principle.


Paper 1, Section II

28K Principles of Statistics

Prove that, if T is complete sufficient for Θ, and S is a function of T, then S is the minimum variance unbiased estimator of E(S | Θ).

When the parameter Θ takes a value θ > 0, observables (X1, . . . , Xn) arise independently from the exponential distribution E(θ), having probability density function
\[ p(x \mid \theta) = \theta e^{-\theta x} \qquad (x > 0) . \]

Show that the family of distributions
\[ \Theta \sim \mathrm{Gamma}(\alpha, \beta) \qquad (\alpha > 0,\ \beta > 0) , \tag{1} \]
with probability density function
\[ \pi(\theta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \theta^{\alpha-1} e^{-\beta\theta} \qquad (\theta > 0) , \]
is a conjugate family for Bayesian inference about Θ (where Γ(α) is the Gamma function).

Show that the expectation of Λ := log Θ, under prior distribution (1), is ψ(α) − log β, where ψ(α) := (d/dα) log Γ(α). What is the prior variance of Λ? Deduce the posterior expectation and variance of Λ, given (X1, . . . , Xn).

Let Λ̂ denote the limiting form of the posterior expectation of Λ as α, β ↓ 0. Show that Λ̂ is the minimum variance unbiased estimator of Λ. What is its variance?

2011

Paper 1, Section II

28K Principles of Statistics

Define admissible, Bayes, minimax decision rules.

A random vector X = (X1, X2, X3)^T has independent components, where Xi has the normal distribution N(θi, 1) when the parameter vector Θ takes the value θ = (θ1, θ2, θ3)^T. It is required to estimate Θ by a point a ∈ R³, with loss function L(θ, a) = ‖a − θ‖². What is the risk function of the maximum-likelihood estimator Θ̂ := X? Show that Θ̂ is dominated by the estimator Θ̃ := (1 − ‖X‖⁻²)X.

Paper 2, Section II

28K Principles of Statistics

Random variables X1, . . . , Xn are independent and identically distributed from the normal distribution with unknown mean M and unknown precision (inverse variance) H. Show that the likelihood function, for data X1 = x1, . . . , Xn = xn, is
\[ L_n(\mu, h) \propto h^{n/2} \exp\!\Big(-\tfrac{1}{2} h \big\{ n(\bar x - \mu)^2 + S \big\}\Big) , \]
where x̄ := n⁻¹ ∑ᵢ xᵢ and S := ∑ᵢ(xᵢ − x̄)².

A bivariate prior distribution for (M, H) is specified, in terms of hyperparameters (α0, β0, m0, λ0), as follows. The marginal distribution of H is Γ(α0, β0), with density
\[ \pi(h) \propto h^{\alpha_0 - 1} e^{-\beta_0 h} \qquad (h > 0) , \]
and the conditional distribution of M, given H = h, is normal with mean m0 and precision λ0 h.

Show that the conditional prior distribution of H, given M = µ, is
\[ H \mid M = \mu \;\sim\; \Gamma\!\big(\alpha_0 + \tfrac{1}{2},\; \beta_0 + \tfrac{1}{2}\lambda_0(\mu - m_0)^2\big) . \]

Show that the posterior joint distribution of (M, H), given X1 = x1, . . . , Xn = xn, has the same form as the prior, with updated hyperparameters (αn, βn, mn, λn) which you should express in terms of the prior hyperparameters and the data.

[You may use the identity
\[ p(t - a)^2 + q(t - b)^2 = (t - \delta)^2 + pq(a - b)^2 , \]
where p + q = 1 and δ = pa + qb.]

Explain how you could implement Gibbs sampling to generate a random sample from the posterior joint distribution.


Paper 3, Section II

27K Principles of Statistics

Random variables X1, X2, . . . are independent and identically distributed from the exponential distribution E(θ), with density function
\[ p_X(x \mid \theta) = \theta e^{-\theta x} \qquad (x > 0) , \]
when the parameter Θ takes value θ > 0. The following experiment is performed. First X1 is observed. Thereafter, if X1 = x1, . . . , Xi = xi have been observed (i ≥ 1), a coin having probability α(xi) of landing heads is tossed, where α : R → (0, 1) is a known function and the coin toss is independent of the X's and previous tosses. If it lands heads, no further observations are made; if tails, Xi+1 is observed.

Let N be the total number of X's observed, and X := (X1, . . . , XN). Write down the likelihood function for Θ based on data X = (x1, . . . , xn), and identify a minimal sufficient statistic. What does the likelihood principle have to say about inference from this experiment?

Now consider the experiment that only records Y := XN. Show that the density function of Y has the form
\[ p_Y(y \mid \theta) = \exp\{a(y) - k(\theta) - \theta y\} . \]

Assuming the function a(·) is twice differentiable and that both pY(y | θ) and ∂pY(y | θ)/∂y vanish at 0 and ∞, show that a′(Y) is an unbiased estimator of Θ, and find its variance.

Stating clearly any general results you use, deduce that
\[ -k''(\theta)\, E_\theta\{a''(Y)\} \geq 1 . \]


Paper 4, Section II

27K Principles of Statistics

What does it mean to say that a (1 × p) random vector ξ has a multivariate normal distribution?

Suppose ξ = (X, Y) has the bivariate normal distribution with mean vector µ = (µX, µY), and dispersion matrix
\[ \Sigma = \begin{pmatrix} \sigma_{XX} & \sigma_{XY} \\ \sigma_{XY} & \sigma_{YY} \end{pmatrix} . \]
Show that, with β := σXY/σXX, Y − βX is independent of X, and thus that the conditional distribution of Y given X is normal with mean µY + β(X − µX) and variance σYY·X := σYY − σ²XY/σXX.

For i = 1, . . . , n, ξi = (Xi, Yi) are independent and identically distributed with the above distribution, where all elements of µ and Σ are unknown. Let
\[ S = \begin{pmatrix} S_{XX} & S_{XY} \\ S_{XY} & S_{YY} \end{pmatrix} := \sum_{i=1}^{n} (\xi_i - \bar\xi)^{T}(\xi_i - \bar\xi) , \]
where ξ̄ := n⁻¹ ∑ᵢ ξᵢ.

The sample correlation coefficient is r := SXY/√(SXX SYY). Show that the distribution of r depends only on the population correlation coefficient ρ := σXY/√(σXX σYY).

Student's t-statistic (on n − 2 degrees of freedom) for testing the null hypothesis H0 : β = 0 is
\[ t := \frac{\hat\beta}{\sqrt{S_{YY\cdot X}\,/\,\big((n-2)\,S_{XX}\big)}} , \]
where β̂ := SXY/SXX and SYY·X := SYY − S²XY/SXX. Its density when H0 is true is
\[ p(t) = C\left(1 + \frac{t^2}{n-2}\right)^{-\frac{1}{2}(n-1)} , \]
where C is a constant that need not be specified.

Express t in terms of r, and hence derive the density of r when ρ = 0.

How could you use the sample correlation r to test the hypothesis ρ = 0?

2010

Paper 1, Section II

28J Principles of Statistics

The distribution of a random variable X is obtained from the binomial distribution B(n; Π) by conditioning on X > 0; here Π ∈ (0, 1) is an unknown probability parameter and n is known. Show that the distributions of X form an exponential family and identify the natural sufficient statistic T, natural parameter Φ, and cumulant function k(φ). Using general properties of the cumulant function, compute the mean and variance of X when Π = π. Write down an equation for the maximum likelihood estimate Π̂ of Π and explain why, when Π = π, the distribution of Π̂ is approximately normal N(π, π(1 − π)/n) for large n.

Suppose we observe X = 1. It is suggested that, since the condition X > 0 is then automatically satisfied, general principles of inference require that the inference to be drawn should be the same as if the distribution of X had been B(n; Π) and we had observed X = 1. Comment briefly on this suggestion.

Paper 2, Section II

28J Principles of Statistics

Define the Kolmogorov–Smirnov statistic for testing the null hypothesis that real random variables X1, . . . , Xn are independently and identically distributed with specified continuous, strictly increasing distribution function F, and show that its null distribution does not depend on F.

A composite hypothesis H0 specifies that, when the unknown positive parameter Θ takes value θ, the random variables X1, . . . , Xn arise independently from the uniform distribution U[0, θ]. Letting J := arg max_{1≤i≤n} Xi, show that, under H0, the statistic (J, XJ) is sufficient for Θ. Show further that, given {J = j, Xj = ξ}, the random variables (Xi : i ≠ j) are independent and have the U[0, ξ] distribution. How might you apply the Kolmogorov–Smirnov test to test the hypothesis H0?


Paper 3, Section II

27J Principles of Statistics

Define the normal and extensive form solutions of a Bayesian statistical decision problem involving parameter Θ, random variable X, and loss function L(θ, a). How are they related? Let R0 = R0(Π) be the Bayes loss of the optimal act when Θ ∼ Π and no data can be observed. Express the Bayes risk R1 of the optimal statistical decision rule in terms of R0 and the joint distribution of (Θ, X).

The real parameter Θ has distribution Π, having probability density function π(·). Consider the problem of specifying a set S ⊆ R such that the loss when Θ = θ is L(θ, S) = c|S| − 1S(θ), where 1S is the indicator function of S, where c > 0, and where |S| = ∫S dx. Show that the "highest density" region S∗ := {θ : π(θ) > c} supplies a Bayes act for this decision problem, and explain why R0(Π) ≤ 0.

For the case Θ ∼ N(µ, σ²), find an expression for R0 in terms of the standard normal distribution function Φ.

Suppose now that c = 0.5, that Θ ∼ N(0, 1) and that X | Θ ∼ N(Θ, 1/9). Show that R1 < R0.

Paper 4, Section II

27J Principles of Statistics

Define completeness and bounded completeness of a statistic T in a statistical experiment.

Random variables X1, X2, X3 are generated as Xi = Θ^{1/2} Z + (1 − Θ)^{1/2} Yi, where Z, Y1, Y2, Y3 are independently standard normal N(0, 1), and the parameter Θ takes values in (0, 1). What is the joint distribution of (X1, X2, X3) when Θ = θ? Write down its density function, and show that a minimal sufficient statistic for Θ based on (X1, X2, X3) is
\[ T = (T_1, T_2) := \Big( \sum_{i=1}^{3} X_i^2 ,\; \Big(\sum_{i=1}^{3} X_i\Big)^{2} \Big) . \]

[Hint: You may use that if I is the n × n identity matrix and J is the n × n matrix all of whose entries are 1, then aI + bJ has determinant a^{n−1}(a + nb), and inverse cI + dJ with c = 1/a, d = −b/(a(a + nb)).]

What is Eθ(T1)? Is T complete for Θ?

Let S := Prob(X1² ≤ 1 | T). Show that Eθ(S) is a positive constant c which does not depend on θ, but that S is not identically equal to c. Is T boundedly complete for Θ?

2009

Paper 1, Section II

28I Principles of Statistics

(i) Let X1, . . . , Xn be independent and identically distributed random variables, having the exponential distribution E(λ) with density p(x | λ) = λ exp(−λx) for x, λ > 0. Show that Tn = X1 + · · · + Xn is minimal sufficient and complete for λ.

[You may assume uniqueness of Laplace transforms.]

(ii) For given x > 0, it is desired to estimate the quantity φ = Prob(X1 > x | λ). Compute the Fisher information for φ.

(iii) State the Lehmann–Scheffé theorem. Show that the estimator φ̂n of φ defined by
\[ \hat\phi_n = \begin{cases} 0, & \text{if } T_n < x, \\[2pt] \left(1 - \dfrac{x}{T_n}\right)^{n-1}, & \text{if } T_n \geq x \end{cases} \]
is the minimum variance unbiased estimator of φ based on (X1, . . . , Xn). Without doing any computations, state whether or not the variance of φ̂n achieves the Cramér–Rao lower bound, justifying your answer briefly.

Let k ≤ n. Show that E(φ̂k | Tn, λ) = φ̂n.

Paper 2, Section II

28I Principles of Statistics

Suppose that the random vector X = (X1, . . . , Xn) has a distribution over Rⁿ depending on a real parameter θ, with everywhere positive density function p(x | θ). Define the maximum likelihood estimator θ̂, the score variable U, the observed information j and the expected (Fisher) information I for the problem of estimating θ from X.

For the case where the (Xi) are independent and identically distributed, show that, as n → ∞, I^{−1/2} U →d N(0, 1). [You may assume sufficient conditions to allow interchange of integration over the sample space and differentiation with respect to the parameter.] State the asymptotic distribution of θ̂.

The random vector X = (X1, . . . , Xn) is generated according to the rule
\[ X_{i+1} = \theta X_i + E_i , \]
where X0 = 1 and the (Ei) are independent and identically distributed from the standard normal distribution N(0, 1). Write down the likelihood function for θ based on data x = (x1, . . . , xn), find θ̂ and j, and show that the pair (θ̂, j) forms a minimal sufficient statistic.

A Bayesian uses the improper prior density π(θ) ∝ 1. Show that, in the posterior, S(θ − θ̂) (where S is a statistic that you should identify) has the same distribution as E1.


Paper 3, Section II

27I Principles of Statistics

What is meant by an equaliser decision rule? What is meant by an extended Bayes rule? Show that a decision rule that is both an equaliser rule and extended Bayes is minimax.

Let X1, . . . , Xn be independent and identically distributed random variables with the normal distribution N(θ, h⁻¹), and let k > 0. It is desired to estimate θ with loss function
\[ L(\theta, a) = 1 - \exp\{-\tfrac{1}{2}k(a - \theta)^2\} . \]

Suppose the prior distribution is θ ∼ N(m0, h0⁻¹). Find the Bayes act and the Bayes loss posterior to observing X1 = x1, . . . , Xn = xn. What is the Bayes risk of the Bayes rule with respect to this prior distribution?

Show that the rule that estimates θ by X̄ = n⁻¹(X1 + · · · + Xn) is minimax.

Paper 4, Section II

27I Principles of Statistics

Consider the double dichotomy, where the loss is 0 for a correct decision and 1 for an incorrect decision. Describe the form of a Bayes decision rule. Assuming the equivalence of normal and extensive form analyses, deduce the Neyman–Pearson lemma.

For a problem with random variable X and real parameter θ, define monotone likelihood ratio (MLR) and monotone test.

Suppose the problem has MLR in a real statistic T = t(X). Let φ be a monotone test, with power function γ(·), and let φ′ be any other test, with power function γ′(·). Show that if θ1 > θ0 and γ(θ0) ≥ γ′(θ0), then γ(θ1) ≥ γ′(θ1). Deduce that there exists θ∗ ∈ [−∞,∞] such that γ(θ) ≤ γ′(θ) for θ < θ∗, and γ(θ) ≥ γ′(θ) for θ > θ∗.

For an arbitrary prior distribution Π with density π(·), and an arbitrary value θ∗, show that the posterior odds
\[ \frac{\Pi(\theta > \theta^{*} \mid X = x)}{\Pi(\theta \leq \theta^{*} \mid X = x)} \]
is a non-decreasing function of t(x).

2008

1/II/27I Principles of Statistics

An angler starts fishing at time 0. Fish bite in a Poisson Process of rate Λ per hour, so that, if Λ = λ, the number Nt of fish he catches in the first t hours has the Poisson distribution P(λt), while Tn, the time in hours until his nth bite, has the Gamma distribution Γ(n, λ), with density function
\[ p(t \mid \lambda) = \frac{\lambda^{n}}{(n-1)!}\, t^{n-1} e^{-\lambda t} \qquad (t > 0) . \]

Bystander B1 plans to watch for 3 hours, and to record the number N3 of fish caught. Bystander B2 plans to observe until the 10th bite, and to record T10, the number of hours until this occurs.

For B1, show that Λ̂1 := N3/3 is an unbiased estimator of Λ whose variance function achieves the Cramér–Rao lower bound.

Find an unbiased estimator of Λ for B2, of the form Λ̂2 = k/T10. Does it achieve the Cramér–Rao lower bound? Is it minimum-variance-unbiased? Justify your answers.

In fact, the 10th fish bites after exactly 3 hours. For each of B1 and B2, write down the likelihood function for Λ based on their observations. What does the Likelihood Principle have to say about the inferences to be drawn by B1 and B2, and why? Compute the estimates λ̂1 and λ̂2 produced by applying Λ̂1 and Λ̂2 to the observed data. Does the method of minimum-variance-unbiased estimation respect the Likelihood Principle?


2/II/27I Principles of Statistics

Under hypothesis Hi (i = 0, 1), a real-valued observable X, taking values in X, has density function pi(·). Define the Type I error α and the Type II error β of a test φ : X → [0, 1] of the null hypothesis H0 against the alternative hypothesis H1. What are the size and power of the test in terms of α and β?

Show that, for 0 < c < ∞, φ minimises cα + β among all possible tests if and only if it satisfies
\[ p_1(x) > c\,p_0(x) \Rightarrow \phi(x) = 1, \]
\[ p_1(x) < c\,p_0(x) \Rightarrow \phi(x) = 0. \]
What does this imply about the admissibility of such a test?

Given the value θ of a parameter variable Θ ∈ [0, 1), the observable X has density function
\[ p(x \mid \theta) = \frac{2(x - \theta)}{(1 - \theta)^2} \qquad (\theta \leq x \leq 1) . \]

For fixed θ ∈ (0, 1), describe all the likelihood ratio tests of H0 : Θ = 0 against Hθ : Θ = θ.

For fixed k ∈ (0, 1), let φk be the test that rejects H0 if and only if X > k. Is φk admissible as a test of H0 against Hθ for every θ ∈ (0, 1)? Is it uniformly most powerful for its size for testing H0 against the composite hypothesis H1 : Θ ∈ (0, 1)? Is it admissible as a test of H0 against H1?

3/II/26I Principles of Statistics

Define the notion of exponential family (EF), and show that, for data arising as a random sample of size n from an exponential family, there exists a sufficient statistic whose dimension stays bounded as n → ∞.

The log-density of a normal distribution N(µ, v) can be expressed in the form
\[ \log p(x \mid \phi) = \phi_1 x + \phi_2 x^2 - k(\phi) \]
where φ = (φ1, φ2) is the value of an unknown parameter Φ = (Φ1, Φ2). Determine the function k, and the natural parameter-space F. What is the mean-value parameter H = (H1, H2) in terms of Φ?

Determine the maximum likelihood estimator Φ̂1 of Φ1 based on a random sample (X1, . . . , Xn), and give its asymptotic distribution for n → ∞.

How would these answers be affected if the variance of X were known to have value v0?


4/II/27I Principles of Statistics

Define sufficient statistic, and state the factorisation criterion for determining whether a statistic is sufficient. Show that a Bayesian posterior distribution depends on the data only through the value of a sufficient statistic.

Given the value µ of an unknown parameter M, observables X1, . . . , Xn are independent and identically distributed with distribution N(µ, 1). Show that the statistic X̄ := n⁻¹(X1 + · · · + Xn) is sufficient for M.

If the prior distribution is M ∼ N(0, τ²), determine the posterior distribution of M and the predictive distribution of X̄.

In fact, there are two hypotheses as to the value of M. Under hypothesis H0, M takes the known value 0; under H1, M is unknown, with prior distribution N(0, τ²). Explain why the Bayes factor for choosing between H0 and H1 depends only on X̄, and determine its value for data X1 = x1, . . . , Xn = xn.

The frequentist 5%-level test of H0 against H1 rejects H0 when |X̄| ≥ 1.96/√n. What is the Bayes factor for the critical case |x̄| = 1.96/√n? How does this behave as n → ∞? Comment on the similarities or differences in behaviour between the frequentist and Bayesian tests.

2007

1/II/27I Principles of Statistics

Suppose that X has density f(·|θ) where θ ∈ Θ. What does it mean to say that the statistic T ≡ T(X) is sufficient for θ?

Suppose that θ = (ψ, λ), where ψ is the parameter of interest, and λ is a nuisance parameter, and that the sufficient statistic T has the form T = (C, S). What does it mean to say that the statistic S is ancillary? If it is, how (according to the conditionality principle) do we test hypotheses on ψ? Assuming that the set of possible values for X is discrete, show that S is ancillary if and only if the density (probability mass function) f(x|ψ, λ) factorises as

f(x|ψ, λ) = ϕ0(x) ϕC(C(x), S(x), ψ) ϕS(S(x), λ) (∗)

for some functions ϕ0, ϕC, and ϕS with the properties

∑_{x ∈ C⁻¹(c) ∩ S⁻¹(s)} ϕ0(x) = 1 = ∑_s ϕS(s, λ) = ∑_c ϕC(c, s, ψ)

for all c, s, ψ, and λ.

Suppose now that X1, . . . , Xn are independent observations from a Γ(a, b) distribution, with density

f(x | a, b) = (bx)^{a−1} e^{−bx} b I{x>0} / Γ(a).

Assuming that the criterion (∗) holds also for observations which are not discrete, show that it is not possible to find (C(X), S(X)) sufficient for (a, b) such that S is ancillary when b is regarded as a nuisance parameter, and a is the parameter of interest.


2/II/27I Principles of Statistics

(i) State Wilks’ likelihood ratio test of the null hypothesis H0 : θ ∈ Θ0 against the alternative H1 : θ ∈ Θ1, where Θ0 ⊂ Θ1. Explain when this test may be used.

(ii) Independent identically-distributed observations X1, . . . , Xn take values in the set S = {1, . . . , K}, with common distribution which under the null hypothesis is of the form

P(X1 = k | θ) = f(k | θ)   (k ∈ S)

for some θ ∈ Θ0, where Θ0 is an open subset of some Euclidean space R^d, d < K − 1. Under the alternative hypothesis, the probability mass function of the Xi is unrestricted.

Assuming sufficient regularity conditions on f to guarantee the existence and uniqueness of a maximum-likelihood estimator θ̂n(X1, . . . , Xn) of θ for each n, show that for large n the Wilks likelihood ratio test statistic is approximately of the form

∑_{j=1}^{K} (Nj − n π̂j)² / Nj ,

where Nj = ∑_{i=1}^{n} I{Xi = j}, and π̂j = f(j | θ̂n). What is the asymptotic distribution of this statistic?
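A small numerical sketch of the statistic in (ii) (not part of the question); the category counts, the fitted null probabilities and the dimension d used below are illustrative assumptions.

import numpy as np
from scipy.stats import chi2

# Illustrative data: K = 4 categories, n = 200 observations.
N = np.array([55, 48, 52, 45])               # category counts N_j
n = N.sum()
pi_hat = np.array([0.25, 0.25, 0.25, 0.25])  # f(j | theta_hat) under a fitted null (assumed)
d = 0                                        # dimension of the null parameter space (assumed)

W = np.sum((N - n * pi_hat) ** 2 / N)        # the statistic from part (ii)
df = len(N) - 1 - d                          # asymptotic chi-squared degrees of freedom
print(W, chi2.sf(W, df))                     # statistic and its approximate p-value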

3/II/26I Principles of Statistics

(i) In the context of decision theory, what is a Bayes rule with respect to a given loss function and prior? What is an extended Bayes rule?

Characterise the Bayes rule with respect to a given prior in terms of the posterior distribution for the parameter given the observation. When Θ = A = R^d for some d, and the loss function is L(θ, a) = ‖θ − a‖², what is the Bayes rule?

(ii) Suppose that A = Θ = R, with loss function L(θ, a) = (θ − a)², and suppose further that under Pθ, X ∼ N(θ, 1).

Supposing that a N(0, τ⁻¹) prior is taken over θ, compute the Bayes risk of the decision rule dλ(X) = λX. Find the posterior distribution of θ given X, and confirm that its mean is of the form dλ(X) for some value of λ which you should identify. Hence show that the decision rule d1 is an extended Bayes rule.
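A quick Monte Carlo sketch (not part of the question) of the Bayes risk of dλ(X) = λX under the N(0, τ⁻¹) prior, together with the posterior-mean coefficient; the values of τ and λ are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)
tau, lam, m = 2.0, 0.4, 200_000   # illustrative values (assumptions)

theta = rng.normal(0.0, np.sqrt(1.0 / tau), size=m)   # theta ~ N(0, 1/tau)
X = theta + rng.normal(size=m)                        # X | theta ~ N(theta, 1)

mc_risk = np.mean((theta - lam * X) ** 2)             # Monte Carlo Bayes risk of d_lambda
exact_risk = (1 - lam) ** 2 / tau + lam ** 2          # closed form, for comparison
lam_star = 1.0 / (1.0 + tau)                          # coefficient of the posterior mean
print(mc_risk, exact_risk, lam_star)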


4/II/27I Principles of Statistics

Assuming sufficient regularity conditions on the likelihood f(x|θ) for a univariate parameter θ ∈ Θ, establish the Cramer–Rao lower bound for the variance of an unbiased estimator of θ.

If θ̂(X) is an unbiased estimator of θ whose variance attains the Cramer–Rao lower bound for every value of θ ∈ Θ, show that the family of densities f(· | θ) is an exponential family.

Part II 2007

1/II/27J Principles of Statistics

(a) What is a loss function? What is a decision rule? What is the risk function of a decision rule? What is the Bayes risk of a decision rule with respect to a prior π?

(b) Let θ ↦ R(θ, d) denote the risk function of decision rule d, and let r(π, d) denote the Bayes risk of decision rule d with respect to prior π. Suppose that d∗ is a decision rule and π0 is a prior over the parameter space Θ with the two properties

(i) r(π0, d∗) = min_d r(π0, d);

(ii) sup_θ R(θ, d∗) = r(π0, d∗).

Prove that d∗ is minimax.

(c) Suppose now that Θ = A = R, where A is the space of possible actions, and that the loss function is

L(θ, a) = exp(−λaθ),

where λ is a positive constant. If the law of the observation X given parameter θ is N(θ, σ²), where σ > 0 is known, show (using (b) or otherwise) that the rule

d∗(x) = x/(σ²λ)

is minimax.
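A brief Monte Carlo sanity check (not part of the question) of the risk of d∗ under the stated loss; the values of σ and λ are arbitrary assumptions. The simulated risk agrees with exp(−θ²/(2σ²)), whose supremum over θ is 1, and the risk of any rule equals 1 at θ = 0, which is the heart of the minimax argument.

import numpy as np

rng = np.random.default_rng(1)
sigma, lam, m = 1.5, 0.8, 400_000   # arbitrary illustrative values

def risk(theta):
    """Monte Carlo risk of d*(x) = x/(sigma^2 * lam) under L(theta, a) = exp(-lam * a * theta)."""
    X = rng.normal(theta, sigma, size=m)
    a = X / (sigma**2 * lam)
    return np.mean(np.exp(-lam * a * theta))

for theta in [0.0, 0.5, 1.0, 2.0]:
    print(theta, risk(theta), np.exp(-theta**2 / (2 * sigma**2)))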

2/II/27J Principles of Statistics

Let {f(·|θ) : θ ∈ Θ} be a parametric family of densities for observation X. What does it mean to say that the statistic T ≡ T(X) is sufficient for θ? What does it mean to say that T is minimal sufficient?

State the Rao–Blackwell theorem. State the Cramer–Rao lower bound for the variance of an unbiased estimator of a (scalar) parameter, taking care to specify any assumptions needed.

Let X1, . . . , Xn be a sample from a U(0, θ) distribution, where the positive parameter θ is unknown. Find a minimal sufficient statistic T for θ. If h(T) is an unbiased estimator for θ, find the form of h, and deduce that this estimator is minimum-variance unbiased. Would it be possible to reach this conclusion using the Cramer–Rao lower bound?
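For orientation only, a simulation sketch (not part of the question) suggesting that a rescaled sample maximum is unbiased for θ; the values of n and θ and the displayed estimator (n + 1)T/n are assumptions made for illustration, not asserted as the intended solution.

import numpy as np

rng = np.random.default_rng(2)
n, theta, reps = 10, 3.0, 100_000   # illustrative values

X = rng.uniform(0.0, theta, size=(reps, n))
T = X.max(axis=1)                   # candidate minimal sufficient statistic
estimate = (n + 1) / n * T          # candidate unbiased estimator h(T) (assumed form)

print(estimate.mean(), theta)                      # sample mean of the estimator vs true theta
print(estimate.var(), theta**2 / (n * (n + 2)))    # sample variance vs theta^2 / (n(n+2))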


3/II/26J Principles of Statistics

Write an essay on the role of the Metropolis–Hastings algorithm in computational Bayesian inference on a parametric model. You may for simplicity assume that the parameter space is finite. Your essay should:

(a) explain what problem in Bayesian inference the Metropolis–Hastings algorithm is used to tackle;

(b) fully justify that the algorithm does indeed deliver the required information about the model;

(c) discuss any implementational issues that need care.
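To fix ideas for the essay above, here is a minimal Metropolis–Hastings sketch on a finite parameter space; the target posterior weights, the symmetric random-walk proposal and the chain length are all illustrative assumptions, not part of the question.

import numpy as np

rng = np.random.default_rng(3)

# Unnormalised posterior weights on a finite parameter space {0, 1, ..., K-1} (assumed).
weights = np.array([1.0, 4.0, 2.0, 0.5, 2.5])
K = len(weights)

def propose(k):
    """Symmetric random-walk proposal on {0, ..., K-1}, wrapping at the ends."""
    return (k + rng.choice([-1, 1])) % K

n_steps, k = 100_000, 0
counts = np.zeros(K)
for _ in range(n_steps):
    k_new = propose(k)
    # Accept with probability min(1, pi(k_new)/pi(k)); the normalising constant cancels.
    if rng.random() < min(1.0, weights[k_new] / weights[k]):
        k = k_new
    counts[k] += 1

print(counts / n_steps)             # empirical frequencies of the chain
print(weights / weights.sum())      # target posterior, for comparison

The point of (a) is visible here: only unnormalised posterior weights are needed, and the long-run frequencies of the chain approximate the normalised posterior.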

4/II/27J Principles of Statistics

(a) State the strong law of large numbers. State the central limit theorem.

(b) Assuming whatever regularity conditions you require, show that if θ̂n ≡ θ̂n(X1, . . . , Xn) is the maximum-likelihood estimator of the unknown parameter θ based on an independent identically distributed sample of size n, then under Pθ

√n(θ̂n − θ) → N(0, J(θ)⁻¹) in distribution

as n → ∞, where J(θ) is a matrix which you should identify. A rigorous derivation is not required.

(c) Suppose that X1, X2, . . . are independent binomial Bin(1, θ) random variables. It is required to test H0 : θ = 1/2 against the alternative H1 : θ ∈ (0, 1). Show that the construction of a likelihood-ratio test leads us to the statistic

Tn = 2n{θ̂n log θ̂n + (1 − θ̂n) log(1 − θ̂n) + log 2},

where θ̂n ≡ n⁻¹ ∑_{k=1}^{n} Xk. Stating clearly any result to which you appeal, for large n, what approximately is the distribution of Tn under H0? Writing θ̂n = 1/2 + Zn,

and assuming that Zn is small, show that

Tn ≃ 4nZn².

Using this and the central limit theorem, briefly justify the approximate distribution of Tn given by asymptotic maximum-likelihood theory. What could you say if the assumption that Zn is small failed?
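A simulation sketch (not part of the question) comparing Tn with its quadratic approximation 4nZn² and with the χ²₁ quantile under H0; the sample size, number of replications and seed are arbitrary assumptions.

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
n, reps = 500, 50_000

X = rng.binomial(1, 0.5, size=(reps, n))           # data under H0: theta = 1/2
theta_hat = X.mean(axis=1)
theta_hat = np.clip(theta_hat, 1e-12, 1 - 1e-12)   # guard the logarithms

Tn = 2 * n * (theta_hat * np.log(theta_hat)
              + (1 - theta_hat) * np.log(1 - theta_hat) + np.log(2))
Zn = theta_hat - 0.5

print(np.mean(Tn > chi2.ppf(0.95, df=1)))    # rejection rate, close to 0.05
print(np.max(np.abs(Tn - 4 * n * Zn**2)))    # error of the quadratic approximation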

Part II 2006

1/II/27I Principles of Statistics

State Wilks’ Theorem on the asymptotic distribution of likelihood-ratio test statistics.

Suppose that X1, . . . , Xn are independent with common N(µ, σ²) distribution, where the parameters µ and σ are both unknown. Find the likelihood-ratio test statistic for testing H0 : µ = 0 against H1 : µ unrestricted, and state its (approximate) distribution.

What is the form of the t-test of H0 against H1? Explain why for large n the likelihood-ratio test and the t-test are nearly the same.
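An illustrative numerical check (not part of the question): the likelihood-ratio statistic here can be written as n log(1 + t²/(n − 1)), with t the usual one-sample t statistic, so the two statistics nearly coincide for large n. The simulated data below are an assumption.

import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.normal(0.3, 2.0, size=n)        # illustrative sample

xbar = x.mean()
s = x.std(ddof=1)
t = np.sqrt(n) * xbar / s               # one-sample t statistic for H0: mu = 0

lr = n * np.log(1 + t**2 / (n - 1))     # 2 * log likelihood ratio for H0 vs H1
print(lr, t**2)                         # nearly equal when n is large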

2/II/27I Principles of Statistics

(i) Suppose that X is a multivariate normal vector with mean µ ∈ R^d and covariance matrix σ²I, where µ and σ² are both unknown, and I denotes the d × d identity matrix. Suppose that Θ0 ⊂ Θ1 are linear subspaces of R^d of dimensions d0 and d1, where d0 < d1 < d. Let Pi denote orthogonal projection onto Θi (i = 0, 1). Carefully derive the joint distribution of (|X − P1X|², |P1X − P0X|²) under the hypothesis H0 : µ ∈ Θ0. How could you use this to make a test of H0 against H1 : µ ∈ Θ1?

(ii) Suppose that I students take J exams, and that the mark Xij of student i in exam j is modelled as

Xij = m + αi + βj + εij

where ∑_i αi = 0 = ∑_j βj, the εij are independent N(0, σ²), and the parameters m, α, β and σ are unknown. Construct a test of H0 : βj = 0 for all j against H1 : ∑_j βj = 0.

3/II/26I Principles of Statistics

In the context of decision theory, explain the meaning of the following italicized terms: loss function, decision rule, the risk of a decision rule, a Bayes rule with respect to prior π, and an admissible rule. Explain how a Bayes rule with respect to a prior π can be constructed.

Suppose that X1, . . . , Xn are independent with common N(0, v) distribution, where v > 0 is supposed to have a prior density f0. In a decision-theoretic approach to estimating v, we take a quadratic loss: L(v, a) = (v − a)². Write X = (X1, . . . , Xn) and |X| = (X1² + . . . + Xn²)^{1/2}.

By considering decision rules (estimators) of the form v̂(X) = α|X|², prove that if α ≠ 1/(n + 2) then the estimator v̂(X) = α|X|² is not Bayes, for any choice of prior f0.

By considering decision rules of the form v̂(X) = α|X|² + β, prove that if α ≠ 1/n then the estimator v̂(X) = α|X|² is not Bayes, for any choice of prior f0.

[You may use without proof the fact that, if Z has a N(0, 1) distribution, then EZ⁴ = 3.]


4/II/27I Principles of Statistics

A group of n hospitals is to be ‘appraised’; the ‘performance’ θi of hospital i has a N(0, 1/τ) prior distribution, different hospitals being independent. The ‘performance’ cannot be measured directly, so an expensive firm of management consultants has been hired to arrive at each hospital’s Standardised Index of Quality [SIQ], this being a number Xi for hospital i related to θi by the commercially-sensitive formula

Xi = θi + εi,

where the εi are independent with common N(0, 1/τε) distribution.

(i) Assume that τ and τε are known. What is the posterior distribution of θ given X? Suppose that hospital j was the hospital with the lowest SIQ, with a value Xj = x; conditional on X, what is the distribution of θj?

(ii) Now, instead of assuming τ and τε known, suppose that τ has a Gamma prior with parameters (α, β), density

f(t) = (βt)^{α−1} β e^{−βt} / Γ(α)

for known α and β, and that τε = κτ, where κ is a known constant. Find the posterior distribution of (θ, τ) given X. Comment briefly on the form of the distribution.
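As a small numerical companion to part (i) above (not part of the question), assuming the precisions τ and τε are known, each θi has a normal posterior with the usual precision-weighted shrinkage mean; the specific precisions and SIQ values below are arbitrary assumptions.

import numpy as np

tau, tau_eps = 4.0, 1.0                  # prior and noise precisions (assumed known)
X = np.array([-1.8, -0.3, 0.4, 1.1])     # observed SIQs for n = 4 hospitals (illustrative)

# Posterior of each theta_i given X: normal with precision tau + tau_eps and
# mean equal to the shrinkage of X_i towards the prior mean 0.
post_prec = tau + tau_eps
post_mean = tau_eps * X / post_prec
post_sd = np.sqrt(1.0 / post_prec)

print(post_mean, post_sd)
# The hospital with the lowest SIQ (X = -1.8 here) has posterior mean -0.36:
# extreme SIQs are pulled strongly back towards zero by the prior.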

Part II 2005
