Chapter 4: Element sampling design: Part 2 (jkim.public.iastate.edu/teaching/chapter4.pdf)

Chapter 4: Element sampling design: Part 2 Jae-Kwang Kim Fall, 2014


Chapter 4: Element sampling design: Part 2

Jae-Kwang Kim

Fall, 2014


Introduction

1 Introduction

2 Poisson sampling

3 PPS sampling

4 πps sampling

Kim Ch. 4: Element sampling design: Part 2 Fall, 2014 2 / 32


Taxonomy

Equal probability sampling     Unequal probability sampling
SRS (without replacement)      πps sampling
SRS with replacement           PPS sampling
Bernoulli sampling             Poisson sampling
Systematic sampling            Systematic πps sampling


Why consider unequal probability sampling?

Example: a population of N = 4 companies

Farm   Size    y (yield)
A        100      11
B        200      20
C        300      24
D      1,000     245

Select n = 1 unit
  1. with equal probability;
  2. with probability proportional to size.

Compare the variances.
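
The comparison can be carried out numerically. The following is a minimal sketch, assuming the total Y is estimated by y_k / p_k, where k is the single selected unit and p_k its selection probability; the function and variable names are illustrative and do not come from the slides.

```python
# Farm example: sizes x and yields y for units A-D (from the table above).
x = [100, 200, 300, 1000]
y = [11, 20, 24, 245]
N = len(y)

def variance_one_draw(p, y):
    """Variance of y_k / p_k when one unit k is drawn with probability p_k.

    The estimator is unbiased for the total, so the variance is the
    p-weighted mean squared deviation from the true total.
    """
    total = sum(y)
    return sum(pk * (yk / pk - total) ** 2 for pk, yk in zip(p, y))

p_equal = [1 / N] * N                  # design 1: equal probability
p_size = [xk / sum(x) for xk in x]     # design 2: proportional to size

var_equal = variance_one_draw(p_equal, y)
var_size = variance_one_draw(p_size, y)
print(var_equal, var_size)
```

Under these assumptions the variance is 154,488 for the equal probability design and 14,248 for the size-proportional design, because the yield is roughly proportional to the size.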


Poisson sampling


Definition: $I_i \overset{\text{indep}}{\sim} \text{Bernoulli}(\pi_i)$, $i = 1, 2, \ldots, N$.

If πi = π, it is called Bernoulli sampling.

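Estimation (of $Y = \sum_{i=1}^N y_i$):

$$\hat{Y}_{HT} = \sum_{i=1}^N I_i y_i / \pi_i$$

Variance:

$$Var\left(\hat{Y}_{HT}\right) = \sum_{i=1}^N \left(\frac{1}{\pi_i} - 1\right) y_i^2$$

Variance estimation:

$$\hat{V}\left(\hat{Y}_{HT}\right) = \sum_{i=1}^N \frac{I_i}{\pi_i} \left(\frac{1}{\pi_i} - 1\right) y_i^2$$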
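
The three formulas can be checked by brute force on a small population. The sketch below enumerates all 2^N possible Poisson samples; the inclusion probabilities `pi` are hypothetical values chosen only for illustration, and the helper names are not from the slides.

```python
from itertools import product

def ht_estimate(y, pi, indicators):
    """Horvitz-Thompson estimate of the total for one realized sample."""
    return sum(I * yi / p for I, yi, p in zip(indicators, y, pi))

def ht_variance(y, pi):
    """Design variance of the HT estimator under Poisson sampling."""
    return sum((1 / p - 1) * yi ** 2 for yi, p in zip(y, pi))

def ht_variance_estimate(y, pi, indicators):
    """Variance estimate computed from the sampled units only."""
    return sum(I / p * (1 / p - 1) * yi ** 2
               for I, yi, p in zip(indicators, y, pi))

def sample_probability(pi, indicators):
    """Probability of one indicator vector under independent Bernoulli draws."""
    prob = 1.0
    for I, p in zip(indicators, pi):
        prob *= p if I else 1 - p
    return prob

y = [11, 20, 24, 245]
pi = [0.1, 0.2, 0.3, 0.9]   # hypothetical inclusion probabilities

# Enumerate every possible sample and take exact design expectations.
samples = list(product([0, 1], repeat=len(y)))
mean_ht = sum(sample_probability(pi, s) * ht_estimate(y, pi, s) for s in samples)
var_ht = sum(sample_probability(pi, s) * (ht_estimate(y, pi, s) - mean_ht) ** 2
             for s in samples)
mean_vhat = sum(sample_probability(pi, s) * ht_variance_estimate(y, pi, s)
                for s in samples)
print(mean_ht, var_ht, ht_variance(y, pi), mean_vhat)
```

The enumeration should reproduce the total of y as the expectation of the estimator, and both the enumerated variance and the expected variance estimate should agree with the closed-form variance up to rounding.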


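Optimal design: minimize $Var\left(\hat{Y}_{HT}\right)$ subject to $\sum_{i=1}^N \pi_i = n$. The solution is

$$\pi_i \propto y_i.$$

To prove this, use the Cauchy-Schwarz inequality

$$\left(\sum_{i=1}^n a_i^2\right) \left(\sum_{j=1}^n b_j^2\right) \ge \left(\sum_{i=1}^n a_i b_i\right)^2$$

with equality if and only if $a_i \propto b_i$ for all $i = 1, \ldots, n$.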
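
One way to fill in the argument is sketched below, under the extra assumptions that $y_i > 0$ for all $i$ (otherwise replace $y_i$ by $|y_i|$) and that the constraint $\pi_i \le 1$ is not binding. Since the variance equals $\sum_i y_i^2/\pi_i - \sum_i y_i^2$, only the first sum depends on the design. Apply the inequality over the $N$ population elements with $a_i = y_i/\sqrt{\pi_i}$ and $b_i = \sqrt{\pi_i}$:

```latex
\left(\sum_{i=1}^N \frac{y_i^2}{\pi_i}\right)
\left(\sum_{i=1}^N \pi_i\right)
\ge \left(\sum_{i=1}^N y_i\right)^2
\quad\Longrightarrow\quad
\sum_{i=1}^N \frac{y_i^2}{\pi_i} \ge \frac{Y^2}{n},
\qquad \text{with equality iff } \frac{y_i}{\sqrt{\pi_i}} \propto \sqrt{\pi_i},
\ \text{i.e. } \pi_i = \frac{n\, y_i}{Y}.
```

The lower bound does not depend on the $\pi_i$, so the design attaining it minimizes the variance.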


Disadvantage: the sample size is random, which can decrease the efficiency of the HT estimator.

Example: N = 600 students took a test in a university. We want to estimate the passing rate on the test. Bernoulli sampling with π = 1/6 is used, and a sample of size n_s = 90 is realized. Among the 90 sampled students, 60 are found to have passed. What is a reasonable estimator of the total number of students who passed the test?


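Remedy

1 Use an alternative estimator

$$\hat{Y} = N \, \frac{\sum_{i=1}^N I_i y_i / \pi_i}{\sum_{i=1}^N I_i / \pi_i}.$$

It is often called the Hájek estimator. Its variance is approximately

$$Var\left(\hat{Y}\right) \doteq \sum_{i=1}^N \left(\frac{1}{\pi_i} - 1\right) \left(y_i - \bar{Y}\right)^2,$$

where $\bar{Y} = N^{-1} \sum_{i=1}^N y_i$ is the population mean.

2 Use rejective sampling: the conditional distribution of Bernoulli sampling given n = n0 is simple random sampling without replacement of size n0.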
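
For the test-score example (N = 600, π = 1/6, 90 students sampled, 60 passed), the HT and Hájek estimators give different answers. The short calculation below is a sketch with illustrative variable names; it treats y_i as 1 for a pass and 0 otherwise.

```python
N = 600        # population size
pi = 1 / 6     # common Bernoulli inclusion probability
n_s = 90       # realized sample size
n_pass = 60    # sampled students who passed (y_i = 1)

# HT estimator: sum of y_i / pi over the sample.
ht_total = n_pass / pi

# Hajek estimator: N times the ratio of two HT sums.
hajek_total = N * (n_pass / pi) / (n_s / pi)

print(ht_total, hajek_total)
```

The HT estimate is about 360, while the Hájek estimate is 400 = 600 × (60/90). The expected sample size is 100, so the realized sample of 90 is smaller than expected and the HT estimator, which ignores the realized size, scales the sample count by too small a factor in this case.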


PPS sampling


Basic Setup

1 Let $x_1, \ldots, x_N$ be known characteristics of the population elements such that $x_i > 0$ for all $i$. This $x_i$ is called the measure of size (MOS). Examples of MOS include the size of a farm, the number of employees in a company, and the acreage of counties.

2 We wish to select a sample with selection probability proportional to $x_i$.

3 If the sample size is equal to one, then it is easy to select a sample with probability proportional to $x_i$.

4 Probability proportional to size (PPS) sampling idea: use m independent selections of a sample of size one, each with probability proportional to $x_i$. Thus, it is with-replacement sampling in the sense that, once drawn, an element is replaced into the population so that all N elements participate in each draw.


With-replacement sampling:

Pro: Easy to implement and to investigate its properties. May be a good approximation of without-replacement sampling if n/N is negligible.

Con: Sample elements can be duplicated. Inefficient.


How to select a PPS sample with m = 1?

Method 1: Cumulative total method

[Step 1] Set $T_0 = 0$ and compute $T_k = T_{k-1} + x_k$, $k = 1, 2, \ldots, N$.

[Step 2] Draw $\varepsilon \sim \text{Unif}(0, 1)$. If $\varepsilon \in (T_{k-1}/T_N, T_k/T_N)$, element k is selected.

Very popular. It needs a list of all $x_k$ in the population.
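
The two steps can be sketched as follows. The function name `pps_draw_cumulative` is illustrative, labels are 0-based, and the uniform draw is passed in as an argument so that the behaviour is easy to check; ties at interval boundaries (a probability-zero event) are assigned to the lower label.

```python
from bisect import bisect_left
from itertools import accumulate
import random

def pps_draw_cumulative(x, eps):
    """Select one label with probability proportional to x.

    x   : list of positive measures of size
    eps : a Unif(0, 1) draw
    Returns the 0-based label k whose interval (T_{k-1}/T_N, T_k/T_N]
    contains eps.
    """
    totals = list(accumulate(x))      # T_1, ..., T_N
    target = eps * totals[-1]         # rescale eps to (0, T_N)
    return bisect_left(totals, target)

x = [100, 200, 300, 1000]
k = pps_draw_cumulative(x, random.random())
print(k)
```

With these sizes the cumulative totals are 100, 300, 600 and 1600, so for example a draw of 0.3 corresponds to 480 and selects label 2 (the third unit).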


Method 2: Lahiri’s method

[Step 0] Choose $M \ge \max\{x_1, x_2, \ldots, x_N\}$. Set r = 1.

[Step 1] Draw $k_r$ by SRS from $\{1, 2, \ldots, N\}$.

[Step 2] Draw $\varepsilon_r \sim \text{Unif}(0, 1)$.

[Step 3] If $\varepsilon_r \le x_{k_r}/M$, then select element $k_r$ and stop. Otherwise, reject $k_r$ and go to Step 1 with r = r + 1.

The basic idea is based on the rejection algorithm due to von Neumann.
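
A minimal sketch of these steps is given below. The function name `lahiri_draw`, the 0-based labels, and the default choice M = max(x) are assumptions made for illustration; any M at least as large as every x_k would work, at the cost of more rejections.

```python
import random

def lahiri_draw(x, M=None, rng=random):
    """Draw one label with probability proportional to x by rejection.

    rng is any object with randrange() and random(), for example the
    random module itself or a seeded random.Random instance.
    """
    if M is None:
        M = max(x)
    if M < max(x):
        raise ValueError("M must be at least max(x)")
    N = len(x)
    while True:
        k = rng.randrange(N)     # Step 1: equal-probability candidate
        eps = rng.random()       # Step 2: uniform draw
        if eps <= x[k] / M:      # Step 3: accept with probability x_k / M
            return k

x = [100, 200, 300, 1000]
rng = random.Random(2014)        # seeded only to make the run repeatable
draws = [lahiri_draw(x, rng=rng) for _ in range(4000)]
freq_last = draws.count(3) / len(draws)
print(freq_last)
```

Unlike the cumulative total method, this needs only an upper bound on the sizes rather than their cumulative totals. In the run above, the relative frequency of the last unit should be close to 1000/1600 = 0.625.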


Justification

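$$
\begin{aligned}
\pi_k = \Pr(k \in A)
&= \sum_{r=1}^{\infty} \Pr\left\{ k_r = k, \ \varepsilon_r \le \frac{x_{k_r}}{M}, \ \bigcap_{j=1}^{r-1} \left( \varepsilon_j > \frac{x_{k_j}}{M} \right) \right\} \\
&= \sum_{r=1}^{\infty} \frac{1}{N} \frac{x_k}{M} \times \prod_{j=1}^{r-1} \Pr\left\{ \varepsilon_j > \frac{x_{k_j}}{M} \right\} \\
&= \sum_{r=1}^{\infty} \frac{1}{N} \frac{x_k}{M} \left( 1 - \frac{\bar{x}_U}{M} \right)^{r-1} \\
&= \frac{1}{N} \frac{x_k}{M} \frac{1}{1 - (1 - \bar{x}_U / M)} \\
&= \frac{x_k}{\sum_{i=1}^N x_i},
\end{aligned}
$$

where $\bar{x}_U = N^{-1} \sum_{i=1}^N x_i$ is the population mean of the size measure.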


Let $a_i$ be the index of the element selected in the i-th with-replacement draw.

Unequal probability sampling with replacement: consider $p_1, p_2, \ldots, p_N > 0$ such that $\sum_{i=1}^N p_i = 1$. We can construct $p_k$ from $x_k$ by

$$p_k = x_k / \sum_{i=1}^N x_i.$$

On the i-th draw, label k is selected with probability $p_k$. That is, $\Pr(a_i = k) = p_k$. Note that

$$\pi_k = \Pr(k \in A) = 1 - \Pr(k \notin A) = 1 - (1 - p_k)^m.$$

1 For m = 1, $\pi_k = p_k$.

2 For m > 1 and small $p_k$, $\pi_k \doteq m p_k$.
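
The quality of the approximation can be seen from a short calculation. This is only an illustrative sketch; the function name `inclusion_prob` and the values of p and m are hypothetical.

```python
def inclusion_prob(p, m):
    """Probability that a unit with draw probability p appears at least
    once in m independent with-replacement draws."""
    return 1 - (1 - p) ** m

m = 5
for p in (0.01, 0.05, 0.2):
    # Compare the exact inclusion probability with the approximation m * p.
    print(p, inclusion_prob(p, m), m * p)
```

For p = 0.01 and m = 5 the exact value is about 0.0490 against the approximation 0.05, while for p = 0.2 the approximation m p = 1 is clearly too large.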


Estimator of $Y = \sum_{i=1}^N y_i$:

1 First, define

$$Z_i = \frac{y_{a_i}}{p_{a_i}} = \sum_{k=1}^N \frac{y_k}{p_k} I(a_i = k).$$

Note that $Z_1, \ldots, Z_m$ are independent random variables since the m draws are independent.

2 $Z_1, \ldots, Z_m$ are identically distributed since the same probabilities are used at each draw, where $E(Z_i) = Y$ and

$$V(Z_i) = \sum_{k=1}^N \left( \frac{y_k}{p_k} - Y \right)^2 p_k \equiv V_1.$$

3 Thus, $Z_1, \ldots, Z_m$ are IID with mean $Y$ and variance $V_1$. Use $\bar{z} = \sum_{i=1}^m Z_i / m$ to estimate $Y$.


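Hansen-Hurwitz estimator:

$$\hat{Y}_{HH} \equiv \sum_{i=1}^m Z_i / m$$

where $Z_i = y_k / p_k$ if $a_i = k$.

Properties

1 Unbiased estimator of $Y = \sum_{i=1}^N y_i$.

2 $V\left(\hat{Y}_{HH}\right) = V_1 / m$ by the standard result for the mean of IID random variables.

3 An unbiased estimator of $V\left(\hat{Y}_{HH}\right)$ is

$$\hat{V}\left(\hat{Y}_{HH}\right) = \frac{1}{m} \frac{1}{m - 1} \sum_{i=1}^m (z_i - \bar{z})^2.$$

4 For large m, $\hat{Y}_{HH}$ is asymptotically normal, $AN(Y, V_1/m)$.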
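
The estimator and its variance estimator can be sketched as below, reusing the four-unit farm population from the introduction. The function name `hansen_hurwitz`, the 0-based labels, and the particular sequence of draws are hypothetical and serve only as an illustration.

```python
def hansen_hurwitz(y, p, draws):
    """Hansen-Hurwitz estimate of the total and its variance estimate.

    y     : study variable for all N units
    p     : draw probabilities for all N units (sum to 1)
    draws : labels a_1, ..., a_m from m with-replacement draws (m >= 2)
    """
    m = len(draws)
    z = [y[k] / p[k] for k in draws]          # Z_i = y_{a_i} / p_{a_i}
    z_bar = sum(z) / m
    v_hat = sum((zi - z_bar) ** 2 for zi in z) / (m * (m - 1))
    return z_bar, v_hat

x = [100, 200, 300, 1000]
y = [11, 20, 24, 245]
p = [xk / sum(x) for xk in x]

draws = [3, 3, 0, 2]      # hypothetical outcome of m = 4 draws
y_hh, v_hh = hansen_hurwitz(y, p, draws)
print(y_hh, v_hh)
```

For this hypothetical outcome the z-values are 392, 392, 176 and 128, giving an estimate of 272 and a variance estimate of 4896. In practice the labels could be generated with `random.choices(range(len(x)), weights=x, k=m)` from the Python standard library.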


πps sampling


Ideally,

1 The actual selection of the sample is relatively simple.

2 The first-order inclusion probabilities $\pi_k$ are strictly proportional to $x_k$.

3 The second-order inclusion probabilities satisfy $\pi_{kl} > 0$ for all $k \neq l$ (measurable sampling design).

4 The $\pi_{kl}$ can be computed without very heavy calculations.

5 $\Delta_{kl} = \pi_{kl} - \pi_k \pi_l < 0$ for all $k \neq l$, to guarantee that the SYG variance estimator is always nonnegative.


Motivation:

PPS sampling satisfies the above conditions, but it can have duplicated sample elements → inefficient.

Want to find a fixed-size sampling design with $\pi_k \propto x_k$, where $x_k > 0$ is known. It is called a πps design.

Remark: For a fixed-size design, $\pi_k \propto x_k$ and $\sum_{i=1}^N \pi_i = n$ lead to

$$\pi_k = \frac{n x_k}{\sum_{i=1}^N x_i},$$

which can be contradictory to the fact that $\pi_k \to 1$ for $n \to N$. Also, $\pi_k$ can be greater than 1 if $x_k$ is extremely large.

Strict πps sampling may not be feasible for a particular population. We may need to truncate some of the $x_i$ so that the $\pi_i$ computed above is no greater than one.
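
The slides do not spell out a truncation rule. One common convention, assumed here purely for illustration, is to set π_k = 1 for every unit whose computed probability reaches one (a "take-all" unit) and to recompute the probabilities of the remaining units with the remaining sample size, repeating until none exceeds one. The function name `pips_probabilities` is hypothetical.

```python
def pips_probabilities(x, n):
    """First-order inclusion probabilities proportional to x, capped at 1.

    Assumed rule: units with n * x_k / sum(x) >= 1 get probability 1, and
    the remaining sample size is allocated proportionally to x among the
    other units. Requires n <= len(x).
    """
    pi = [0.0] * len(x)
    free = list(range(len(x)))     # units not yet fixed at probability 1
    n_left = n
    while True:
        total = sum(x[k] for k in free)
        over = [k for k in free if n_left * x[k] / total >= 1.0]
        if not over:
            break
        for k in over:
            pi[k] = 1.0
        free = [k for k in free if k not in over]
        n_left -= len(over)
    for k in free:
        pi[k] = n_left * x[k] / total
    return pi

x = [100, 200, 300, 1000]
pi = pips_probabilities(x, 2)
print(pi)
```

With n = 2 the untruncated formula gives 2 × 1000/1600 = 1.25 for the largest unit. Under the assumed rule that unit is selected with certainty, and the other three units share the one remaining selection with probabilities 1/6, 1/3 and 1/2.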


Problem: Given $\pi_1, \ldots, \pi_N$, where $\sum_{i=1}^N \pi_i = n$ and $\pi_i \in (0, 1]$ for all $i$, how to select a nonreplacement sample of size n?

Classification of the procedures for the selection of nonreplacement unequal probability samples

1 Draw-by-draw method

2 Mass draw procedure (rejective sampling)

3 Systematic procedure


Draw-by-draw method (n = 2)

Notation

$\theta_i$: the probability of selecting i in the first sample selection.

$\theta_{j|i}$: the probability of selecting j in the second sample selection given that i is selected in the first sample selection.

Inclusion probabilities

Second-order inclusion probability (for $i \neq j$):

$$\pi_{ij} = \theta_i \theta_{j|i} + \theta_j \theta_{i|j}$$

First-order inclusion probability: since $\sum_{j \neq i} \pi_{ij} = \pi_i$,

$$\pi_i = \theta_i + \sum_{j \neq i} \theta_j \theta_{i|j}$$

Restrictions on $\theta_i$ and $\theta_{j|i}$:

$$\theta_i + \sum_{j \neq i} \theta_j \theta_{i|j} = \pi_i.$$


Example

Unequal probability nonreplacement sampling of size n = 2 from a population of size N = 5.

ID   p_i = π_i/n   π_i
1    0.20          0.4
2    0.10          0.2
3    0.25          0.5
4    0.25          0.5
5    0.20          0.4

How to choose $\theta_i$ and $\theta_{j|i}$?


Example

If we choose $\theta_i = p_i$ and $\theta_{j|i} \propto p_j$, then

ID   p_i    θ_i    θ_{j|1}   θ_{j|2}   θ_{j|3}   θ_{j|4}   θ_{j|5}
1    0.20   0.20   0         4/18      4/15      4/15      4/16
2    0.10   0.10   2/16      0         2/15      2/15      2/16
3    0.25   0.25   5/16      5/18      0         5/15      5/16
4    0.25   0.25   5/16      5/18      5/15      0         5/16
5    0.20   0.20   4/16      4/18      4/15      4/15      0

π_1 = 0.2 + 0.1(4/18) + 0.25(4/15) + 0.25(4/15) + 0.2(4/16) = 0.4056 ≠ 0.4

π_2 = 0.1 + 0.2(2/16) + 0.25(2/15) + 0.25(2/15) + 0.2(2/16) = 0.2167 ≠ 0.2
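
The arithmetic for all five units can be reproduced with a few lines. This is a sketch with illustrative names and 0-based indices; `cond[j][i]` stores the conditional probability of selecting i second given that j was selected first.

```python
def first_order_probs(theta, cond):
    """pi_i = theta_i + sum over j != i of theta_j * theta_{i|j}."""
    N = len(theta)
    return [theta[i] + sum(theta[j] * cond[j][i] for j in range(N) if j != i)
            for i in range(N)]

p = [0.20, 0.10, 0.25, 0.25, 0.20]
N = len(p)

theta = p[:]                                   # first draw: theta_i = p_i
# Second draw: proportional to p among the remaining units.
cond = [[0.0 if i == j else p[i] / (1 - p[j]) for i in range(N)]
        for j in range(N)]

pi = first_order_probs(theta, cond)
print([round(v, 4) for v in pi])
```

The resulting probabilities sum to n = 2 but differ from the targets 2 p_i; for example the first two values are about 0.4056 and 0.2167, matching the hand calculation.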


Draw-by-draw method (n = 2)

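Brewer (1963) method: Use

$$\theta_i \propto \frac{p_i (1 - p_i)}{1 - 2 p_i}$$

and

$$\theta_{j|i} \propto p_j.$$

Durbin (1967) method: Use

$$\theta_i \propto p_i$$

and

$$\theta_{j|i} = \pi_i^{-1} \pi_{ij} \propto p_j \left( \frac{1}{1 - 2 p_i} + \frac{1}{1 - 2 p_j} \right).$$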


Motivation for Brewer’s method

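If we choose $\theta_{j|i} = p_j / (1 - p_i)$ for $i \neq j$, using

$$\pi_i = \theta_i + \sum_{j \neq i} \theta_j \theta_{i|j},$$

we have

$$
\begin{aligned}
2 p_i &= \theta_i + \sum_{j \neq i} \theta_j \frac{p_i}{1 - p_j} \\
&= \theta_i + p_i \sum_{j \neq i} \frac{\theta_j}{1 - p_j} \\
&= \theta_i + p_i \left( B - \frac{\theta_i}{1 - p_i} \right),
\end{aligned}
$$

where $B = \sum_{j=1}^N \theta_j / (1 - p_j)$ does not depend on $i$. Therefore,

$$\theta_i = (2 p_i - p_i B) / \{1 - p_i / (1 - p_i)\} \propto p_i (1 - p_i) / (1 - 2 p_i).$$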
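
As a numerical check of this result, the sketch below computes Brewer's first-draw probabilities for the five-unit example and verifies that the implied first-order inclusion probabilities equal 2 p_i. The function names are illustrative, indices are 0-based, and every p_i is assumed to be smaller than 1/2.

```python
def brewer_theta(p):
    """First-draw probabilities proportional to p(1 - p) / (1 - 2p)."""
    w = [pk * (1 - pk) / (1 - 2 * pk) for pk in p]   # needs p_k < 1/2
    total = sum(w)
    return [wk / total for wk in w]

def brewer_first_order(p):
    """pi_i = theta_i + sum_{j != i} theta_j * p_i / (1 - p_j)."""
    N = len(p)
    theta = brewer_theta(p)
    return [theta[i] + sum(theta[j] * p[i] / (1 - p[j])
                           for j in range(N) if j != i)
            for i in range(N)]

p = [0.20, 0.10, 0.25, 0.25, 0.20]
theta = brewer_theta(p)
pi = brewer_first_order(p)
print([round(v, 4) for v in theta])
print([round(v, 4) for v in pi])
```

The second printed list should reproduce the target probabilities 0.4, 0.2, 0.5, 0.5, 0.4 up to rounding, in contrast to the naive choice θ_i = p_i.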


Motivation for Durbin’s method

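For $\theta_i = p_i = \pi_i / 2$, the choice of $\theta_{j|i} = \pi_{ij} / \pi_i$ satisfies

$$\pi_i = \theta_i + \sum_{j \neq i} \theta_j \theta_{i|j},$$

$$\pi_{ij} = \theta_i \theta_{j|i} + \theta_j \theta_{i|j}.$$

Durbin used

$$\pi_{ij} = \frac{\pi_i \pi_j}{2 (1 + A)} \left( \frac{1}{1 - \pi_i} + \frac{1}{1 - \pi_j} \right),$$

where $A = 0.5 \sum_{i=1}^N \pi_i / (1 - \pi_i) = \sum_{i=1}^N p_i / (1 - 2 p_i)$.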


Remark

The joint inclusion probabilities from the two methods are identical.

Both satisfy $\pi_{ij} < \pi_i \pi_j$: the Sen-Yates-Grundy variance estimator is nonnegative.

Extension to n > 2 is possible, but involves heavy computation for the conditional selection probabilities. Discussed by Rao (1965), Sampford (1967), and Fuller (1971).
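
The first two remarks can be checked numerically for the five-unit example. The sketch below builds the joint inclusion probabilities implied by Brewer's draw probabilities and those given by Durbin's formula; the function names and 0-based indices are illustrative, and every p_i is assumed to be smaller than 1/2.

```python
def brewer_joint(p):
    """pi_ij = theta_i * theta_{j|i} + theta_j * theta_{i|j} for Brewer's method."""
    N = len(p)
    w = [pk * (1 - pk) / (1 - 2 * pk) for pk in p]
    theta = [wk / sum(w) for wk in w]
    return [[0.0 if i == j else
             theta[i] * p[j] / (1 - p[i]) + theta[j] * p[i] / (1 - p[j])
             for j in range(N)] for i in range(N)]

def durbin_joint(p):
    """Durbin's closed-form pi_ij with pi_i = 2 p_i."""
    N = len(p)
    pi = [2 * pk for pk in p]
    A = 0.5 * sum(v / (1 - v) for v in pi)
    return [[0.0 if i == j else
             pi[i] * pi[j] / (2 * (1 + A)) * (1 / (1 - pi[i]) + 1 / (1 - pi[j]))
             for j in range(N)] for i in range(N)]

p = [0.20, 0.10, 0.25, 0.25, 0.20]
N = len(p)
pi = [2 * pk for pk in p]
jb = brewer_joint(p)
jd = durbin_joint(p)

max_diff = max(abs(jb[i][j] - jd[i][j]) for i in range(N) for j in range(N))
all_negative_delta = all(jd[i][j] < pi[i] * pi[j]
                         for i in range(N) for j in range(N) if i != j)
print(max_diff, all_negative_delta)
```

For this example the two matrices agree up to rounding error, every off-diagonal entry is below π_i π_j, and each row of the matrix sums to π_i, as it must for n = 2.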


Systematic πps sampling

1 Choose $R \sim \text{Unif}(0, a]$.

2 Unit i is selected if

$$\sum_{j=1}^{i-1} x_j < R + k a \le \sum_{j=1}^{i} x_j$$

for some $k = 0, 1, \ldots, n - 1$, where $a = \sum_{i=1}^N x_i / n$ is the sampling interval for the systematic sampling.


Example: Systematic πps with n = 2

ID   π_i   CL    CU
1    0.4   0     0.4
2    0.2   0.4   0.6
3    0.5   0.6   1.1
4    0.5   1.1   1.6
5    0.4   1.6   2.0

A = {1, 3} for 0 < R ≤ 0.1
  = {1, 4} for 0.1 < R ≤ 0.4
  = {2, 4} for 0.4 < R ≤ 0.6
  = {3, 5} for 0.6 < R ≤ 1
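
The selection rule can be sketched in a few lines and checked against this example. The function name `systematic_pips` is illustrative; the random start R is passed in as an argument, and IDs are returned 1-based to match the table. Here the π_i themselves are used as the size measure, so the sampling interval is a = 1.

```python
from bisect import bisect_left
from itertools import accumulate

def systematic_pips(x, n, R):
    """Systematic selection of n units with probability proportional to x.

    R is a random start in (0, a], where a = sum(x) / n. Unit i is chosen
    when one of the points R, R + a, ..., R + (n - 1) a falls in
    (CL_i, CU_i]. A unit with x_i > a could be hit more than once.
    """
    upper = list(accumulate(x))          # CU_1, ..., CU_N
    a = upper[-1] / n                    # sampling interval
    return [bisect_left(upper, R + k * a) + 1 for k in range(n)]

pi = [0.4, 0.2, 0.5, 0.5, 0.4]
for R in (0.05, 0.3, 0.5, 0.8):
    print(R, systematic_pips(pi, 2, R))
```

The four starting values fall in the four ranges of R listed above and give the samples {1, 3}, {1, 4}, {2, 4} and {3, 5} respectively.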


Systematic πps sampling

First-order inclusion probability: let $l$ be the integer satisfying $l \cdot a \le CL_k < CU_k \le (l + 1) a$. Then

$$\Pr(k \in A) = \Pr\{CL_k < R + l \cdot a \le CU_k\} = \int_{CL_k - l \cdot a}^{CU_k - l \cdot a} \frac{1}{a} \, dt = \frac{n x_k}{\sum_{i \in U} x_i}.$$

Easy to compute. Very popular.

πij = 0 for some i , j .

Efficiency depends on the sorting variable.
