Probability proportional to size (πps) sampling using ranks

This article was downloaded by: [North Carolina State University] On: 24 September 2013, At: 15:48 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Communications in Statistics - Theory and Methods Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20 Probability proportional to size (πps) sampling using ranks Tommy Wright a a Mathematical Sciences Section Engineering Physics and Mathematics Division , Oak Ridge National Laboratory , P. O. Box 2009, Bldg. 9207A, Oak Ridge , Tennessee , 37831-8083 , U.S.A Published online: 27 Jun 2007. To cite this article: Tommy Wright (1990) Probability proportional to size (πps) sampling using ranks, Communications in Statistics - Theory and Methods, 19:1, 347-362, DOI: 10.1080/03610929008830205 To link to this article: http://dx.doi.org/10.1080/03610929008830205 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. 
This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions

COMMUN. STATIST. -THEORY METH., 19(1), 347-362 (1990)

PROBABILITY PROPORTIONAL TO SIZE (πps) SAMPLING USING RANKS

Tommy Wright

Mathematical Sciences Section Engineering Physics and Mathematics Division

Oak Ridge National Laboratory P. O. Box 2009, Bldg. 9207A

Oak Ridge, Tennessee 37831-8083 U.S.A.

Key Words and Phrases: Horvitz-Thompson estimator; πps sampling; ranks; variance estimation.

ABSTRACT

There can be gains in estimation efficiency over equal probability sampling methods when one makes use of auxiliary information for probability proportional to size with replacement (πpswr) sampling methods. The usual method is simple to execute, but might lead to more than one appearance in the sample for any particular unit. When a suitable variable x is not available, one may know how to rank units reasonably well relative to the unknown y values before sample selection. When such ranking is possible, we introduce a simple and efficient sampling plan using the ranks as the unknown x measures of size. The proposed sampling plan is similar to, has the simplicity of, and has no greater sampling variance than with replacement sampling, but is without replacement.

Copyright © 1990 by Marcel Dekker, Inc.

1. INTRODUCTION

Sampling from a finite population of N units with unequal probabilities is often desirable when units vary greatly in size. Assume that a vector $(y_i, x_i)$ of nonnegative components is associated with $U_i$, the $i$th unit, for $i = 1, \ldots, N$. If the x's are known and the y's are unknown, assume sampling of fixed size $n$ ($< N$) to estimate the population total $Y = \sum_{i=1}^{N} y_i$. When $\pi_i$, the probability of sample inclusion for $U_i$, is $\pi_i = n p_i$, where $p_i = x_i / \sum_{i=1}^{N} x_i$, the plan is called probability proportional to size (πps) sampling.

Hansen and Hurwitz (1943) demonstrated that more efficient estimators of Y often result using πps sampling than using equal probability sampling. Though their method was easy to execute and led to unbiased estimators and unbiased estimators of sampling variance, it was with replacement. More recently, Scott and Smith (1975) have given conditions under which πps sampling (with replacement) is minimax.
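As a small illustration of the definition $\pi_i = n p_i$ given above, the following sketch computes the inclusion probabilities for a toy population. The size values and the function name `inclusion_probabilities` are assumptions made only for this example and do not come from the paper.

```python
from fractions import Fraction


def inclusion_probabilities(x, n):
    """Return pi_i = n * p_i with p_i = x_i / sum(x), as exact fractions."""
    total = sum(x)
    return [Fraction(n * xi, total) for xi in x]


# Hypothetical integer measures of size for N = 6 units, sample size n = 2.
sizes = [1, 2, 3, 4, 5, 6]
pi = inclusion_probabilities(sizes, n=2)

print(pi)        # inclusion probability of each unit
print(sum(pi))   # the inclusion probabilities of a fixed-size design sum to n
```

Exact fractions are used so that the sum of the inclusion probabilities can be compared with n without rounding error.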

Proposed without replacement πps sampling (πpswor) plans limit each population unit to at most one appearance in a sample and can lead to more efficient estimators than πps sampling with replacement (πpswr). Brewer and Hanif (1983) compare several πpswor procedures, noting that many of them have limitations, can be complicated to execute if n > 2, and make the computation of joint probabilities of sample inclusion difficult (also see Cassel, Särndal, and Wretman (1977), pp. 16-17). This has led researchers to search for efficient πpswor plans that are simple to execute and explain for n greater than 2. This paper starts with πpswr sampling and alters it slightly to avoid the chance of more than one appearance in the sample of any given unit while maintaining simplicity and efficiency over equal probability sampling methods as well as the usual πpswr method. The next three paragraphs provide background motivation for the πpswor method proposed in this paper.

Perhaps the chief advantage of πpswr sampling is its simplicity, and its major disadvantage is that any unit can be repeated in the sample. πpswr sampling is simple because one makes n independent selections from the same population in the sense that the population conditions for the ith selection for the sample are the same as for the (i+1)st selection for i = 1, 2, ..., n. Is it possible to mimic πpswr sampling and not have repeated selection(s) of any particular unit for the sample? Is it possible to make n independent selections from the same population without replacement? One way to approximately achieve this is to seek a method of stratifying the population into n strata where each stratum will tend to have similar characteristics (e.g., equal means) as the population and select one unit independently from each stratum. One method which comes close to accomplishing this is the Rao-Hartley-Cochran (1962) method, which creates n strata by random assignment followed by independent selection of one unit from each stratum. However, the Rao-Hartley-Cochran method is not πps sampling.

Occasions exist where one will know how to rank units in the population reasonably well relative to the unknown y values and other prior information before sample selection. Because x provides a "measure of size" for each unit relative to y, why not take the ranks as the variable x when a suitable x that is highly correlated with y is unknown? In what follows, we assume that such ranking is possible and that the ranks will be the basis for the stratification into n similar strata.

Before proceeding, it is instructive to cite an example where the ranks of the unknown y-values would likely be known for the entire population. There are 160 distributors of the electricity generated by the Tennessee Valley Authority (TVA) for more than 2.5 million residential customers over an eight state region. As part of its Load Research Program, TVA wants to estimate its total demand from these residential customers (as well as other categories of customers) for given time periods. For administrative convenience, multi-stage cluster sampling plans that treat each distributor as the primary sampling unit seem most appropriate for attaining timely estimates of demand. The first stage of such a sampling plan would generally call for the selection of a sample of distributors using πpswor. The relative total customer demand by distributor is fairly constant from month to month and is known. Hence in this case, if $y_i$ is the total demand of the residential customers of the ith distributor for a given time period, its rank in the list for N = 160 distributors would be known with reasonable confidence, assuming fairly uniform demand per distributor over a month. (Similar examples can be given for other energy related populations that are sampled periodically to estimate national, regional, and state level totals.)

Intuition suggests that the use of ranks can be beneficial. Advantages for the use of ranks are (i) that qualitative data (perhaps subjectively derived) can be used in producing the ranks, (ii) that the method is likely to be robust relative to outliers in the sizing variables, and (iii) that ranks are likely to be more stable over time and reflect more accurately relative sizes of the population units than the usual type auxiliary variable x.

In this paper, a new, simple, and efficient method of sample selection is proposed for πpswor sampling based explicitly on ranks. Using the unbiased Horvitz-Thompson (1952) estimator of Y, we show that the sampling error of the proposed method is no greater than that under πpswr and that it is optimal when y and the ranks are directly proportional. We consider the usual estimator of variance under sampling with replacement, obtain an expression for its bias, and note a way to decrease this bias. For selected populations, numerical comparisons are made between the proposed method and the simple πpswr method of Hansen and Hurwitz (1943). Comparisons with the Rao-Hartley-Cochran method are also provided.

2. THE SAMPLING SCHEME AND ESTIMATION

2.1 STRATA CONSTRUCTION

Without loss of generality, let the rank of $U_i$ be $R_i = i$, where the ordering is from smallest to largest with respect to the unknown $y_i$'s. Then $p_i = 2i / [N(N+1)]$. Assume that $N = 2kn$ where k is a positive integer and n is the desired sample size. We consider two different stratification plans. Others are possible.

Stratification Plan A

Partition the N units into n strata by assigning to the jth stratum ($j = 1, 2, \ldots, n$) the 2k units

$$U_{(j-1)k+1},\ U_{(j-1)k+2},\ \ldots,\ U_{jk},\ U_{N-jk+1},\ U_{N-jk+2},\ \ldots,\ U_{N-jk+k}. \qquad (2.1)$$

Stratification Plan B

Partition the N units into n strata by assigning to the jth stratum ($j = 1, 2, \ldots, n$) the 2k units

For each stratification plan the sum of the ranks in each stratum is $N(N+1)/(2n)$. This fact, that the sum of the ranks assigned to each stratum under either plan is the same, is the key motivation for the stratification, which makes it possible to define the simple sampling plan in Section 2.2 that is clearly a πpswor sampling plan. Each stratification plan tends to produce strata with equal means, especially when the y-distribution is symmetric. Figure 1 gives the general picture by showing which units are assigned to each stratum under each stratification plan.
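The construction in (2.1) can be sketched in a few lines. This is a minimal illustration of Plan A only, assuming N = 2kn; the function name `plan_a_strata` and the sizes N = 24, n = 3 are hypothetical choices for the example.

```python
def plan_a_strata(N, n):
    """Ranks assigned to each of the n strata under Plan A, equation (2.1)."""
    if N % (2 * n) != 0:
        raise ValueError("Plan A as stated requires N = 2kn")
    k = N // (2 * n)
    strata = []
    for j in range(1, n + 1):
        low = list(range((j - 1) * k + 1, j * k + 1))         # U_{(j-1)k+1} .. U_{jk}
        high = list(range(N - j * k + 1, N - j * k + k + 1))  # U_{N-jk+1} .. U_{N-jk+k}
        strata.append(low + high)
    return strata


N, n = 24, 3   # hypothetical sizes, so k = 4
strata = plan_a_strata(N, n)
for j, ranks in enumerate(strata, start=1):
    print(j, ranks, sum(ranks))   # stratum number, its ranks, and their sum
```

Each stratum pairs the k smallest remaining ranks with the k largest remaining ranks, which is why every stratum has the same rank total $N(N+1)/(2n)$.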

2.2 THE SELECTION SCHEME FOR EITHER STRATIFICATION PLAN

Let $i_j$ range over the ranks assigned to the jth stratum and select the $i_j$th unit from stratum j with probability $n p_{i_j}$, where $p_{i_j} = 2 i_j / [N(N+1)]$ for $j = 1, 2, \ldots, n$. The selections among strata are independent. The sum of the probabilities in the jth stratum is $\sum_{i_j} n p_{i_j} = 1$. Because the probability of inclusion in the sample for $U_i$ is $\pi_i = n p_i$, the sample selection procedure is πpswor sampling.

If $\pi_{ij}$ is the joint probability that $U_i$ and $U_j$ are included in the sample where $i \neq j$, then $\pi_{ij} = \pi_i \pi_j$ if $U_i$ and $U_j$ are in different strata, and $\pi_{ij} = 0$ otherwise.
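The selection step might be sketched as follows, assuming Plan A strata and Python's standard pseudo-random generator. The helper names, the sizes, and the fixed seed are assumptions for illustration only.

```python
import random
from fractions import Fraction


def plan_a_strata(N, n):
    """Ranks in each stratum under Plan A (requires N = 2kn)."""
    k = N // (2 * n)
    return [list(range((j - 1) * k + 1, j * k + 1))
            + list(range(N - j * k + 1, N - j * k + k + 1))
            for j in range(1, n + 1)]


def selection_probabilities(ranks, N, n):
    """Within-stratum selection probabilities n * p_i with p_i = 2i / (N(N+1))."""
    return [Fraction(2 * n * i, N * (N + 1)) for i in ranks]


def draw_sample(strata, N, n, rng):
    """Select one rank per stratum, independently across strata."""
    sample = []
    for ranks in strata:
        weights = [float(q) for q in selection_probabilities(ranks, N, n)]
        sample.append(rng.choices(ranks, weights=weights, k=1)[0])
    return sample


N, n = 24, 3                    # hypothetical sizes
strata = plan_a_strata(N, n)
rng = random.Random(0)          # fixed seed only to make the run repeatable
sample = draw_sample(strata, N, n, rng)
print(sample)                   # one selected rank from each stratum
```

Because exactly one unit is taken from each stratum, no unit can appear twice, and two units in the same stratum can never be selected together.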

2.3 ESTIMATION OF Y FOR EITHER STRATIFICATION PLAN

To estimate $Y = \sum_{i=1}^{N} y_i$, we use the unbiased Horvitz-Thompson (1952) estimator

$$\hat{Y}_{\pi pswor} = \sum_{j=1}^{n} \frac{y_j}{\pi_j} \qquad (2.3)$$

where $y_j$ is the y-value of the one unit selected from the jth stratum and $\pi_j$ is its probability of inclusion.

Lemma 1. Let $\mu_{jY}$ be the true mean of the 2k units assigned to the jth stratum. Then

$$\mathrm{Var}(\hat{Y}_{\pi pswor}) = \sum_{i=1}^{N} \frac{y_i^2}{\pi_i} - \frac{N^2}{n^2} \sum_{j=1}^{n} \mu_{jY}^2. \qquad (2.4)$$

Proof. This result follows directly from (2.5) below by noting that the sampling design is a stratified πpswr sampling plan with one sampled unit per stratum. From Theorem 9A.1 of Cochran (1977), the usual unbiased estimator of Y and its variance under πpswr are

$$\hat{Y}_{\pi pswr} = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{p_i} \quad \text{and} \quad \mathrm{Var}(\hat{Y}_{\pi pswr}) = \sum_{i=1}^{N} \frac{y_i^2}{\pi_i} - \frac{Y^2}{n} \qquad (2.5)$$

where $\pi_i = n p_i$.

Figure 1. Stratification Plans A and B of the N Ranked Units into n Strata, Each Stratum has Size 2k Units

Lemma 2 demonstrates that $\mathrm{Var}(\hat{Y}_{\pi pswor})$ never exceeds $\mathrm{Var}(\hat{Y}_{\pi pswr})$.

Lemma 2. $\mathrm{Var}(\hat{Y}_{\pi pswor}) \le \mathrm{Var}(\hat{Y}_{\pi pswr})$.

Proof. Let $\mu_Y = Y/N$. From (2.4) and (2.5), the result is immediate because

$$\mathrm{Var}(\hat{Y}_{\pi pswr}) - \mathrm{Var}(\hat{Y}_{\pi pswor}) = \frac{N^2}{n^2} \sum_{j=1}^{n} (\mu_{jY} - \mu_Y)^2 \ge 0.$$

A model under which $\hat{Y}_{\pi pswor}$ is optimal, having zero sampling variance, is given in Lemma 3.

Lemma 3. Let $y_i = \beta i$, $\forall i$. Then every possible value of $\hat{Y}_{\pi pswor}$ is Y. Hence, $\mathrm{Var}(\hat{Y}_{\pi pswor}) = 0$.

Proof. First note that if $y_i = \beta i$, $\forall i$, then $Y = \beta \sum_{i=1}^{N} i = N(N+1)\beta/2$. The result follows by observing that for each possible sample, $\hat{Y}_{\pi pswor}$ is also $N(N+1)\beta/2$.
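Lemmas 2 and 3 can be checked numerically on a toy population. The sketch below assumes Plan A strata and rank-based probabilities; the y-values and all function names are hypothetical. The without-replacement variance is computed from first principles (one draw per stratum) rather than from the closed form, so the comparison is a genuine check.

```python
from fractions import Fraction


def plan_a_strata(N, n):
    """Ranks in each stratum under Plan A (requires N = 2kn)."""
    k = N // (2 * n)
    return [list(range((j - 1) * k + 1, j * k + 1))
            + list(range(N - j * k + 1, N - j * k + k + 1))
            for j in range(1, n + 1)]


def rank_probs(N, n):
    """pi_i = n * p_i with p_i = 2i / (N(N+1)), keyed by rank i."""
    return {i: Fraction(2 * n * i, N * (N + 1)) for i in range(1, N + 1)}


def var_wr(y, n):
    """With-replacement variance, equation (2.5)."""
    pi = rank_probs(len(y), n)
    return sum(Fraction(y[i - 1] ** 2) / pi[i] for i in pi) - Fraction(sum(y) ** 2, n)


def var_wor(y, n):
    """Variance of the one-draw-per-stratum design, summed over strata."""
    N = len(y)
    pi = rank_probs(N, n)
    total = Fraction(0)
    for ranks in plan_a_strata(N, n):
        stratum_total = sum(y[i - 1] for i in ranks)
        # pi[i] is also the within-stratum selection probability of rank i
        total += sum(pi[i] * (y[i - 1] / pi[i] - stratum_total) ** 2 for i in ranks)
    return total


def between_strata_term(y, n):
    """(N^2 / n^2) * sum_j (mu_jY - mu_Y)^2, the difference in Lemma 2."""
    N = len(y)
    mu = Fraction(sum(y), N)
    means = [Fraction(sum(y[i - 1] for i in s), len(s)) for s in plan_a_strata(N, n)]
    return Fraction(N * N, n * n) * sum((m - mu) ** 2 for m in means)


y = [1, 2, 3, 5, 8, 13, 21, 34]              # hypothetical y-values in rank order
n = 2
print(var_wr(y, n), var_wor(y, n), between_strata_term(y, n))

proportional = [3 * i for i in range(1, 9)]  # y_i = beta * i with beta = 3
print(var_wor(proportional, n))              # Lemma 3 predicts zero
```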

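2.4 ESTIMATION OF $\mathrm{Var}(\hat{Y}_{\pi pswor})$ FOR EITHER STRATIFICATION PLAN

One unbiased estimator of the variance of the Horvitz-Thompson estimator by Horvitz and Thompson (1952) and another by Sen (1953) and Yates and Grundy (1953) are referenced widely. These estimators require that all $\pi_{ij}$ be nonzero and cannot be considered for the proposed method because some $\pi_{ij} = 0$.

We consider estimation of $\mathrm{Var}(\hat{Y}_{\pi pswor})$ by $\widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr})$, the usual unbiased estimator of sampling variance under sampling with replacement, which is given by

$$\widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left( \frac{y_i}{p_i} - \hat{Y}_{\pi pswr} \right)^2.$$

The following lemma shows that $\widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr})$ is a non-negatively biased estimator.

Lemma 4. If $\widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr})$ is used to estimate $\mathrm{Var}(\hat{Y}_{\pi pswor})$, then

$$\mathrm{Bias}\big(\widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr})\big) = \frac{N^2}{(n-1)n} \sum_{j=1}^{n} (\mu_{jY} - \mu_Y)^2.$$

Proof. From Theorem 2.4.6 of Wolter (1985) and the proof of Lemma 2, the result follows by

$$\mathrm{Bias}\big(\widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr})\big) = \frac{n}{n-1} \big[ \mathrm{Var}(\hat{Y}_{\pi pswr}) - \mathrm{Var}(\hat{Y}_{\pi pswor}) \big] = \frac{N^2}{(n-1)n} \sum_{j=1}^{n} (\mu_{jY} - \mu_Y)^2.$$

Under the model in Lemma 3, $\mathrm{Bias}\big(\widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr})\big) = 0$.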
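The bias expression in Lemma 4 can be checked by brute force on a small population: enumerate every possible sample of the one-per-stratum design, weight it by its probability, and compare the expected value of the with-replacement variance estimator with the true variance. The population values and function names below are assumptions for illustration, and Plan A is used for the strata.

```python
from fractions import Fraction
from itertools import product


def plan_a_strata(N, n):
    """Ranks in each stratum under Plan A (requires N = 2kn)."""
    k = N // (2 * n)
    return [list(range((j - 1) * k + 1, j * k + 1))
            + list(range(N - j * k + 1, N - j * k + k + 1))
            for j in range(1, n + 1)]


def exact_moments(y, n):
    """Return (Var(Y_hat), E[v_wr]) by enumerating every possible sample."""
    N = len(y)
    strata = plan_a_strata(N, n)
    p = {i: Fraction(2 * i, N * (N + 1)) for i in range(1, N + 1)}
    mean_est = mean_est_sq = mean_v = Fraction(0)
    for sample in product(*strata):            # one rank from each stratum
        prob = Fraction(1)
        for i in sample:
            prob *= n * p[i]                   # independent selections
        z = [y[i - 1] / p[i] for i in sample]  # y_i / p_i for the sampled units
        est = sum(z) / n                       # Horvitz-Thompson estimate
        v = sum((zi - est) ** 2 for zi in z) / (n * (n - 1))
        mean_est += prob * est
        mean_est_sq += prob * est ** 2
        mean_v += prob * v
    return mean_est_sq - mean_est ** 2, mean_v


def lemma4_bias(y, n):
    """N^2 / ((n-1) n) * sum_j (mu_jY - mu_Y)^2."""
    N = len(y)
    mu = Fraction(sum(y), N)
    means = [Fraction(sum(y[i - 1] for i in s), len(s)) for s in plan_a_strata(N, n)]
    return Fraction(N * N, (n - 1) * n) * sum((m - mu) ** 2 for m in means)


y = [1, 2, 3, 5, 8, 13, 21, 34]   # hypothetical y-values in rank order
n = 2
variance, expected_v = exact_moments(y, n)
print(expected_v - variance, lemma4_bias(y, n))   # enumerated bias vs. Lemma 4
```

Enumeration is only practical for small N and n, since the number of samples is $(2k)^n$.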

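Lemma 5. A less non-negatively biased estimator of $\mathrm{Var}(\hat{Y}_{\pi pswor})$ is given by $\widehat{\mathrm{Var}}_{\alpha}(\hat{Y}_{\pi pswor}) = \alpha\, \widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr})$ where $(n-1)/(n\phi - 1) < \alpha < 1$ and $\phi = \mathrm{Var}(\hat{Y}_{\pi pswr}) / \mathrm{Var}(\hat{Y}_{\pi pswor})$.

Proof. The result follows algebraically by finding $\alpha$ to satisfy $\mathrm{Var}(\hat{Y}_{\pi pswor}) < \alpha\, E\big(\widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr})\big)$.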
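The algebra behind the lower bound on $\alpha$ can be sketched as follows. This is one way to fill in the step, assuming $\mathrm{Var}(\hat{Y}_{\pi pswor}) > 0$ and using the bias expression from the proof of Lemma 4; it is not taken verbatim from the paper.

```latex
\begin{align*}
E\big[\widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr})\big]
  &= \mathrm{Var}(\hat{Y}_{\pi pswor})
     + \frac{n}{n-1}\Big[\mathrm{Var}(\hat{Y}_{\pi pswr})
     - \mathrm{Var}(\hat{Y}_{\pi pswor})\Big] \\
  &= \Big[1 + \frac{n(\phi - 1)}{n-1}\Big]\,\mathrm{Var}(\hat{Y}_{\pi pswor})
   = \frac{n\phi - 1}{n-1}\,\mathrm{Var}(\hat{Y}_{\pi pswor}), \\[4pt]
\mathrm{Var}(\hat{Y}_{\pi pswor})
  < \alpha\,E\big[\widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr})\big]
  &\iff \alpha > \frac{n-1}{n\phi - 1}.
\end{align*}
```

Since $\phi \ge 1$ by Lemma 2, the lower bound never exceeds 1, and any $\alpha$ strictly between the bound and 1 keeps the bias positive while making it smaller than the bias at $\alpha = 1$.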

3. NUMERICAL COMPARISONS

In this section, we give numerical comparisons of the proposed method with the simple πpswr sampling method and the Rao-Hartley-Cochran sampling method.

3.1 THE DATA AND DESCRIPTION OF POPULATIONS

Data from five populations with y and x values that have appeared in the literature and that will be the basis for our numerical comparisons are given in Table I below, taken from Wright (1987). Plots of these data are given in Figure 2 (Column 1). In populations 1, 2, 3, and 4 the data pair with the largest y value was omitted from the original set for all methods to meet the requirement N = 2kn. In population 5, the two data pairs with the two largest y values were omitted from the original set to meet this requirement (see Section 4). The corresponding plots of the y-ranks against the y values are given in Figure 2 (Column 2), while the plots of the x-ranks against the y values are given in Figure 2 (Column 3). The N's considered in these examples are realistic in practice when one views each population as a stratum. Also the units could be the primary sampling units in a multi-stage cluster sampling plan.

It is indeed clear from the graphs (except for population 1) that the y-ranks are more linearly related to the y-values than are the x-values. Undeniably, the ranks tend to straighten things out. Hence, we have empirical evidence that the model in Lemma 3 is a reasonable one, especially when there are no extreme y-values. Note that there are no extreme y-values in populations 2-5.

3.2 VARIANCE AND BIAS COMPARISONS WITH $\mathrm{Var}(\hat{Y}_{\pi pswor})$

Table II compares variances of the two methods using the y-ranks (comparisons using x-ranks appear in parentheses).

TABLE I. Data (y/x) for Five Populations

Population 1. Source: Cochran (1977), p. 152 (N = 48). Population sizes of 48 large United States cities (in 1000's) in 1930 ($y_i$) and 1920 ($x_i$).

Population 2. Source: Yates (1949), p. 159 (N = 42). Population sizes ($x_i$) of 42 kraals in the Mondora Reserve in Southern Rhodesia and the number of persons absent ($y_i$) from these kraals.

Population 3. Source: Yates (1949), p. 163 (N = 24). Measured volumes of timber ($y_i$) on 24 sample plots and eye estimates ($x_i$) of corresponding stands (cu. ft. per 1/10 acre).

Population 4. Source: Sukhatme and Sukhatme (1970), p. 51 (N = 24). Values of total cultivated area ($x_i$) and area under rice ($y_i$) for 24 villages in Baloda Bazar Tehsil (in acres).

Population 5. Source: Sukhatme and Sukhatme (1970), p. 185 (N = 32). Values of area under wheat for 1936 ($x_i$) and 1937 ($y_i$) for 32 villages in Lucknow Subdivision (India) (in acres).

Figure 2. Plots of the Data for Five Populations

TABLE II. Variance Comparisons Based on the y-Ranks (Based on the x-Ranks)

Pop. N n $\mathrm{Var}(\hat{Y}_{\pi pswor,B})$ $\mathrm{Var}(\hat{Y}_{\pi pswor,A})$ $\mathrm{Var}(\hat{Y}_{\pi pswor,B})$

From column (1) of the table, in most cases, the method of this paper using Plan A has smaller sampling variance than using Plan B. When the reverse is true, the difference is not great.

From columns (2) and (3) in the table, the method of this paper using Plan B generally has sampling variance closer to πpswr than does the method of this paper using Plan A. In this sense, we say that the method of this paper using Plan B is near πpswr, but without replacement. Thus the method of this paper has a near unbiased estimator of variance in most cases under Plan B. Note also that for populations 2 and 3, in most cases there is very little difference between the methods because in both populations the points are nearly uniformly scattered on the graphs with respect to the y's, and the method of this paper (particularly Plan B) rather successfully creates strata that are quite similar to the overall population. This implies that to get even greater gains in efficiency with the method of this paper, one should search for stratification plans that create strata that differ greatly from each other as well as from the overall population, but yet have equal total ranks for each stratum. This is also clear from the proof of Lemma 2.

Ranks vs. Other Auxiliary Variables: Is there an advantage in using the ranks as measures of size over other possible auxiliary variables? Table III gives comparisons of variances using the y-ranks ($\mathrm{Var}_{y\text{-ranks}}(\cdot)$) with variances using the x values ($\mathrm{Var}_{x}(\cdot)$) for the smallest sample sizes considered in Table II for each population. From column (1) of Table III, in four out of the five comparisons, sampling with replacement using the y-ranks is more efficient than sampling with replacement using the x-variables given in the original sources. Note that for population 1, $\rho_{x,y} > \rho_{y\text{-ranks},y}$. The opposite inequality is true for the remaining four populations. However, from column (2), in every case the method of this paper using y-ranks is more efficient than sampling with replacement using the given x-variable in the original sources.

As shown in Table IV for the method of this paper, when the bias of the estimator of sampling variance is large, the sampling variance is small, and conversely.

TABLE III. Comparisons of y-Ranks with Other Possible x Auxiliary Variables

(1) (2)

TABLE IV. Relative Bias of Estimator of Sampling Variance for the Proposed Method (Plans A and B) Using the Ranks of the Y's*

Population N n

* The numbers in parentheses are based on the ranks of X's.

Comparison with the Rao-Hartley-Cochran Method (1962) Using Ranks

Sampling with probability proportional to size is easiest and simplest when n = 1. Clearly in this case there is no difference between πpswr and πpswor sampling. As noted earlier, a motivating reason for the proposed method of this paper was to stratify the population into n strata each having equal total ranks so that one independent selection could be made from each stratum. This simplicity is a chief advantage in terms of execution and is a feature that is shared by the Rao-Hartley-Cochran Method (1962), which is briefly described as follows.

Sampling Scheme. Divide the population of N units at random into n groups of N/n units each. Independently select one unit from each group with probabilities proportional to the normed measures of size within the group.

Estimator of Y. Let $y_{ji}$ be the sample value if the ith unit of the jth group is selected with normed rank relative to the ranks in that group denoted by $P_{ji}$. Then $\hat{Y}_{RHC} = \sum_{j=1}^{n} y_{ji} / P_{ji}$.

Variance of Estimator. When each group has N/n units, one can show (see, e.g., Cochran 1977) that

$$\mathrm{Var}(\hat{Y}_{RHC}) = \frac{N-n}{N-1}\, \mathrm{Var}(\hat{Y}_{\pi pswr}).$$

Note that both the method of this paper and the Rao-Hartley-Cochran (RHC) Method have sampling variances that never exceed that of πpswr sampling. Table V gives some numerical comparisons. Because stratification plan A was observed to be more precise than stratification plan B for most of the cases considered in Table II, we only consider plan A for comparison with the RHC method. For populations 3 and 4, the RHC method is slightly more efficient than the method of this paper. The table provides evidence that the proposed method is a strong alternative (competitor) to the RHC method. It should be noted that the RHC method is not exact πpswor sampling, but the RHC method does have an unbiased estimator of sampling variance.

TABLE V. Variance Comparisons Based on the y-Ranks (Based on the x-Ranks)
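The three variances being compared can be computed exactly for a toy population, which makes the factor $(N-n)/(N-1)$ concrete. The sketch below assumes ranks as the measures of size, Plan A strata for the proposed method, and hypothetical y-values; it does not reproduce any entry of Table V.

```python
from fractions import Fraction


def plan_a_strata(N, n):
    """Ranks in each stratum under Plan A (requires N = 2kn)."""
    k = N // (2 * n)
    return [list(range((j - 1) * k + 1, j * k + 1))
            + list(range(N - j * k + 1, N - j * k + k + 1))
            for j in range(1, n + 1)]


def rank_probs(N, n):
    """pi_i = n * p_i with p_i = 2i / (N(N+1)), keyed by rank i."""
    return {i: Fraction(2 * n * i, N * (N + 1)) for i in range(1, N + 1)}


def var_wr(y, n):
    """With-replacement variance: sum(y_i^2 / pi_i) - Y^2 / n."""
    pi = rank_probs(len(y), n)
    return sum(Fraction(y[i - 1] ** 2) / pi[i] for i in pi) - Fraction(sum(y) ** 2, n)


def var_rhc(y, n):
    """Rao-Hartley-Cochran variance for equal group sizes N/n."""
    N = len(y)
    return Fraction(N - n, N - 1) * var_wr(y, n)


def var_wor(y, n):
    """Variance of the proposed one-draw-per-stratum design (Plan A)."""
    N = len(y)
    pi = rank_probs(N, n)
    total = Fraction(0)
    for ranks in plan_a_strata(N, n):
        stratum_total = sum(y[i - 1] for i in ranks)
        total += sum(pi[i] * (y[i - 1] / pi[i] - stratum_total) ** 2 for i in ranks)
    return total


y = [1, 2, 3, 5, 8, 13, 21, 34]   # hypothetical y-values in rank order
n = 2
print("wr :", var_wr(y, n))
print("RHC:", var_rhc(y, n))
print("wor:", var_wor(y, n))
```

Neither the RHC method nor the proposed method dominates the other in general, so the sketch prints all three values rather than asserting an ordering between the last two.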

4. DISCUSSION

The method of this paper requires that N = 2kn. If this equality does not hold initially, a compromise strategy is to take enough, say c, of the largest units with certainty so that N - c = 2k(n - c). Thus we would select the remaining n - c units for the sample from the remaining N - c units with πpswor sampling as described. For example, if N = 49 and n = 5, then c = 1. Clearly this strategy and other variations will not always be appropriate, and further research is needed for the case N ≠ 2kn.

Let

$$SSB = \frac{N^2}{n^2} \sum_{j=1}^{n} (\mu_{jY} - \mu_Y)^2.$$

SSB, which is a measure of variability among the n strata, occurs in two different settings: first, as the difference between $\mathrm{Var}(\hat{Y}_{\pi pswr})$ and $\mathrm{Var}(\hat{Y}_{\pi pswor})$, and secondly, as the (approximate) bias of $\widehat{\mathrm{Var}}(\hat{Y}_{\pi pswr})$ as an estimator of $\mathrm{Var}(\hat{Y}_{\pi pswor})$. In the first case, we want SSB to be large (use Plan A) if we are arguing that sampling without replacement is more efficient than with replacement; in the latter case, we want SSB to be small (use Plan B). Lemma 5 offers a compromise. In practice, we may prefer Plan B over A because to know that we can do no worse using sampling without replacement than sampling with replacement might be acceptable if we know that we also get the advantage of at most one inclusion in the sample for each population unit, that the wor method is simple to execute, and that a near unbiased variance estimator is possible.

Does the size of N make a difference between sampling with and without replacement as considered in this paper? As noted above, one way of considering the difference between sampling with replacement and sampling without replacement is to consider the difference in the sampling variances, which is the quantity SSB. SSB is a function of N/n, the reciprocal of the sampling fraction, and $\sum_{j=1}^{n} (\mu_{jY} - \mu_Y)^2$. For constant $\sum_{j=1}^{n} (\mu_{jY} - \mu_Y)^2$, as N increases relative to n, there is an increase in the difference SSB. This is in contrast to what happens for large N relative to n under equal probability sampling with and without replacement. In general, the behavior of SSB for increasing N relative to n is unclear. On the other hand, under the model of Lemma 3, SSB = 0 for all values of N and n. The proposed method is one example where we prefer heterogeneous strata rather than homogeneous ones.
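The compromise strategy for N ≠ 2kn described at the start of this section can be sketched as a small search. The paper only says to take "enough" certainty units; choosing the smallest such c is an assumption of this sketch, as is the function name `certainty_units`.

```python
def certainty_units(N, n):
    """Smallest c >= 0 with N - c = 2k(n - c) for a positive integer k.

    Returns the pair (c, k), or None when no c below n satisfies the equality.
    """
    for c in range(0, n):
        remaining_units = N - c
        remaining_draws = n - c
        if remaining_units % (2 * remaining_draws) == 0:
            return c, remaining_units // (2 * remaining_draws)
    return None


print(certainty_units(49, 5))   # the example in the text, N = 49 and n = 5
print(certainty_units(48, 4))   # already of the form 2kn
print(certainty_units(6, 2))    # a case with no solution
```

The third call illustrates the remark that the strategy will not always be appropriate: for N = 6 and n = 2 no choice of c makes the equality hold.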

Finally, the use of ranks is common in classical nonparametric statistical methodology. Considering the nonparametric-like nature of sampling theory for a finite population, one wonders why the use of the ranks isn't more common in sampling theory and methodology. We have shown that ranks can be quite useful in πps sampling. Indeed, one may actually have more belief in the accuracy of the ranks as measures of size than the "traditional" x auxiliary variables. There is potential for greater explicit use of ranks in sampling techniques wherever auxiliary information is used.

ACKNOWLEDGMENTS

The author thanks William Winkler and Keith Rust for encouraging discussions on this work and two referees for comments, all of which led to a substantial improvement in the presentation. Research sponsored by the Applied Mathematical Sciences Program, Office of Energy Research, U.S. Department of Energy, under contract number DE-AC05-84OR21400 with Martin Marietta Energy Systems, Inc.

BIBLIOGRAPHY

Brewer, K. R. W. and Hanif, M. (1983), Sampling With Unequal Probabilities, Lecture Notes in Statistics Series, 15, Springer-Verlag, New York, New York.

Cassel, C.-M., Särndal, C.-E., and Wretman, J. H. (1977), Foundations of Inference in Survey Sampling, John Wiley and Sons, New York, New York.

Cochran, W. G. (1977), Sampling Techniques, 3rd Edition, John Wiley and Sons, New York, New York.

Hansen, M. H. and Hurwitz, W. N. (1943), "On the Theory of Sampling From a Finite Population," Annals of Mathematical Statistics, 14, pp. 333-362.

Horvitz, D. G. and Thompson, D. J. (1952), "A Generalization of Sampling Without Replacement From a Finite Universe," Journal of the American Statistical Association, 47, pp. 663-685.

Rao, J. N. K., Hartley, H. O., and Cochran, W. G. (1962), "On a Simple Procedure of Unequal Probability Sampling Without Replacement," Journal of the Royal Statistical Society, Series B, 24, pp. 482-491.

Scott, A. J. and Smith, T. M. F. (1975), "Minimax Designs for Sample Surveys," Biometrika, 62, No. 2, pp. 353-357.

Sen, A. R. (1953), "On the Estimate of the Variance in Sampling With Varying Probabilities," Journal of the Indian Society of Agricultural Statistics, 5, pp. 119-127.

Sukhatme, P. V. and Sukhatme, B. V. (1970), Sampling Theory of Surveys With Applications, Iowa State University Press, Ames, Iowa.

Wolter, K. M. (1985), Introduction to Variance Estimation, Springer-Verlag, New York, New York.

Wright, T. (1987), "A Simple Method for Probability Proportional to Size (πps) Sampling Without Replacement Based on Ranks," ORNL/TM-10398, Oak Ridge National Laboratory, Oak Ridge, Tennessee.

Yates, F. (1949), Sampling Methods for Censuses and Surveys, Hafner Publishing Company, Inc., New York, New York.

Yates, F. and Grundy, P. M. (1953), "Selection Without Replacement From Within Strata With Probability Proportional to Size," Journal of the Royal Statistical Society, Series B, 15, pp. 253-261.

Received March 1989; Revised December 1989.

Recommended by E. G. Schilling, Rochester Institute of Tech., Rochester, NY.
