Sampling Design and Analysis MTH 494 Lecture-21 Ossam Chohan Assistant Professor CIIT Abbottabad.


Estimators

Ratio estimator of a population mean $\mu_y$:

$\hat{\mu}_y = r\,\mu_x = \dfrac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}\,\mu_x$

Estimated variance of $\hat{\mu}_y$:

$\hat{V}(\hat{\mu}_y) = \mu_x^2\,\hat{V}(r) = \left(\dfrac{N-n}{nN}\right)\dfrac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n-1}$

Bound on the error of estimation:

$2\sqrt{\hat{V}(\hat{\mu}_y)}$

Note that we do not need to know $\tau_x$ or $N$ to estimate $\mu_y$ when using the ratio procedure; however, we must know $\mu_x$.

Association

To remember the formulas for ratio estimation of a population mean, total, or ratio, we make the following association. The sample ratio $r$ is given by the formula

$r = \dfrac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$

The estimators of $R$, $\tau_y$, and $\mu_y$ are then

$\hat{R} = r, \qquad \hat{\tau}_y = r\,\tau_x, \qquad \hat{\mu}_y = r\,\mu_x$

Thus we need to know only the formula for $r$ and its relation to $\hat{\tau}_y$ and $\hat{\mu}_y$. Approximate variances can be obtained if you remember the basic formula

$\hat{V}(r) = \left(\dfrac{N-n}{nN}\right)\dfrac{1}{\mu_x^2}\,\dfrac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n-1}$

Then $\hat{V}(\hat{\tau}_y) = \tau_x^2\,\hat{V}(r)$ and $\hat{V}(\hat{\mu}_y) = \mu_x^2\,\hat{V}(r)$.

Selecting the Sample Size

We stated previously that the amount of information contained in the sample depends on the variation in the data (which is frequently controlled by the sample survey design) and on the number of observations $n$ included in the sample. Once the sampling procedure (design) has been chosen, the investigator must determine the number of elements to be drawn. We will consider the sample size required to estimate a population parameter $R$, $\mu_y$, or $\tau_y$ to within $B$ units for simple random sampling using ratio estimators.

Note that the procedure for choosing the sample size $n$ is identical to that presented in unit 2. The number of observations required to estimate $R$, a population ratio, with a bound on the error of estimation of magnitude $B$ is determined by setting two standard deviations of the ratio estimator $r$ equal to $B$ and solving this expression for $n$. That is, we must solve

$2\sqrt{V(r)} = B$

for $n$. Although we have not discussed the form of $V(r)$, recall that $\hat{V}(r)$, the estimated variance of $r$, is given by the formula

$\hat{V}(r) = \left(\dfrac{N-n}{nN}\right)\dfrac{1}{\mu_x^2}\,\dfrac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n-1}$

Equation 3.19 can be rewritten as

$\hat{V}(r) = \left(\dfrac{N-n}{nN}\right)\dfrac{1}{\mu_x^2}\,s^2$

In this instance we define

$s^2 = \dfrac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n-1}$

An approximate population variance, $V(r)$, can be obtained from $\hat{V}(r)$ by replacing $s^2$ with the corresponding population variance $\sigma^2$.
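The ratio formulas above can be sketched in Python. This is a minimal illustration, not the lecture's own code: it assumes the usual textbook forms $r = \sum y_i / \sum x_i$ and $\hat{V}(r) = \frac{N-n}{nN}\,\frac{1}{\mu_x^2}\,\frac{\sum (y_i - r x_i)^2}{n-1}$, and it takes $\tau_x = N\mu_x$. The function names `ratio_estimates` and `sample_size_for_ratio` are hypothetical. The closed form used in `sample_size_for_ratio` comes from solving $2\sqrt{V(r)} = B$ for $n$ with $s^2$ replaced by $\sigma^2$, which gives $n = N\sigma^2 / (ND + \sigma^2)$ with $D = B^2\mu_x^2/4$.

```python
import math


def ratio_estimates(x, y, N, mu_x):
    """Ratio estimates of R, mu_y and tau_y from a simple random sample.

    x, y  : paired sample observations (same length, at least 2)
    N     : population size
    mu_x  : known population mean of the auxiliary variable x
    Assumes tau_x = N * mu_x.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    r = sum(y) / sum(x)                                   # sample ratio
    s2 = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    var_r = (N - n) / (n * N) * s2 / mu_x ** 2            # estimated V(r)

    results = {}
    # Each estimator is r times a known constant, so its estimated
    # variance is that constant squared times the estimated V(r).
    for name, scale in (("R", 1.0), ("mu_y", mu_x), ("tau_y", N * mu_x)):
        variance = scale ** 2 * var_r
        results[name] = {
            "estimate": scale * r,
            "variance": variance,
            "bound": 2 * math.sqrt(variance),
        }
    return results


def sample_size_for_ratio(N, sigma2, B, mu_x):
    """Sample size so that 2*sqrt(V(r)) is about B (rounded up)."""
    D = B ** 2 * mu_x ** 2 / 4
    return math.ceil(N * sigma2 / (N * D + sigma2))
```

For example, `ratio_estimates(x, y, N, mu_x)["mu_y"]["bound"]` returns the bound on the error of estimation for the mean. In practice `sigma2` would come from past data or a preliminary sample.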
Thus the number of observations required to estimate $R$ with a bound $B$ on the error of estimation is determined by solving the following equation for $n$:

$2\sqrt{V(r)} = 2\sqrt{\left(\dfrac{N-n}{nN}\right)\dfrac{\sigma^2}{\mu_x^2}} = B$

Sample size required to estimate $R$ with a bound on the error of estimation $B$:

$n = \dfrac{N\sigma^2}{ND + \sigma^2}, \quad \text{where } D = \dfrac{B^2\mu_x^2}{4}$   (3.22)

In a practical situation we face a problem in determining the appropriate sample size because we do not know $\sigma^2$. If no past information is available to calculate $s^2$ as an estimate of $\sigma^2$, we take a preliminary sample of size $n'$ and compute

$\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n'}(y_i - r x_i)^2}{n'-1}$

Then we substitute this quantity for $\sigma^2$ in equation 3.22 and find an approximate sample size. If $\mu_x$ is also unknown, it can be replaced by the sample mean $\bar{x}$, calculated from the $n'$ preliminary observations.

Similarly, we can determine the number of observations $n$ needed to estimate a population mean $\mu_y$ with a bound on the error of estimation of magnitude $B$. The required sample size is found by solving the following equation for $n$:

$2\sqrt{V(\hat{\mu}_y)} = B$

Stated differently,

$2\mu_x\sqrt{V(r)} = B$

Sample size required to estimate $\mu_y$ with a bound on the error of estimation $B$:

$n = \dfrac{N\sigma^2}{ND + \sigma^2}, \quad \text{where } D = \dfrac{B^2}{4}$   (3.24)

Note that we need not know the value of $\mu_x$ to determine $n$ in equation 3.24; however, we do need an estimate of $\sigma^2$, either from prior information if it is available or from information obtained in a preliminary study.

The sample size required to estimate $\tau_y$ with a bound on the error of estimation of magnitude $B$ can be found by solving the following expression for $n$:

$2\sqrt{V(\hat{\tau}_y)} = B$

Or equivalently,

$2\tau_x\sqrt{V(r)} = B$

Sample size required to estimate $\tau_y$ with a bound on the error of estimation $B$:

$n = \dfrac{N\sigma^2}{ND + \sigma^2}, \quad \text{where } D = \dfrac{B^2}{4N^2}$   (3.26)

When to use ratio estimation

Use of the ratio estimator is most effective when the relationship between the response $y$ and a subsidiary variable $x$ is linear through the origin and the variance of $y$ is proportional to $x$. To understand this point, consider an example.

Understanding example

An automobile tire distributor wishes to estimate the average cash receipts for his 1570 stores ($N = 1570$) during a particular sales period.
From a simple random sample of $n = 50$ stores, the corresponding cash receipts $y_i$ ($i = 1, 2, \ldots, 50$) are observed. One possible estimator of $\mu_y$, the average cash receipts for the company, is $\bar{y}$, the sample mean.

In addition to obtaining cash receipts $y_i$, suppose the distributor can obtain $x_i$ ($i = 1, 2, \ldots, 50$), the number of customers who made purchases in store $i$ during the sales period. To determine the relationship between $y$ and $x$, he can plot the sales and customer data for the $n = 50$ sampled stores.

If the plot is similar to the one presented in the next slide, we can assume that the cash receipts $y$ are linearly related to the number of customers purchasing goods, $x$. In fact, we could depict this relationship with a straight line passing through the intersection of the $x$ and $y$ axes, and hence we can say it is linear through the origin. In addition, note from the figure that the scatter of $y$ values widens as $x$ increases. Hence we can say that the variance of $y$ is proportional to $x$. Under these conditions the ratio estimator of $\mu_y$, the average amount of cash receipts per store, should have a smaller variance and, hence, be more precise than $\bar{y}$.

Figure of positive scatter plot

Sometimes a plot of $y$ versus $x$ does not clearly indicate that ratio estimation should be used. The strength of the correlation $\rho$ between $y$ and $x$ is another good indicator of the effectiveness of the ratio estimator. For $\rho > 1/2$, the ratio estimator should provide a more precise estimate of $\mu_y$ or $\tau_y$ than would $\bar{y}$ or $N\bar{y}$.

Unlike the estimation procedures discussed previously, ratio estimation usually leads to biased estimators. Thus we must consider the magnitude of the bias to decide which estimation procedure to use.
Although there are no exact formulas to determine the bias of these estimators, it can be shown that the absolute value of the bias is less than or equal to the product of the standard deviation of the sample mean of the subsidiary variable $x$ and the standard deviation of the ratio estimator, all divided by $\mu_x$. That is,

$\left|E(\hat{\theta}) - \theta\right| \le \dfrac{\sigma_{\bar{x}}\,\sigma_{\hat{\theta}}}{\mu_x}$   (3.27)

where $\hat{\theta}$ can be the ratio estimator $r$, $\hat{\tau}_y$, or $\hat{\mu}_y$, and $\theta$ is the corresponding parameter estimated. If estimates of $\sigma_{\bar{x}}$, $\sigma_{\hat{\theta}}$, and $\mu_x$ are known from prior experimentation, we can estimate the maximum bias for a given physical situation by using equation 3.27.

Generally, for large sample sizes ($n > 30$) and for $(\sigma_{\bar{x}}/\mu_x) \le 0.10$, the bias is negligible. Note also that ratio estimators are unbiased when the relationship between $y$ and $x$ is linear through the origin. Finally, we must consider the cost of obtaining information on the subsidiary variable $x$. If the physical situation suggests the use of ratio estimation, the experimenter must decide whether the increased precision of the ratio estimator justifies the additional cost.

Ratio Estimation in Stratified Random Sampling

For the same reasons indicated in the previous unit, stratifying the population before using a ratio estimator is sometimes advantageous. We will assume that we can take a large enough sample of both $x$'s and $y$'s in each stratum for the variance approximations to work fairly well. There are two different methods for constructing estimators of a ratio in stratified sampling. One is to estimate the ratio of $y$ to $x$ within each stratum and then form a weighted average of these separate estimates as a single estimate of the population ratio. The result of this procedure is called a separate ratio estimator.

The other method involves first estimating $\mu_y$ by the usual $\bar{y}_{st}$ and similarly estimating $\mu_x$ by $\bar{x}_{st}$. Then $(\bar{y}_{st}/\bar{x}_{st})$ can be used as an estimator of $(\mu_y/\mu_x)$. This estimator is called a combined ratio estimator. We will not introduce a general (and cumbersome) notation for these estimators but will illustrate their use by a numerical example.
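As a rough companion to the two constructions just described, here is a minimal Python sketch of both estimators of $\mu_y$. The lecture deliberately gives no general notation, so the forms used here are assumptions taken from standard survey-sampling texts: the separate estimator is $\sum_h (N_h/N)\, r_h\, \mu_{xh}$ and the combined estimator is $(\bar{y}_{st}/\bar{x}_{st})\,\mu_x$ with $\mu_x = \sum_h (N_h/N)\,\mu_{xh}$. The function names and the dictionary layout of each stratum are hypothetical, and no variance estimates are attempted.

```python
def separate_ratio_mean(strata):
    """Separate ratio estimate of mu_y: weighted sum of per-stratum ratio estimates.

    Each stratum is a dict with keys:
      "N"    : stratum population size
      "mu_x" : known stratum mean of x
      "x","y": paired sample observations from that stratum
    """
    N = sum(s["N"] for s in strata)
    estimate = 0.0
    for s in strata:
        r_h = sum(s["y"]) / sum(s["x"])          # within-stratum sample ratio
        estimate += (s["N"] / N) * r_h * s["mu_x"]
    return estimate


def combined_ratio_mean(strata):
    """Combined ratio estimate of mu_y: one ratio of stratified sample means."""
    N = sum(s["N"] for s in strata)
    ybar_st = sum((s["N"] / N) * sum(s["y"]) / len(s["y"]) for s in strata)
    xbar_st = sum((s["N"] / N) * sum(s["x"]) / len(s["x"]) for s in strata)
    mu_x = sum((s["N"] / N) * s["mu_x"] for s in strata)
    return ybar_st / xbar_st * mu_x


# Hypothetical two-stratum data, for illustration only.
strata = [
    {"N": 100, "mu_x": 10.0, "x": [8, 10, 12], "y": [16, 20, 24]},
    {"N": 300, "mu_x": 20.0, "x": [15, 20, 22], "y": [15, 20, 22]},
]
print(separate_ratio_mean(strata), combined_ratio_mean(strata))
```

The two functions agree closely when the within-stratum ratios are similar and diverge when they differ, which is the situation where the choice between them matters.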
Example 3.7

Refer to example 3.4. Treat the 10 observations given there on man-hours lost due to sickness as a simple random sample from company A. A simple random sample of $n_B = 10$ measurements was taken from company B within the same industry. (Assume companies A and B together form the population of workers of interest in this problem.) The data are given in the accompanying table. It is known that $N_B = 1500$ employees and $\tau_{xB} = 12{,}800$. Find the separate ratio estimate of $\mu_y$ and its estimated variance.

Table for data
Columns: Employee; Man-hours lost in previous year, x; Man-hours lost in current year, y
Totals 7846

Solution

Example 3.8

Refer to the data of example 3.7 and find a combined ratio estimate of $\mu_y$.

Solution

Comparison between 3.7 and 3.8

On comparing 3.7 and 3.8, we see that the combined ratio estimator gives the larger estimated variance. This result is generally the case, and so we should employ the separate ratio estimator most of the time. However, the separate ratio estimator may have a larger bias, since each stratum ratio estimate contributes to that bias. In summary, if the stratum sample sizes are large enough (say 20 or so) that the separate ratios do not have large biases and the variance approximations work adequately, then use the separate ratio estimator.

If the stratum sample sizes are very small, or if the within-stratum ratios are all approximately equal, then the combined ratio estimator may perform better. Of course, an estimator of the population total can be found by multiplying either of the estimators above by the population size $N$, and the variances can be adjusted accordingly. Thus we might use the notation

Regression Estimation

We observed that the ratio estimator is most appropriate when the relationship between $y$ and $x$ is linear through the origin.
If there is evidence of a linear relationship between the observed $y$'s and $x$'s, but not necessarily one that would pass through the origin, then the extra information provided by the auxiliary variable $x$ may be taken into account through a regression estimator of the mean $\mu_y$.

One must still have knowledge of $\mu_x$ before the estimator can be employed, as was the case in ratio estimation of $\mu_y$. The underlying line that shows the basic relationship between the $y$'s and $x$'s is sometimes referred to as the regression line of $y$ upon $x$. Thus the subscript $L$ in the ensuing formulas is used to denote linear regression.

The estimator given in the next section assumes the $x$'s to be fixed in advance and the $y$'s to be random variables. We can think of the $x$ values as something that has already been observed, like last year's first-quarter earnings, and the $y$ response as a random variable yet to be observed, such as the current quarterly earnings of a company for which $x$ is already known. The probabilistic properties of the estimator then depend only on $y$ for a given set of $x$'s.

Estimators

Regression estimator of the population mean $\mu_y$:

$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x}), \quad \text{where } b = \dfrac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$   (3.28)

Estimated variance of $\hat{\mu}_{yL}$:

$\hat{V}(\hat{\mu}_{yL}) = \left(\dfrac{N-n}{Nn}\right)\dfrac{1}{n-2}\left[\sum_{i=1}^{n}(y_i - \bar{y})^2 - b^2\sum_{i=1}^{n}(x_i - \bar{x})^2\right]$   (3.29)

Bound on the error of estimation:

$2\sqrt{\hat{V}(\hat{\mu}_{yL})}$   (3.30)

When calculating $b$ from the observed pairs $(y_1, x_1), \ldots, (y_n, x_n)$, we may use the fact that

$\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} \quad \text{and} \quad \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$
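The regression estimator can be sketched in Python as follows. This is an illustration under stated assumptions rather than the lecture's own code: it takes the estimator to be $\bar{y} + b(\mu_x - \bar{x})$ with $b$ the ordinary least-squares slope, and the estimated variance to be $\frac{N-n}{Nn}\cdot\frac{1}{n-2}\left[\sum (y_i-\bar{y})^2 - b^2\sum (x_i-\bar{x})^2\right]$, the form usually given in survey-sampling texts. The function name `regression_estimate` is hypothetical.

```python
import math


def regression_estimate(x, y, N, mu_x):
    """Regression estimate of mu_y, its estimated variance, and the error bound.

    x, y : paired sample observations (same length, at least 3)
    N    : population size
    mu_x : known population mean of the auxiliary variable x
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, at least 3")

    xbar = sum(x) / n
    ybar = sum(y) / n

    # Computational shortcuts for the centred sums of squares and products.
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
    sxx = sum(xi ** 2 for xi in x) - n * xbar ** 2
    syy = sum(yi ** 2 for yi in y) - n * ybar ** 2
    if sxx == 0:
        raise ValueError("x values must not all be equal")

    b = sxy / sxx                                   # least-squares slope
    estimate = ybar + b * (mu_x - xbar)

    # Residual sum of squares; clipped at 0 to guard against rounding error.
    sse = max(syy - b ** 2 * sxx, 0.0)
    variance = (N - n) / (N * n) * sse / (n - 2)
    return {
        "b": b,
        "estimate": estimate,
        "variance": variance,
        "bound": 2 * math.sqrt(variance),
    }
```

When the sample mean of $x$ falls below $\mu_x$ and the slope is positive, the estimate is adjusted upward from $\bar{y}$, which is how the auxiliary information enters.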