COMPARISON OF POPULATION PROPORTIONS

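Given $X_1 \sim \mathrm{BIN}(n_1, p_1)$ and $X_2 \sim \mathrm{BIN}(n_2, p_2)$, the statistics

$$\hat p_1 = \frac{X_1}{n_1}; \qquad \hat p_2 = \frac{X_2}{n_2}$$

are defined to be the sample proportions.

Assume that $X_1$ and $X_2$ are independent. Then

$$E(\hat p_1 - \hat p_2) = E(\hat p_1) - E(\hat p_2) = p_1 - p_2$$

$$\mathrm{Var}(\hat p_1 - \hat p_2) = \mathrm{Var}(\hat p_1) + \mathrm{Var}(\hat p_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$$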

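For sufficiently large $n_1$ and $n_2$ the standardized statistic

$$Z = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$$

is approximately standard normal.

The $(1-\alpha)100\%$ CI:

$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

As $p_1$ and $p_2$ are UNKNOWN, the approximate $(1-\alpha)100\%$ CI for $(p_1 - p_2)$ is:

$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$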

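In the testing situation,

$H_o: p_1 = p_2 = p$ ($p$ unknown)

versus one of the alternatives $H_1$ below. For a test at level of significance (los) $\alpha$:

$H_1$               Rejection region (RR)
$p_1 > p_2$         $Z > z_\alpha$
$p_1 < p_2$         $Z < -z_\alpha$
$p_1 \ne p_2$       $|Z| > z_{\alpha/2}$

Test statistic:

$$Z = \frac{\hat p_1 - \hat p_2}{\sqrt{\dfrac{p(1-p)}{n_1} + \dfrac{p(1-p)}{n_2}}}$$

The unknown common value of $p$ is estimated by:

$$\hat p = \frac{X_1 + X_2}{n_1 + n_2}$$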

EXAMPLE

Members of the Department of Statistics at Iowa State Union collected the following data on grades in an introductory business statistics course and an introductory engineering statistics course.

Course    #Students   #A grades
B.Stat    571         82
E.Stat    156         25

$H_o: p_1 = p_2$ (the proportion of A grades in the two courses is equal) vs $H_1: p_1 \ne p_2$

$$\hat p_1 = \frac{82}{571} = 0.1436; \qquad \hat p_2 = \frac{25}{156} = 0.1603$$

$$\hat p = \frac{82 + 25}{571 + 156} = 0.1472$$

$$Z = \frac{0.1436 - 0.1603}{\sqrt{0.1472\,(0.8528)\left(\dfrac{1}{571} + \dfrac{1}{156}\right)}} = -0.52$$

The p-value is $2P(Z \le -0.52) = 0.6030$. Since $\alpha = 5\%$ < p-value, $H_o$ would not be rejected.

The proportion of A's does not differ significantly between the two courses.
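The calculation in this example can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `two_proportion_z` is hypothetical, and it uses only the Python standard library to compute the pooled test statistic and the two-sided p-value.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and two-sided p-value for Ho: p1 = p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # estimate of the common p under Ho
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_sided


# Grade data: 82 A's out of 571 (B.Stat), 25 A's out of 156 (E.Stat)
z, p_value = two_proportion_z(82, 571, 25, 156)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Because the code keeps full precision rather than rounding the intermediate proportions, the last digits may differ slightly from the hand calculation.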

EXERCISE

An insurance company is thinking about offering a discount on its life insurance policies to non-smokers. As part of its analysis, it randomly selects 200 men who are 50 years old and asks them whether they smoke at least one pack of cigarettes per day and whether they have ever suffered from heart disease. The results indicate that 20 out of 80 smokers and 15 out of 120 non-smokers suffer from heart disease. Can we conclude at the 5% los that smokers have a higher incidence of heart disease than non-smokers?

DATA

Population 1: 50-year-old smokers; proportion suffering from HEART disease: $p_1$

Population 2: 50-year-old non-smokers; proportion suffering from HEART disease: $p_2$

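Solution:

The data are clearly qualitative.

$H_o: p_1 - p_2 = 0$ vs $H_1: p_1 - p_2 > 0$

Test statistic:

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p \hat q \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

RR: $z > z_{0.05} = 1.645$ ($z_{tab}$)

Sample proportions:

$$\hat p_1 = \frac{20}{80} = 0.25; \qquad \hat p_2 = \frac{15}{120} = 0.125$$

Pooled proportion estimate:

$$\hat p = \frac{20 + 15}{80 + 120} = \frac{35}{200} = 0.175$$

Value of the test statistic:

$$z_{cal} = \frac{0.25 - 0.125}{\sqrt{0.175\,(0.825)\left(\dfrac{1}{80} + \dfrac{1}{120}\right)}} = 2.28$$

$z_{cal} = 2.28 > z_{tab} = 1.645$, so reject $H_o$.

The test statistic is approximately normally distributed, so we can also calculate the p-value:

p-value $= P(z > 2.28) = 0.0113 = 1.13\% < 5\%$. Reject $H_o$.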

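PROBLEMS

1. The pmf of a random variable X is given as follows:

x       0    1    2    3
p(x)    0    k    k    $3k^2$

Determine k so that the properties of a pmf are satisfied!

Solution: A pmf has two properties:

$$p(x) \ge 0 \ \ \forall x \qquad \text{and} \qquad \sum_x p(x) = 1$$

From the second property:

$$0 + k + k + 3k^2 = 1 \;\Rightarrow\; 3k^2 + 2k - 1 = 0 \;\Rightarrow\; (3k - 1)(k + 1) = 0 \;\Rightarrow\; k = \tfrac{1}{3},\ k = -1$$

For $k = -1$: $p(1) = -1 < 0$ and $p(2) = -1 < 0$, so the first property is not satisfied. For $k = \tfrac{1}{3}$, checking shows that the properties of a pmf are satisfied. Hence $k = \tfrac{1}{3}$.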
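As a quick check of the two candidate roots, the sketch below tests both pmf properties with exact rational arithmetic. The helper names `pmf_for` and `is_valid_pmf` are hypothetical and only illustrate the reasoning above.

```python
from fractions import Fraction


def pmf_for(k):
    """Candidate pmf values p(0), p(1), p(2), p(3) for a given k."""
    return [Fraction(0), k, k, 3 * k**2]


def is_valid_pmf(values):
    """A pmf needs every value nonnegative and a total of exactly 1."""
    return all(v >= 0 for v in values) and sum(values) == 1


# The two roots of 3k^2 + 2k - 1 = 0
candidates = [Fraction(-1), Fraction(1, 3)]
valid = [k for k in candidates if is_valid_pmf(pmf_for(k))]
print(valid)
```

Both roots make the values sum to 1, but only one of them keeps every probability nonnegative.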

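In a public opinion survey, 60 out of a sample of 100 high-income voters and 40 out of a sample of 75 low-income voters supported a decrease in sales tax.
(a) Can we conclude at the 5% los that the proportion of voters favoring a sales tax decrease differs between high- and low-income voters?
(b) What is the p-value of this test?
(c) Estimate the difference in proportions with 99% confidence!

Solution:

$H_o: (p_1 - p_2) = 0$ vs $H_1: (p_1 - p_2) \ne 0$

RR: $|z| > 1.96$

Test statistic:

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p \hat q \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$$\hat p_1 = \frac{60}{100} = 0.6; \qquad \hat p_2 = \frac{40}{75} = 0.53$$

$$\hat p = \frac{60 + 40}{100 + 75} = \frac{100}{175} = 0.571; \qquad \hat q = 1 - \hat p = 0.429$$

$$z_{cal} = \frac{0.60 - 0.53}{\sqrt{0.571\,(0.429)\left(\dfrac{1}{100} + \dfrac{1}{75}\right)}} = 0.93$$

(a) Conclusion: $z_{cal} = 0.93$ lies between $-1.96$ and $1.96$, so do not reject $H_o$.
(b) p-value $= 2P(z > 0.93) = 2(0.1762) = 0.3524$.
(c)

$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}} = (0.60 - 0.53) \pm 2.575\sqrt{\frac{(0.6)(0.4)}{100} + \frac{(0.53)(0.47)}{75}} = 0.07 \pm 0.195$$

The difference between the two proportions is estimated to lie between $-0.125$ and $0.265$.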
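The interval in part (c) can be reproduced with a few lines of code. This is a sketch under stated assumptions: the function name `diff_proportion_ci` is hypothetical, and the normal quantile comes from the standard library rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, conf=0.99):
    """Approximate CI for p1 - p2 using unpooled sample proportions."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * sqrt(p1_hat * (1 - p1_hat) / n1
                          + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - half_width, diff + half_width


# Survey data: 60 of 100 high-income, 40 of 75 low-income voters
lower, upper = diff_proportion_ci(60, 100, 40, 75)
print(f"99% CI: ({lower:.3f}, {upper:.3f})")
```

The code uses 40/75 without rounding it to 0.53, so its endpoints are close to, but not identical with, the hand-computed interval. Either way the interval contains 0, in agreement with part (a).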

TEST on MEANS WHEN THE OBSERVATIONS ARE PAIRED

TESTING THE PAIRED DIFFERENCES

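Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be the $n$ pairs, where $(X_i, Y_i)$ denotes the systolic blood pressure of the $i$-th subject before and after the drug. It is assumed that the differences $D_1, D_2, \ldots, D_n$ constitute independent normally distributed RVs such that:

$$E(D_i) = \mu_D \qquad \text{and} \qquad \mathrm{Var}(D_i) = \sigma_D^2$$

$H_o: \mu_D = d_o$ vs $H_1: \mu_D \ne d_o$ (or a one-sided alternative)

TEST STATISTIC:

$$T = \frac{\bar D - d_o}{S_D / \sqrt{n}}$$

where

$$\bar D = \frac{\sum D_i}{n} \qquad \text{and} \qquad S_D^2 = \frac{1}{n-1}\sum (D_i - \bar D)^2$$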

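Rejection criteria for testing hypotheses on means when the observations are paired

Null hypothesis: $H_o: \mu_D = d_o$

Value of the test statistic under $H_o$: $t = \dfrac{\bar d - d_o}{s_d / \sqrt{n}}$

Alternative hypothesis        Rejection criteria
$H_1: \mu_D \ne d_o$          Reject $H_o$ when $t \le -t_{\alpha/2,\,n-1}$ or when $t \ge t_{\alpha/2,\,n-1}$
$H_1: \mu_D > d_o$            Reject $H_o$ when $t \ge t_{\alpha,\,n-1}$
$H_1: \mu_D < d_o$            Reject $H_o$ when $t \le -t_{\alpha,\,n-1}$

Here $t_{\alpha,\,n-1}$ denotes the upper-$\alpha$ critical value of the t distribution with $n-1$ degrees of freedom.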

A paired difference experiment is conducted to compare the starting salaries of male and female college graduates who find jobs. Pairs are formed by choosing a male and a female with the same major and similar GRADE-POINT AVERAGE. Suppose a random sample of ten pairs is formed in this manner and the starting annual salary of each person is recorded. The results are shown in the table. Test to see whether there is evidence that the mean starting salary, μ1, for males exceeds the mean starting salary, μ2, for females. Use α = 0.05.

Pair   Male       Female     Difference (male - female)
1      $14,300    $13,800    $500
2      16,500     16,600     -100
3      15,400     14,800     600
4      13,500     13,500     0
5      18,500     17,600     900
6      12,800     13,000     -200
7      14,500     14,200     300
8      16,200     15,100     1,100
9      13,400     13,200     200
10     14,200     13,500     700

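Solution:

$H_o: \mu_1 - \mu_2 = 0 \ (\mu_D = 0)$ vs $H_1: \mu_1 - \mu_2 > 0 \ (\mu_D > 0)$

Test statistic:

$$t = \frac{\bar x_D - 0}{s_D / \sqrt{n_D}}$$

RR: reject $H_o$ if $t > t_\alpha$; $t_{0.05,\,9} = 1.833$

$$\bar x_D = \frac{\sum d_i}{n_D} = 400$$

$$S_D^2 = 188{,}888.89; \qquad S_D = 434.61$$

$$t = \frac{400}{434.61 / \sqrt{10}} = 2.91$$

[Figure: t distribution with 9 dof; the rejection region lies to the right of 1.833]

$t_{cal} = 2.91$ falls in the RR.

Reject $H_o$ at the los = 0.05.

The mean starting salary for males exceeds the mean starting salary for females.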

Consider a classroom where the students are given a test before they are taught the subject matter covered by the test. The students' scores on this pretest are recorded as the first data set. Next, the subject matter is presented to the class. After the instruction is completed, the students are retested on the same material. The scores on the second test, the posttest, compose the second data set. It is reasonable to expect that a student who scored high on the pretest will also score high on the posttest (and vice versa). Inherently, a strong dependency exists between the members of a pair of scores generated by each individual. Suppose that the scores in the table have been generated by 15 students under the conditions just described. How would you decide whether the instruction had been effective?

Student   Pretest   Posttest   D
1         54        66         12
2         79        85         6
3         91        83         -8
4         75        88         13
5         68        93         25
6         43        40         -3
7         33        78         45
8         85        91         6
9         22        44         22
10        56        82         26
11        73        59         -14
12        63        81         18
13        29        64         35
14        75        83         8
15        87        81         -6

A data set with paired scores
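One way to answer the question is a paired t test on the differences D = posttest - pretest. The sketch below makes two assumptions that the slides do not state: the alternative is one-sided, $H_1: \mu_D > 0$, and the test is run at α = 0.05, for which the tabulated critical value is $t_{0.05,\,14} = 1.761$. The function name `paired_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(differences, d0=0.0):
    """Return (d_bar, s_d, t) for the paired-difference t statistic."""
    n = len(differences)
    d_bar = mean(differences)
    s_d = stdev(differences)  # sample standard deviation, divisor n - 1
    t = (d_bar - d0) / (s_d / sqrt(n))
    return d_bar, s_d, t


pre = [54, 79, 91, 75, 68, 43, 33, 85, 22, 56, 73, 63, 29, 75, 87]
post = [66, 85, 83, 88, 93, 40, 78, 91, 44, 82, 59, 81, 64, 83, 81]
diffs = [after - before for before, after in zip(pre, post)]

d_bar, s_d, t = paired_t(diffs)
T_CRIT = 1.761  # assumed: upper 5% point of t with 14 dof, from a t table
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.2f}")
print("reject Ho" if t > T_CRIT else "do not reject Ho")
```

Under these assumptions the statistic exceeds the critical value, which would support the claim that the instruction was effective. The same function applied to the ten salary differences reproduces the t value of the salary example.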

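EX: Use the T statistic for the hypotheses

$H_o: \mu = \mu_o = 5$ versus $H_1: \mu = \mu_1 = 6$ ($\mu_1 > \mu_o$), with $\sigma = 1$,

to compute:
a) β, if α = 0.05 and n = 16
b) α, if β = 0.025 and n = 16
c) n, if α = 0.05 and β = 0.025

Solution:

Test statistic:

$$T = \frac{\bar X - \mu}{1/\sqrt{n}} = \sqrt{n}\,(\bar X - \mu)$$

RR $= \{\bar X > c\}$

(a) $P(\bar X > c \mid \mu = 5) = 0.05$

$$P\left(\frac{\bar X - 5}{1/\sqrt{16}} > \frac{c - 5}{1/\sqrt{16}}\right) = 0.05$$

$$P(T > 4(c - 5)) = 0.05$$

$P(T > t_\alpha) = 0.05$ with $t_\alpha = t_{0.05,\,15} = 1.753$, which means

$$4(c - 5) = 1.753 \;\Rightarrow\; c = 5.438$$

$$\hat\beta = P(\text{accept } H_o \mid H_1 \text{ true}) = P(\bar X < c \mid \mu = 6) = P(T < 4(c - 6)) = P(T < -2.248)$$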

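The value 2.248 is not in the t table, SO USE INTERPOLATION. In general, LINEAR INTERPOLATION is used:

$$f(x) \approx a + bx; \qquad x_1 \le x \le x_2$$

For $x_1 < x_o < x_2$:

$$f(x_o) \approx a + b x_o = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}\,(x_o - x_1)$$

t TABLE

One-tail α    0.10    0.05    0.025   0.01    0.005   0.001
Two-tail α    0.20    0.10    0.05    0.02    0.01    0.002
ν = 15        1.341   1.753   2.131   2.602

The value 2.248 lies between 2.131 (one-tail α = 0.025) and 2.602 (one-tail α = 0.01), so

$$f(x_o) \approx 0.025 + \frac{0.010 - 0.025}{0.471}\,(0.117)$$

$$f(x_o) \approx 0.021$$

By symmetry of the t distribution, $P(T < -2.248) = P(T > 2.248) \approx 0.021$, so $\hat\beta \approx 0.021$.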

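(b) β = 0.025; n = 16; α = ?

$$P(\bar X < c \mid \mu = 6) = 0.025$$

$$P(T < 4(c - 6)) = 0.025$$

$$P(T < -t) = 0.025 \;\Rightarrow\; t = 2.131$$

So: $4(c - 6) = -2.131 \;\Rightarrow\; c = 5.467$

$$\alpha = P(\text{reject } H_o \mid H_o \text{ true}) = P(\bar X > c \mid \mu = 5) = P(T > 1.868) = 0.042$$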

TABLE INTERPOLATION

Suppose that it is desired to evaluate a function f(x) at a point xo, and that a table of values of f(x) is available for some, but not all, values of x. In particular, the table may not give the value f(xo) but may give values for f(x1) and f(x2), where x1 < xo < x2. We can use the known values of f(x) for x = x1, x2 to approximate the value of f(xo). This process is known as INTERPOLATION. Perhaps the most commonly used interpolation method is linear interpolation. If f(x) is sufficiently smooth and not too curvilinear between x = x1 and x = x2, calculus tells us that f(x) can be regarded as being nearly linear over the interval [x1, x2].

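That is,

$$f(x) \approx a + bx; \qquad x_1 \le x \le x_2$$

Solving the equations

$$f(x_1) \approx a + b x_1; \qquad f(x_2) \approx a + b x_2$$

for a and b yields:

$$b = \frac{f(x_2) - f(x_1)}{x_2 - x_1}$$

$$a = f(x_1) - \frac{f(x_2) - f(x_1)}{x_2 - x_1}\,x_1$$

Hence:

$$f(x_o) \approx a + b x_o = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}\,(x_o - x_1)$$

[Figure: the curve f(x) and the line a + bx between x1 and x2, with f(x1), f(xo) and f(x2) marked]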
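The interpolation formula translates directly into a small function. The sketch below is illustrative only, and the name `linear_interpolate` is hypothetical. It is applied to the t-table lookup for 15 degrees of freedom, where 2.248 lies between the tabulated points 2.131 (tail area 0.025) and 2.602 (tail area 0.010).

```python
def linear_interpolate(x0, x1, x2, f1, f2):
    """Approximate f(x0) from table entries f(x1) = f1 and f(x2) = f2."""
    if not x1 <= x0 <= x2:
        raise ValueError("x0 must lie between x1 and x2")
    slope = (f2 - f1) / (x2 - x1)  # this is b in f(x) ~ a + b x
    return f1 + slope * (x0 - x1)


# Upper-tail area of t with 15 dof at 2.248, from the two neighbouring entries
tail = linear_interpolate(2.248, 2.131, 2.602, 0.025, 0.010)
print(round(tail, 3))
```

Linear interpolation is only an approximation; it works well here because the tabulated points are close together relative to the curvature of the function.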

EXERCISE

1. Let $(X_1, X_2, \ldots, X_n)$ be a random sample of a normal RV X with mean μ and variance 100. Let:

$H_o: \mu = 50$ vs $H_1: \mu = 55$

As a decision test, we use the rule to accept $H_o$ if $\bar x < 53$, where $\bar x$ is the value of the sample mean.
a) Find the RR.
b) Find α and β for n = 16.

2. Let $(X_1, X_2, \ldots, X_n)$ be a random sample of a Bernoulli RV X with pmf:

$$p_X(x; p) = p^x (1 - p)^{1 - x}; \qquad x = 0, 1$$

where it is known that $0 < p \le \tfrac{1}{2}$. Let:

$H_o: p = \tfrac{1}{2}$ vs $H_1: p = p_1 \ (< \tfrac{1}{2})$

and n = 20. As a decision test, we use the rule to reject $H_o$ if

$$\sum_{i=1}^{n} x_i \le 6$$

(a) Find the power function γ(p) of the test.
(b) Find α.
(c) Find β: (i) if $p_1 = \tfrac{1}{4}$ and (ii) if $p_1 = \tfrac{1}{10}$.

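Solutions:

2. $H_o: p = \tfrac{1}{2}$ vs $H_1: p = p_1 \ (< \tfrac{1}{2})$

$X \sim \mathrm{BER}(p)$: $\ p_X(x) = p^x (1 - p)^{1 - x}; \ x = 0, 1$

a)

$$\gamma(p) = P(\text{reject } H_o \mid p) = \sum_{k=0}^{6} \binom{20}{k} p^k (1 - p)^{20 - k}; \qquad 0 < p \le \tfrac{1}{2}$$

b)

$$\alpha = \gamma\left(\tfrac{1}{2}\right) = P\left(\text{reject } H_o \mid p = \tfrac{1}{2}\right) = \sum_{k=0}^{6} \binom{20}{k} \left(\tfrac{1}{2}\right)^k \left(\tfrac{1}{2}\right)^{20 - k}$$

From the table, α = 0.058.

c)

$$\beta = P(\text{accept } H_o \mid H_1 \text{ is true}) = 1 - P(\text{reject } H_o \mid p_1) = 1 - \gamma(p_1)$$

(i)

$$\beta = 1 - \gamma\left(\tfrac{1}{4}\right) = 1 - \sum_{k=0}^{6} \binom{20}{k} \left(\tfrac{1}{4}\right)^k \left(\tfrac{3}{4}\right)^{20 - k} = 0.2142$$

(ii)

$$\beta = 1 - \gamma\left(\tfrac{1}{10}\right) = 1 - \sum_{k=0}^{6} \binom{20}{k} \left(\tfrac{1}{10}\right)^k \left(\tfrac{9}{10}\right)^{20 - k} = 0.0024$$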
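The binomial sums above are usually read from a table, but they can also be evaluated directly. The following sketch, with the hypothetical function name `power`, computes γ(p) for n = 20 and the rule "reject if the sum is at most 6", then derives α and the two values of β.

```python
from math import comb


def power(p, n=20, cutoff=6):
    """gamma(p) = P(sum of X_i <= cutoff) when each X_i ~ BER(p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(cutoff + 1))


alpha = power(0.5)              # P(reject Ho | p = 1/2)
beta_quarter = 1 - power(0.25)  # P(accept Ho | p1 = 1/4)
beta_tenth = 1 - power(0.10)    # P(accept Ho | p1 = 1/10)
print(f"alpha = {alpha:.4f}")
print(f"beta(1/4) = {beta_quarter:.4f}, beta(1/10) = {beta_tenth:.4f}")
```

The further the true p lies below 1/2, the larger γ(p) becomes and the smaller the Type II error probability.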

Let $(X_1, X_2, \ldots, X_n)$ be a random sample of a normal RV X with mean μ and variance 100. Let:

$H_o: \mu = 50$ vs $H_1: \mu = 55$

As a decision test, we use the rule to accept $H_o$ if $\bar x < c$. Find the value of c and the sample size n such that α = 0.025 and β = 0.05.

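Solution:

$$R_1 = \{(x_1, x_2, \ldots, x_n) : \bar x \ge c\}$$

$$\alpha = P(\text{reject } H_o \mid H_o \text{ true}) = P(\bar X \ge c \mid \mu = 50)$$

$$P\left(Z \ge \frac{c - 50}{10/\sqrt{n}}\right) = 0.025$$

$$P(Z \ge z) = 0.025 \;\Rightarrow\; P(Z \le z) = 0.975$$

$$\Phi\left(\frac{c - 50}{10/\sqrt{n}}\right) = 0.975 \;\Rightarrow\; \frac{c - 50}{10/\sqrt{n}} = 1.96 \;\Rightarrow\; (c - 50)\sqrt{n} = 19.60$$

$$\beta = P(\text{accept } H_o \mid H_1 \text{ true}) = P(\bar X < c \mid \mu = 55)$$

$$P\left(Z < \frac{c - 55}{10/\sqrt{n}}\right) = 0.05 \;\Rightarrow\; \Phi\left(\frac{c - 55}{10/\sqrt{n}}\right) = 0.05$$

$$\frac{c - 55}{10/\sqrt{n}} = -1.645 \;\Rightarrow\; (c - 55)\sqrt{n} = -16.45$$

Eliminating $\sqrt{n}$ from the two equations (and dividing both sides by 5):

$$\frac{3.92}{c - 50} = \frac{-3.29}{c - 55} \;\Rightarrow\; 3.92\,(c - 55) = -3.29\,(c - 50)$$

$$3.92c - 215.60 = -3.29c + 164.50$$

$$7.21c = 380.10$$

$$c = \frac{380.10}{7.21} = 52.7184466 \approx 52.718$$

Substituting into $(c - 50)\sqrt{n} = 19.60$:

$$\sqrt{n} = \frac{19.60}{2.718} = \frac{19600}{2718} = 7.211$$

$$n = 51.998 \approx 52$$

Result: n = 52, c = 52.718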
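The two conditions on α and β form a pair of equations in c and $\sqrt{n}$, and subtracting them gives $\sqrt{n} = (z_\alpha + z_\beta)\,\sigma / (\mu_1 - \mu_o)$. The sketch below applies this; the function name `cutoff_and_sample_size` is hypothetical, and the quantiles are computed by the standard library instead of being read from a table.

```python
from math import ceil
from statistics import NormalDist


def cutoff_and_sample_size(mu0, mu1, sigma, alpha, beta):
    """Solve (c - mu0)*sqrt(n) = z_a*sigma and (c - mu1)*sqrt(n) = -z_b*sigma."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # upper-alpha point of N(0, 1)
    z_b = NormalDist().inv_cdf(1 - beta)   # upper-beta point of N(0, 1)
    root_n = (z_a + z_b) * sigma / (mu1 - mu0)
    c = mu0 + z_a * sigma / root_n
    return c, root_n ** 2


c, n_exact = cutoff_and_sample_size(50, 55, 10, 0.025, 0.05)
n = ceil(n_exact)  # round up so both error bounds are met
print(f"c = {c:.3f}, n = {n}")
```

Rounding n up rather than to the nearest integer is a design choice: a smaller n would leave at least one of the two error probabilities above its target.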

Let $(X_1, X_2, \ldots, X_n)$ be a random sample of a normal RV X with mean μ and variance 36. Let:

$H_o: \mu = 50$ vs $H_1: \mu = 55$

As a decision test, we use the rule to accept $H_o$ if $\bar x < 53$, where $\bar x$ is the value of the sample mean.
a) Find the expression for the critical region/rejection region $R_1$.
b) Find α and β for n = 16.

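Solution:

a)

$$R_1 = \{(x_1, x_2, \ldots, x_n) : \bar x \ge 53\} \qquad \text{where} \qquad \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$$

b)

$$\alpha = P(\bar X \ge 53 \mid \mu = 50) = P(Z \ge 2) = 1 - \Phi(2) = 1 - 0.9772 = 0.0228$$

$$\beta = P(\text{accept } H_o \mid H_1 \text{ true}) = P(\bar X < 53 \mid \mu = 55) = P(Z < -1.333) = \Phi(-1.333) = 1 - \Phi(1.333)$$

$\Phi(1.333)$ is obtained by interpolation, with $x_1 < x_o < x_2$:

           x1        xo        x2
x          1.330     1.333     1.340
Φ(x)       0.9082    ?         0.9099

$$f(1.333) \approx 0.9082 + \frac{0.9099 - 0.9082}{1.340 - 1.330}\,(1.333 - 1.330)$$

$$= 0.9082 + \frac{0.0017}{0.0100}\,(0.003)$$

$$f(1.333) \approx 0.9082 + 0.00051 = 0.90871$$

$$\beta = 1 - \Phi(1.333) = 1 - 0.9087 = 0.0913$$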
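The error probabilities for a fixed cutoff can be checked without a table or interpolation. This is a sketch, assuming the rule "reject $H_o$ when the sample mean is at least c"; the function name `error_probabilities` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def error_probabilities(mu0, mu1, sigma, n, c):
    """Type I and Type II error probabilities for the rule: reject if x_bar >= c."""
    se = sigma / sqrt(n)                     # standard error of the sample mean
    alpha = 1 - NormalDist(mu0, se).cdf(c)   # P(X_bar >= c | mu = mu0)
    beta = NormalDist(mu1, se).cdf(c)        # P(X_bar <  c | mu = mu1)
    return alpha, beta


# Variance 36 (sigma = 6), n = 16, cutoff 53
alpha, beta = error_probabilities(50, 55, 6, 16, 53)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Small differences in the fourth decimal place relative to the hand calculation come from table rounding and from the linear interpolation.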
