COMPARISON OF POPULATION PROPORTIONS

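Given $X_1 \sim \mathrm{BIN}(n_1, p_1)$ and $X_2 \sim \mathrm{BIN}(n_2, p_2)$, the statistics

$$\hat p_1 = \frac{X_1}{n_1}, \qquad \hat p_2 = \frac{X_2}{n_2}$$

are defined to be the sample proportions.

Assume that $X_1$ and $X_2$ are independent. Then

$$E(\hat p_1 - \hat p_2) = E(\hat p_1) - E(\hat p_2) = p_1 - p_2$$

$$\mathrm{Var}(\hat p_1 - \hat p_2) = \mathrm{Var}(\hat p_1) + \mathrm{Var}(\hat p_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$$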

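For sufficiently large $n_1$ and $n_2$, the standardized statistic

$$Z = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$$

is approximately standard normal.

The $(1-\alpha)100\%$ CI:

$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

As $p_1$ and $p_2$ are UNKNOWN, the approximate $(1-\alpha)100\%$ CI for $(p_1 - p_2)$ is:

$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$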

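In the testing situation, $H_o: p_1 = p_2 = p$ ($p$ unknown) versus one of the alternatives below, for an $\alpha$-los test:

H1                  Rejection region (RR)
$p_1 > p_2$         $Z > z_{\alpha}$
$p_1 < p_2$         $Z < -z_{\alpha}$
$p_1 \neq p_2$      $|Z| > z_{\alpha/2}$

Test statistic:

$$Z = \frac{\hat p_1 - \hat p_2}{\sqrt{\dfrac{p(1-p)}{n_1} + \dfrac{p(1-p)}{n_2}}}$$

The unknown common value of $p$ is estimated by:

$$\hat p = \frac{X_1 + X_2}{n_1 + n_2}$$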

EXAMPLE
Members of the Department of Statistics at Iowa State University collected the following data on grades in an introductory business statistics course and an introductory engineering statistics course.

Course    Students   A grades
B.Stat    571        82
E.Stat    156        25

$H_o: p_1 = p_2$ (the proportion of A grades in the two courses is equal) vs $H_1: p_1 \neq p_2$

$$\hat p_1 = \frac{82}{571} = 0.1436, \qquad \hat p_2 = \frac{25}{156} = 0.1603$$

$$\hat p = \frac{82 + 25}{571 + 156} = 0.1472$$

$$Z = \frac{0.1436 - 0.1603}{\sqrt{0.1472\,(0.8528)\left(\dfrac{1}{571} + \dfrac{1}{156}\right)}} = -0.52$$

The p-value is $2P(Z \le -0.52) = 0.6030$. Since $\alpha = 5\%$ < p-value, Ho is not rejected.
The proportion of A's does not differ significantly between the two courses.
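The pooled test in this example can be checked numerically. The following is a minimal sketch, assuming a two-sided alternative; the function name `pooled_two_proportion_z` is hypothetical and only the Python standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for Ho: p1 = p2 using the pooled estimate."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # estimate of the common value p under Ho
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # area in both tails
    return z, p_value


# Grade data from the example: 82 A's out of 571 and 25 A's out of 156.
z, p_value = pooled_two_proportion_z(82, 571, 25, 156)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Because the code does not round the intermediate proportions, the last digits of the p-value may differ slightly from the table-based figure.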

EXERCISE
An insurance company is thinking about offering a discount on its life insurance policies to non-smokers. As part of its analysis, it randomly selects 200 men who are 50 years old and asks them if they smoke at least one pack of cigarettes per day and if they have ever suffered from heart disease. The results indicate that 20 out of 80 smokers and 15 out of 120 non-smokers suffer from heart disease. Can we conclude at the 5% los that smokers have a higher incidence of heart disease than non-smokers?

DATA
50-year-old smokers who suffer from HEART disease; parameter: $p_1$
50-year-old non-smokers who suffer from HEART disease; parameter: $p_2$

Solution:
The data are clearly qualitative.

$$H_o: p_1 - p_2 = 0 \quad\text{vs}\quad H_1: p_1 - p_2 > 0$$

Test statistic:

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p\,\hat q\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

RR: $z > z_{\alpha} = z_{0.05} = 1.645$ (this is $z_{tab}$)

Sample proportions:

$$\hat p_1 = \frac{20}{80} = 0.25, \qquad \hat p_2 = \frac{15}{120} = 0.125$$

Pooled proportion estimate:

$$\hat p = \frac{20 + 15}{80 + 120} = \frac{35}{200} = 0.175$$

Value of the test statistic:

$$z_{cal} = \frac{0.25 - 0.125}{\sqrt{0.175\,(0.825)\left(\dfrac{1}{80} + \dfrac{1}{120}\right)}} = 2.28$$

$z_{cal} = 2.28 > z_{tab}$, so reject Ho.

The test statistic is normally distributed, so we can calculate the p-value:

$$\text{p-value} = P(z > 2.28) = 0.0113 = 1.13\% < \alpha$$

Reject Ho.

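PROBLEMS
1. The pmf of a random variable X is given as follows:

x       0    1    2    3
p(x)    0    k    k    3k^2

Determine k so that p(x) satisfies the properties of a pmf!

Solution: A pmf has two properties:

$$p(x) \ge 0 \ \ \forall x, \qquad \sum_x p(x) = 1$$

From the second property:

$$\sum_x p(x) = 0 + k + k + 3k^2 = 1 \;\Rightarrow\; 3k^2 + 2k - 1 = 0 \;\Rightarrow\; (3k - 1)(k + 1) = 0 \;\Rightarrow\; k = \tfrac{1}{3},\ k = -1$$

For $k = -1$: $p(1) = -1 < 0$ and $p(2) = -1 < 0$, so this value does not satisfy the first property. For $k = \tfrac{1}{3}$, checking shows that both pmf properties hold. So the value is $k = \tfrac{1}{3}$.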
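The check of the two candidate values of k can be written out in a few lines. This is a minimal sketch using exact fractions; the helper names `pmf` and `is_valid_pmf` are hypothetical, and the candidate roots are taken from the factorisation $(3k - 1)(k + 1) = 0$ rather than solved numerically.

```python
from fractions import Fraction


def pmf(k):
    """pmf from the problem: p(0) = 0, p(1) = k, p(2) = k, p(3) = 3k^2."""
    return {0: Fraction(0), 1: k, 2: k, 3: 3 * k * k}


def is_valid_pmf(p):
    """Check the two pmf properties: non-negativity and total probability 1."""
    return all(v >= 0 for v in p.values()) and sum(p.values()) == 1


# Roots of 3k^2 + 2k - 1 = 0.
candidates = [Fraction(1, 3), Fraction(-1)]
valid = [k for k in candidates if is_valid_pmf(pmf(k))]
print(valid)
```

Both roots make the probabilities sum to 1, so it is the non-negativity check that rules out k = -1.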

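In a public opinion survey, 60 out of a sample of 100 high-income voters and 40 out of a sample of 75 low-income voters supported a decrease in sales tax.
(a) Can we conclude at the 5% los that the proportion of voters favoring a sales tax decrease differs between high- and low-income voters?
(b) What is the p-value of this test?
(c) Estimate the difference in proportions, with 99% confidence!

Solution:

$$H_o: (p_1 - p_2) = 0 \quad\text{vs}\quad H_1: (p_1 - p_2) \neq 0$$

Test statistic:

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p\,\hat q\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

RR: $|z| > 1.96$

$$\hat p_1 = \frac{60}{100} = 0.6; \qquad \hat p_2 = \frac{40}{75} = 0.53$$

$$\hat p = \frac{60 + 40}{100 + 75} = \frac{100}{175} = 0.571, \qquad \hat q = 1 - \hat p = 0.429$$

$$z_{cal} = \frac{0.60 - 0.53}{\sqrt{0.571\,(0.429)\left(\dfrac{1}{100} + \dfrac{1}{75}\right)}} = 0.93$$

(a) Conclusion: $-1.96 < z_{cal} < 1.96$, so do not reject Ho.
(b) p-value = $2P(z > 0.93) = 2(0.1762) = 0.3524$.
(c)

$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}} = (0.60 - 0.53) \pm 2.575\sqrt{\frac{(0.6)(0.4)}{100} + \frac{(0.53)(0.47)}{75}} = 0.07 \pm 0.195$$

The difference between the two proportions is estimated to lie between -0.125 and 0.265.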
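The interval in part (c) uses the unpooled standard error. A minimal sketch of that calculation follows; the function name `diff_proportion_ci` is hypothetical, and the critical value comes from `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, confidence=0.99):
    """Approximate large-sample CI for p1 - p2 (unpooled standard error)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Survey data: 60 of 100 high-income and 40 of 75 low-income voters.
lower, upper = diff_proportion_ci(60, 100, 40, 75)
print(f"99% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

The code keeps 40/75 unrounded, so its endpoints differ in the third decimal from the hand calculation, which rounds the second proportion to 0.53.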

TEST on MEANS WHEN THE OBSERVATIONS ARE PAIRED
TESTING THE PAIRED DIFFERENCES
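Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be the n pairs, where $(X_i, Y_i)$ denotes the systolic blood pressure of the i-th subject before and after the drug.
It is assumed that the differences $D_1, D_2, \ldots, D_n$ constitute independent normally distributed RVs such that:

$$E(D_i) = \mu_D \quad\text{and}\quad \mathrm{Var}(D_i) = \sigma_D^2$$

$$H_o: \mu_D = \delta_o \quad\text{vs}\quad H_1: \mu_D \neq \delta_o$$

TEST STATISTIC:

$$T = \frac{\bar D - \delta_o}{S_D/\sqrt{n}}, \qquad \bar D = \frac{\sum D_i}{n} \quad\text{and}\quad S_D^2 = \frac{1}{n-1}\sum (D_i - \bar D)^2$$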

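Rejection criteria for testing hypotheses on means when the observations are paired

Null hypothesis: $H_o: \mu_D = \delta_o$
Value of the test statistic under Ho: $t = \dfrac{\bar d - \delta_o}{s_d/\sqrt{n}}$

Alternative hypothesis          Rejection criteria
$H_1: \mu_D \neq \delta_o$      Reject Ho when $t > t_{\alpha/2,\,n-1}$ or when $t < t_{1-\alpha/2,\,n-1}$
$H_1: \mu_D < \delta_o$         Reject Ho when $t < t_{1-\alpha,\,n-1}$
$H_1: \mu_D > \delta_o$         Reject Ho when $t > t_{\alpha,\,n-1}$

Here $t_{\alpha,\,n-1}$ is the upper-tail critical value, so $t_{1-\alpha,\,n-1} = -t_{\alpha,\,n-1}$.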

A paired difference experiment is conducted to compare the starting salaries of male and female college graduates who find jobs. Pairs are formed by choosing a male and a female with the same major and similar GRADE-POINT-AVERAGE. Suppose a random sample of ten pairs is formed in this manner and the starting annual salary of each person is recorded. The results are shown in the table. Test to see whether there is evidence that the mean starting salary, μ1, for males exceeds the mean starting salary, μ2, for females. Use α = 0.05.

Pair   Male      Female    Difference (male - female)
1      $14,300   $13,800   $500
2      16,500    16,600    -100
3      15,400    14,800    600
4      13,500    13,500    0
5      18,500    17,600    900
6      12,800    13,000    -200
7      14,500    14,200    300
8      16,200    15,100    1,100
9      13,400    13,200    200
10     14,200    13,500    700

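Solution:

$$H_o: \mu_1 - \mu_2 = 0 \ (\mu_D = 0) \quad\text{vs}\quad H_1: \mu_1 - \mu_2 > 0 \ (\mu_D > 0)$$

Test statistic:

$$t = \frac{\bar x_D - \delta_o}{s_D/\sqrt{n}}; \qquad \bar x_D = \bar d$$

RR: reject Ho if $t > t_{\alpha}$; $t_{0.05,\,9} = 1.833$

$$\bar d = \bar x_D = \frac{\sum D_i}{n} = 400$$

$$S_D^2 = 188{,}888.89, \qquad S_D = 434.61$$

$$t = \frac{400}{434.61/\sqrt{10}} = 2.91$$

[Figure: t-distribution with 9 dof; the rejection region lies to the right of t = 1.833]

$t_{cal}$ falls in the RR.
Reject Ho at the los = 0.05.
The mean starting salary for males exceeds the mean starting salary for females.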

Consider a classroom where the students are given a test before they are taught the subject matter covered by the test. The students' scores on this pre-test are recorded as the first data set. Next, the subject matter is presented to the class. After the instruction is completed, the students are retested on the same material. The scores on the second test, the post-test, compose the second data set. It is reasonable to expect that a student who scored high on the pre-test will also score high on the post-test (and vice versa). Inherently, a strong dependency exists between the members of a pair of scores generated by each individual.
Suppose that the scores in the table have been generated by 15 students under the conditions just described. How would you decide whether the instruction had been effective?

Student   Pre-test   Post-test   D
1         54         66          12
2         79         85          6
3         91         83          -8
4         75         88          13
5         68         93          25
6         43         40          -3
7         33         78          45
8         85         91          6
9         22         44          22
10        56         82          26
11        73         59          -14
12        63         81          18
13        29         64          35
14        75         83          8
15        87         81          -6

A data set with paired scores
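One way to approach the question is a paired t test on the differences D = post-test minus pre-test. The sketch below assumes the hypotheses $H_o: \mu_D = 0$ vs $H_1: \mu_D > 0$ and a 5% los; these choices are assumptions, not stated in the notes, and the critical value is the standard t-table entry for 14 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Scores of the 15 students, in table order.
pre = [54, 79, 91, 75, 68, 43, 33, 85, 22, 56, 73, 63, 29, 75, 87]
post = [66, 85, 83, 88, 93, 40, 78, 91, 44, 82, 59, 81, 64, 83, 81]

# Paired differences D_i = post - pre, matching the D column of the table.
d = [after - before for before, after in zip(pre, post)]
n = len(d)

d_bar = mean(d)                   # sample mean of the differences
s_d = stdev(d)                    # sample standard deviation (divisor n - 1)
t_cal = d_bar / (s_d / sqrt(n))   # test statistic with delta_o = 0

T_CRIT = 1.761  # assumed table value t_{0.05, 14} for the one-sided test

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t_cal = {t_cal:.2f}")
print("reject Ho" if t_cal > T_CRIT else "do not reject Ho")
```

Working with the differences rather than the two score lists separately is what accounts for the dependency within each pair.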

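EX: Use the T statistic for the hypotheses $H_o: \mu = 5$ versus $H_1: \mu = 6$, with σ = 1, to compute:
a) β, if α = 0.05 and n = 16
b) α, if β = 0.025 and n = 16
c) n, if α = 0.05 and β = 0.025

Solution:

$$H_o: \mu = \mu_o = 5 \quad\text{vs}\quad H_1: \mu = \mu_1 = 6, \qquad \mu_1 > \mu_o$$

Test statistic:

$$T = \sqrt{n}\,(\bar X - \mu)$$

$$RR = \{\bar X > c\}$$

(a)

$$P(\bar X > c \mid \mu = 5) = 0.05$$

$$P\left(\frac{\bar X - 5}{1/\sqrt{16}} > \frac{c - 5}{1/\sqrt{16}}\right) = 0.05$$

$$P(T > 4(c - 5)) = 0.05, \qquad P(T > t) = 0.05 \;\Rightarrow\; t = t_{15} = 1.753$$

which means $4(c - 5) = 1.753$, so $c = 5.438$.

$$\hat\beta = P(\text{accept } H_o \mid H_1 \text{ true}) = P(\bar X < c \mid \mu = 6) = P(T < 4(c - 6)) = P(T < -2.248)$$

This value is not in the t table, SO USE INTERPOLATION. Generally, LINEAR INTERPOLATION is used:

$$f(x) \approx a + bx; \quad x_1 \le x \le x_2, \qquad x_1 < x_o < x_2$$

$$f(x_o) \approx a + b x_o = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}(x_o - x_1)$$

t TABLE (row for υ = 15)

One-tail α   0.10    0.05    0.025   0.01    0.005   0.001
Two-tail α   0.20    0.10    0.05    0.02    0.01    0.002
υ = 15       1.341   1.753   2.131   2.602

The value 2.248 lies between 2.131 and 2.602.

$$f(x_o) \approx 0.025 + \frac{0.010 - 0.025}{0.471}(0.117)$$

$$f(x_o) \approx 0.021$$

$$P(T < -2.248) \approx 0.021$$

(b) β = 0.025; n = 16; α = ?

$$P(\bar X < c \mid \mu = 6) = 0.025$$

$$P(T < 4(c - 6)) = 0.025, \qquad P(T < -t) = 0.025 \;\Rightarrow\; t = 2.131$$

So $4(c - 6) = -2.131$ and $c = 5.467$.

$$\alpha = P(\text{reject } H_o \mid H_o \text{ true}) = P(\bar X > c \mid \mu = 5) = P(T > 1.868) = 0.042$$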

TABLE INTERPOLATION
Suppose that it is desired to evaluate a function f(x) at a point xo, and that a table of values of f(x) is available for some, but not all, values of x. In particular, the table may not give the value f(xo) but may give values for f(x1) and f(x2), where x1 < xo < x2. We can use the known values of f(x) for x = x1, x2 to approximate the value of f(xo). This process is known as INTERPOLATION.

Perhaps the most commonly used interpolation method is linear interpolation. If f(x) is sufficiently smooth and not too curvilinear between x = x1 and x = x2, calculus tells us that f(x) can be regarded as being nearly linear over the interval [x1, x2]. That is,

$$f(x) \approx a + bx; \quad x_1 \le x \le x_2$$

Solving the equations

$$f(x_1) = a + b x_1; \qquad f(x_2) = a + b x_2$$

for a and b yields:

$$b = \frac{f(x_2) - f(x_1)}{x_2 - x_1}$$

$$a = f(x_1) - \frac{f(x_2) - f(x_1)}{x_2 - x_1}\,x_1$$

Hence:

$$f(x_o) \approx a + b x_o = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}(x_o - x_1)$$

[Figure: the curve f(x) and the line a + bx over the interval from x1 to x2, with f(x1), f(xo) and f(x2) marked]
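The interpolation formula translates directly into a small helper. This is a minimal sketch; the function name `linear_interpolate` is hypothetical, and the sample call uses t-table entries for 15 degrees of freedom (critical values 2.131 and 2.602 with one-tail areas 0.025 and 0.010) to approximate the tail area at 2.248.

```python
def linear_interpolate(x0, x1, f1, x2, f2):
    """Approximate f(x0) from tabulated points (x1, f1) and (x2, f2), x1 <= x0 <= x2."""
    if not x1 <= x0 <= x2:
        raise ValueError("x0 must lie between x1 and x2")
    slope = (f2 - f1) / (x2 - x1)   # the coefficient b
    return f1 + slope * (x0 - x1)   # f(x1) + b (x0 - x1)


# One-tail area beyond t = 2.248 with 15 degrees of freedom.
tail_area = linear_interpolate(2.248, 2.131, 0.025, 2.602, 0.010)
print(round(tail_area, 3))  # 0.021
```

The same helper applies to a standard normal table, for example to approximate the cumulative probability at 1.333 from the entries at 1.33 and 1.34.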

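EXERCISE
1. Let (X1, X2, …, Xn) be a random sample of a normal RV X with mean μ and variance 100. Let:

$$H_o: \mu = 50 \quad\text{vs}\quad H_1: \mu = 55$$

As a decision test, we use the rule to accept Ho if $\bar x \le 53$, where $\bar x$ is the value of the sample mean.
a) Find RR.
b) Find α and β for n = 16.

2. Let (X1, X2, …, Xn) be a random sample of a Bernoulli RV X with pmf:

$$p_X(x; p) = p^x (1 - p)^{1 - x}; \quad x = 0, 1$$

where it is known that $0 < p \le \tfrac{1}{2}$. Let:

$$H_o: p = \tfrac{1}{2} \quad\text{vs}\quad H_1: p = p_1 \ \left(< \tfrac{1}{2}\right)$$

and n = 20. As a decision test, we use the rule to reject Ho if

$$\sum_{i=1}^{n} x_i \le 6$$

(a) Find the power function γ(p) of the test.
(b) Find α.
(c) Find β: (i) if $p_1 = \tfrac{1}{4}$ and (ii) if $p_1 = \tfrac{1}{10}$.

Solutions:
2. $H_o: p = \tfrac{1}{2}$ vs $H_1: p = p_1 \ (< \tfrac{1}{2})$

$$X \sim \mathrm{BER}(p), \qquad p_X(x) = p^x (1 - p)^{1 - x}; \quad x = 0, 1$$

a)

$$\gamma(p) = P(\text{reject } H_o \mid p) = \sum_{k=0}^{6} \binom{20}{k} p^k (1 - p)^{20 - k}; \quad 0 < p \le \tfrac{1}{2}$$

b)

$$\alpha = \gamma\left(\tfrac{1}{2}\right) = P\left(\text{reject } H_o \mid p = \tfrac{1}{2}\right) = \sum_{k=0}^{6} \binom{20}{k} \left(\tfrac{1}{2}\right)^k \left(\tfrac{1}{2}\right)^{20 - k} = 0.058 \quad\text{(from the table)}$$

c)

$$\beta(p_1) = P(\text{accept } H_o \mid H_1 \text{ is true}) = 1 - P(\text{reject } H_o \mid p_1)$$

$$\text{(i)}\quad \beta\left(\tfrac{1}{4}\right) = 1 - \sum_{k=0}^{6} \binom{20}{k} \left(\tfrac{1}{4}\right)^k \left(\tfrac{3}{4}\right)^{20 - k} = 0.2142$$

$$\text{(ii)}\quad \beta\left(\tfrac{1}{10}\right) = 1 - \sum_{k=0}^{6} \binom{20}{k} \left(\tfrac{1}{10}\right)^k \left(\tfrac{9}{10}\right)^{20 - k} = 0.0024$$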
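The binomial sums in the solution to Exercise 2 can be evaluated without a table. The following is a minimal sketch; the function name `power` and its default arguments (n = 20, cutoff 6) are chosen here for illustration.

```python
from math import comb


def power(p, n=20, cutoff=6):
    """gamma(p) = P(sum of X_i <= cutoff | p) for n independent BER(p) trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(cutoff + 1))


alpha = power(0.5)                # P(reject Ho | p = 1/2)
beta_quarter = 1 - power(0.25)    # P(accept Ho | p = 1/4)
beta_tenth = 1 - power(0.10)      # P(accept Ho | p = 1/10)

print(f"alpha = {alpha:.4f}")
print(f"beta(1/4) = {beta_quarter:.4f}, beta(1/10) = {beta_tenth:.4f}")
```

Evaluating `power` over a grid of p values between 0 and 1/2 would trace out the whole power function γ(p).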

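Let (X1, X2, …, Xn) be a random sample of a normal RV X with mean μ and variance 100. Let:

$$H_o: \mu = 50 \quad\text{vs}\quad H_1: \mu = 55$$

As a decision test, we use the rule to accept Ho if $\bar x \le c$. Find the value of c and the sample size n such that α = 0.025 and β = 0.05.

Solution:

$$R_1 = \{(x_1, x_2, \ldots, x_n) : \bar x > c\}$$

$$\alpha = P(\text{reject } H_o \mid H_o \text{ true}) = P(\bar X > c \mid \mu = 50)$$

$$P\left(Z > \frac{c - 50}{10/\sqrt{n}}\right) = 0.025, \qquad P(Z > z) = 0.025$$

$$P(Z \le z) = 0.975 \;\Rightarrow\; P\left(Z \le \frac{c - 50}{10/\sqrt{n}}\right) = 0.975$$

$$\frac{c - 50}{10/\sqrt{n}} = 1.96 \;\Rightarrow\; (c - 50)\sqrt{n} = 19.60$$

$$\beta = P(\text{accept } H_o \mid H_1 \text{ true}) = P(\bar X < c \mid \mu = 55)$$

$$P\left(Z < \frac{c - 55}{10/\sqrt{n}}\right) = 0.05$$

$$\frac{(c - 55)\sqrt{n}}{10} = -1.645 \;\Rightarrow\; (c - 55)\sqrt{n} = -16.45$$

Dividing the two equations:

$$\frac{c - 50}{c - 55} = -\frac{3.92}{3.29} \;\Rightarrow\; 3.29(c - 50) = -3.92(c - 55)$$

$$3.29c - 164.50 = -3.92c + 215.60$$

$$7.21c = 380.10$$

$$c = \frac{380.10}{7.21} = 52.7184466 \approx 52.718$$

From $(c - 50)\sqrt{n} = 19.60$:

$$\sqrt{n} = \frac{19.60}{2.718} = \frac{19600}{2718} = 7.211$$

$$n = 51.998 \approx 52$$

So n = 52 and c = 52.718.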
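The two conditions on α and β can also be solved in closed form. The sketch below assumes the upper-tail rejection rule used in the solution; subtracting $(c - \mu_1)\sqrt{n} = -z_\beta\sigma$ from $(c - \mu_0)\sqrt{n} = z_\alpha\sigma$ gives $\sqrt{n} = (z_\alpha + z_\beta)\sigma/(\mu_1 - \mu_0)$. The function name `design_upper_tail_test` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def design_upper_tail_test(mu0, mu1, sigma, alpha, beta):
    """Return (c, n) for the rule 'reject Ho if x_bar > c' with the given error rates."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    root_n = (z_alpha + z_beta) * sigma / (mu1 - mu0)  # sqrt(n) before rounding
    c = mu0 + z_alpha * sigma / root_n
    return c, root_n**2


c, n_exact = design_upper_tail_test(50, 55, 10, alpha=0.025, beta=0.05)
n = ceil(n_exact)  # a sample size must be a whole number
print(f"c = {c:.3f}, n = {n}")
```

Rounding n up keeps both error probabilities at or below their targets only approximately, since c was computed for the unrounded n.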

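Let (X1, X2, …, Xn) be a random sample of a normal RV X with mean μ and variance 36. Let:

$$H_o: \mu = 50 \quad\text{vs}\quad H_1: \mu = 55$$

As a decision test, we use the rule to accept Ho if $\bar x \le 53$, where $\bar x$ is the value of the sample mean.
a) Find the expression for the critical region/rejection region R1.
b) Find α and β for n = 16.

Solution:
a)

$$R_1 = \{(x_1, x_2, \ldots, x_n) : \bar x > 53\}, \quad\text{where}\quad \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$$

b)

$$\alpha = P(\bar X > 53 \mid \mu = 50) = P(Z > 2) = 1 - \Phi(2) = 1 - 0.9772 = 0.0228$$

$$\beta = P(\text{accept } H_o \mid H_1 \text{ true}) = P(\bar X < 53 \mid \mu = 55) = P(Z < -1.333) = \Phi(-1.333) = 1 - \Phi(1.333)$$

Φ(1.333) is found by linear interpolation in the normal table, with x1 < xo < x2:

x       x1 = 1.330   xo = 1.333   x2 = 1.340
f(x)    0.9082       ?            0.9099

$$f(1.333) \approx 0.9082 + \frac{0.9099 - 0.9082}{1.340 - 1.330}(1.333 - 1.330)$$

$$= 0.9082 + \frac{0.0017}{0.0100}(0.003) = 0.9082 + 0.00051 = 0.90871$$

$$\beta = 1 - \Phi(1.333) = 1 - 0.90870 = 0.0913$$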
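Both error probabilities can be computed directly from the normal distribution of the sample mean, which avoids the table lookup and interpolation. A minimal sketch, with the hypothetical function name `error_probabilities`:

```python
from math import sqrt
from statistics import NormalDist


def error_probabilities(mu0, mu1, sigma, n, c):
    """alpha and beta for the rule 'reject Ho if x_bar > c' with known sigma."""
    se = sigma / sqrt(n)                      # standard deviation of the sample mean
    alpha = 1 - NormalDist(mu0, se).cdf(c)    # P(X_bar > c | mu = mu0)
    beta = NormalDist(mu1, se).cdf(c)         # P(X_bar < c | mu = mu1)
    return alpha, beta


# Variance 36 (sigma = 6), n = 16, cutoff 53.
alpha, beta = error_probabilities(50, 55, 6, 16, 53)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

The results may differ from the table-based values in the fourth decimal place because no rounding or interpolation is involved.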