Stat 306: Finding Relationships in Data.
Transcript of Stat 306: Finding Relationships in Data.
Lecture 1: Introduction to Course
The main topic of this course is regression, which means fitting prediction equations. Regression is a common statistical method in scientific research.
Statistics - Recap: the two-sample t-test
Age vs. Money
Independent variable X: age group, old (0) or young (1)
Dependent variable Y: dollars ($) in bank account
Population parameters:
μ0: mean money ($) for old people
μ1: mean money ($) for young people
σ²: variance of money for everyone
Hypothesis test:
H0: μ0 = μ1 ("null" hypothesis)
H1: μ0 ≠ μ1 ("alternative" hypothesis)
Sample, n = 9: John, Paul, Mary, Lisa, Andy, Tim, Peter, Rose, Tony
X: old, young, old, old, young, young, young, young, young
y: 71, 54, 43, 45, 21, 11, 30, 45, 10
Sample statistics: t = 2.68, df = 7, p-value = 0.03, 95% C.I. for the difference in means = [3.4, 54.6]
Age vs. Money

Objective: The purpose of this observational study was to demonstrate if, and to what extent, age is associated with money.

Design and Methods: We surveyed a number of individuals and for each determined approximate age (recorded as "old" or "young") and the amount of money (in dollars) in their bank accounts. Comparison of the two groups was done using a Student two-sample t-test.

Results: We obtained a random sample of n = 9 subjects. The "young" group had an average of $27, while the "old" group had an average of $56. This estimated difference of $29 (95% C.I. = [$3.4, $54.6]) is statistically significant, t = 2.68, df = 7; p-value = 0.03.

Conclusions: We found that, as hypothesized, age is associated with money. On average, younger people have less in their accounts than older people.

Small Print: The analysis rests on the following assumptions:
- the observations are independently and identically distributed.
- the dependent variable, money, is normally distributed.
- the two populations being compared have the same variance.
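
The reported t statistic and confidence interval can be checked by hand. The sketch below is a minimal Python illustration of the equal-variance (pooled) two-sample t-test, not the lecture's own code. It assumes the three "old" subjects hold $71, $54 and $43, which is the only split of the nine sample values that gives the reported group means of $56 and $27. The function name `pooled_t_test` and the hard-coded critical value are illustrative choices.

```python
import math
from statistics import mean

# Assumed grouping: the only 3/6 split of the sample values that
# reproduces the reported group means ($56 old, $27 young).
old = [71, 54, 43]
young = [45, 21, 11, 30, 45, 10]

def pooled_t_test(a, b):
    """Two-sample t-test assuming equal variances.

    Returns (difference in means, t statistic, degrees of freedom,
    standard error of the difference).
    """
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    # Pooled sum of squares around each group's own mean.
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    df = na + nb - 2
    pooled_var = ss / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    diff = ma - mb
    return diff, diff / se, df, se

diff, t, df, se = pooled_t_test(old, young)

# 97.5th percentile of Student's t with 7 degrees of freedom (table value).
T_CRIT = 2.3646
ci = (diff - T_CRIT * se, diff + T_CRIT * se)

print(f"difference = {diff:.0f}, t = {t:.2f}, df = {df}")
print(f"95% C.I. = [{ci[0]:.1f}, {ci[1]:.1f}]")
```

Under that grouping the sketch gives a difference of 29, t = 2.68 on 7 degrees of freedom, and an interval of [3.4, 54.6], in agreement with the figures in the summary.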
[Figure: boxplot of Money ($) by Age (Young, Old); vertical axis from 0 to 100]
Age vs. Money: Linear Regression
Independent variable X: old (1) or young (0)
Dependent variable Y: dollars ($) in bank account
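
With age coded as old (1) or young (0), a least-squares line fitted to the same data carries the same information as the two-sample t-test: the intercept is the mean of the group coded 0, the slope is the difference in group means, and the t statistic for the slope equals the pooled t statistic. The Python sketch below illustrates this. It assumes, as before, that the "old" subjects are the ones with $71, $54 and $43 (the split that matches the reported means), and all variable names are illustrative.

```python
import math

# Assumed pairing of the 0/1 age indicator with money (old = 1, young = 0).
x = [1, 1, 1, 0, 0, 0, 0, 0, 0]
y = [71, 54, 43, 45, 21, 11, 30, 45, 10]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx            # slope: difference between the group means
b0 = ybar - b1 * xbar     # intercept: mean of the group coded 0 (young)

# Residual standard deviation on n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# t statistic for H0: slope = 0.
t_slope = b1 / (s / math.sqrt(sxx))

print(f"b0 = {b0:.1f}, b1 = {b1:.1f}, t = {t_slope:.2f}")
```

The slope comes out as 29 (the estimated difference between groups), the intercept as 27 (the "young" mean), and the t statistic as 2.68, the same value as in the t-test recap.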
Age vs. Money: Linear Regression
PREDICTOR variable X: age in years
RESPONSE variable Y: dollars ($) in bank account
Population parameters: β0, β1, σ²
Hypothesis test:
H0: β1 = 0 ("null" hypothesis)
H1: β1 ≠ 0 ("alternative" hypothesis)
Sample, n = 9: John, Paul, Mary, Lisa, Andy, Tim, Peter, Rose, Tony
X (age in years): 82, 22, 45, 71, 29, 12, 9, 18, 24
y (dollars): 71, 54, 43, 45, 21, 11, 30, 45, 10
Sample statistics: b0 = 17.7, b1 = 0.55, s = 15.5, R² = 0.49
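
The residual standard deviation s and R² are two views of the same quantity: the residual sum of squares equals (n - 2)s², and R² is one minus its share of the total variation in money. The Python sketch below checks that the two reported values agree. It uses only the nine dollar amounts and the rounded s = 15.5, so it does not depend on which age goes with which amount. Variable names are illustrative.

```python
# Money values (y) from the sample; s is the rounded estimate reported above.
money = [71, 54, 43, 45, 21, 11, 30, 45, 10]
s = 15.5
n = len(money)

ybar = sum(money) / n
syy = sum((y - ybar) ** 2 for y in money)  # total sum of squares
sse = (n - 2) * s ** 2                     # residual sum of squares implied by s
r_squared = 1 - sse / syy

print(f"R^2 = {r_squared:.2f}")
```

The result rounds to 0.49, consistent with the reported R².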
For parameter β1:
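
Inference for the slope can be sketched from the summary statistics alone. The standard error of b1 is s divided by the square root of the sum of squared deviations of the ages. The Python sketch below assumes the nine ages listed in the sample and the rounded values b1 = 0.55 and s = 15.5, so results agree with the lecture's figures only up to rounding. The helper `two_sided_p` is a hypothetical stand-in for a t-distribution routine from a statistics library.

```python
import math

ages = [82, 22, 45, 71, 29, 12, 9, 18, 24]  # X values from the sample
b1, s = 0.55, 15.5                           # rounded reported estimates
n = len(ages)
df = n - 2

xbar = sum(ages) / n
sxx = sum((x - xbar) ** 2 for x in ages)

se_b1 = s / math.sqrt(sxx)   # standard error of the slope
t_stat = b1 / se_b1          # test statistic for H0: beta1 = 0

# 97.5th percentile of Student's t with 7 degrees of freedom (table value).
T_CRIT = 2.3646
ci = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the Student-t probability of [-|t|, |t|],
    with the density integrated by Simpson's rule (steps must be even)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    a = abs(t)
    h = a / steps
    total = pdf(0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * (h / 3) * total

p_value = two_sided_p(t_stat, df)

print(f"SE(b1) = {se_b1:.3f}, t = {t_stat:.2f}, p-value = {p_value:.3f}")
print(f"95% C.I. for beta1 = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

With these rounded inputs the interval is [0.05, 1.05], and the p-value should land close to the 0.036 quoted in the summary.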
Age vs. Money

Objective: The purpose of this observational study was to demonstrate if, and to what extent, age is associated with money.

Design and Methods: We collected a random sample of individuals and for each determined their age (recorded in years) and the amount of money (in dollars) in their accounts. Analysis of the data was done using linear regression.

Results: We obtained a random sample of n = 9 subjects. There is a statistically significant association between age and money (p-value = 0.036). For every additional year in age, an individual's amount of money increases on average by an estimated $0.55 (95% C.I. = [$0.05, $1.05]).

Conclusions: We found that, as hypothesized, age is associated with money. In our sample, age accounted for about half of the variability observed in money (R² = 0.49). We predict that a 50-year-old will have $45.1 (95% P.I. = [$5.6, $84.5]), whereas a 40-year-old will have $39.6 (95% P.I. = [$0.8, $78.4]).

Small Print: The analysis rests on the following assumptions:
- the observations are independently and identically distributed.
- the response variable, money, is normally distributed.
- homoscedasticity of residuals, or equal variance.
- the relationship between response and predictor variables is linear.
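
The predictions and 95% prediction intervals in the conclusions can be approximated from the fitted line. The Python sketch below assumes the rounded estimates b0 = 17.7, b1 = 0.55 and s = 15.5 together with the nine sampled ages, and applies the usual prediction-interval formula for simple linear regression. Because the inputs are rounded, the output matches the quoted intervals only to within a few tenths of a dollar. The function name `predict` is illustrative.

```python
import math

ages = [82, 22, 45, 71, 29, 12, 9, 18, 24]  # X values from the sample
b0, b1, s = 17.7, 0.55, 15.5                 # rounded reported estimates
n = len(ages)

xbar = sum(ages) / n
sxx = sum((x - xbar) ** 2 for x in ages)

# 97.5th percentile of Student's t with n - 2 = 7 degrees of freedom.
T_CRIT = 2.3646

def predict(x0):
    """Point prediction and 95% prediction interval for a new person aged x0."""
    fit = b0 + b1 * x0
    # The leading 1 accounts for the new observation's own variability;
    # the other two terms reflect uncertainty in the fitted line.
    half = T_CRIT * s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
    return fit, fit - half, fit + half

for age in (50, 40):
    fit, lo, hi = predict(age)
    print(f"age {age}: ${fit:.1f}, 95% P.I. = [${lo:.1f}, ${hi:.1f}]")
```

The intervals are wide because they describe a single new individual rather than the average at that age; with n = 9 and s = 15.5 most of the width comes from individual variability.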
[Figure: scatterplot of Money ($) against Age (years); both axes from 0 to 100]