Stat 306: Finding Relationships in Data.

Post on 26-Oct-2021

Lecture 1: Introduction to Course

The main topic of this course is regression, which means fitting prediction equations. Regression is a common statistical method in scientific research.

Statistics Recap: the two-sample t-test

Age vs. Money

Independent variable X: age group, old (0) or young (1)

Dependent variable Y: dollars ($) in bank account

Population parameters:

- μ0: mean money ($) for old people
- μ1: mean money ($) for young people
- σ²: variance of money for everyone

Hypothesis Test:

- H0: μ0 = μ1 (“Null” hypothesis)
- H1: μ0 ≠ μ1 (“Alternative” hypothesis)

Sample, n = 9: John, Paul, Mary, Lisa, Andy, Tim, Peter, Rose, Tony

X (age group) and y (dollars):

- old: 71, 54, 43
- young: 45, 21, 11, 30, 45, 10

Sample statistics: t = 2.68, df = 7, p-value = 0.03, 95% C.I. = [3.4, 54.6]

Age vs. Money

Objective: The purpose of this observational study was to demonstrate if, and to what extent, age is associated with money.

Design and Methods: We surveyed a number of individuals and for each determined approximate age (recorded as “old” or “young”) and the amount of money (in dollars) in their bank accounts. Comparison of the two groups was done using a Student two-sample t-test.

Results: We obtained a random sample of n = 9 subjects. The “young” group had an average of $27, while the “old” group had an average of $56. This estimated difference of $29 (95% C.I. = [$3.4, $54.6]) is statistically significant, t = 2.68, df = 7; p-value = 0.03.

Conclusions: We found that, as hypothesized, age is associated with money. On average, younger people have less in their accounts than older people.

Small Print: The analysis rests on the following assumptions:

- the observations are independently and identically distributed.
- the dependent variable, money, is normally distributed.
- the two populations being compared have the same variance.
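
The reported t statistic and confidence interval can be reproduced by hand from the nine balances. The following is a minimal sketch, assuming the grouping implied by the reported group means (71, 54, 43 for “old”; the remaining six values for “young”). The helper name `pooled_t_test` is illustrative, and the critical value 2.365 is a standard t-table entry for 7 degrees of freedom; neither comes from the slides.

```python
import math
from statistics import mean

# Bank-account balances ($) from the sample of n = 9.
# Grouping is an assumption inferred from the reported means ($56 and $27).
old = [71, 54, 43]
young = [45, 21, 11, 30, 45, 10]


def pooled_t_test(a, b):
    """Return (difference in means, t statistic, df, standard error)."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = mean(a), mean(b)
    # Pooled variance: both groups are assumed to share one variance sigma^2.
    ss_a = sum((v - mean_a) ** 2 for v in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = mean_a - mean_b
    return diff, diff / se, df, se


diff, t_stat, df, se = pooled_t_test(old, young)

# 0.975 quantile of the t distribution with 7 df (standard table value).
T_CRIT_7DF = 2.365
ci = (diff - T_CRIT_7DF * se, diff + T_CRIT_7DF * se)

print(f"difference = {diff:.0f}, t = {t_stat:.2f}, df = {df}")
print(f"95% C.I. = [{ci[0]:.1f}, {ci[1]:.1f}]")
```

The pooled variance is where the equal-variance assumption from the small print enters: both groups contribute to a single estimate of σ².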

[Figure: Boxplot of Money ($) by Age group (Young, Old); vertical axis from 0 to 100]

Age vs. Money: Linear Regression

Independent variable X: age group, old (1) or young (0)

Dependent variable Y: dollars ($) in bank account
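
With age coded as a 0/1 indicator, a least-squares line reproduces the two-sample comparison: the intercept is the mean of the group coded 0 and the slope is the difference between the group means. A minimal sketch, assuming the same nine balances as in the t-test recap and the old = 1, young = 0 coding; `fit_line` is an illustrative helper, not something defined in the course.

```python
from statistics import mean

# Group indicator: old = 1, young = 0.
x = [1, 1, 1, 0, 0, 0, 0, 0, 0]
# Balances ($): the three "old" subjects first, then the six "young" ones.
y = [71, 54, 43, 45, 21, 11, 30, 45, 10]


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(x, y)
print(f"b0 = {b0:.1f}  (mean of the young group)")
print(f"b1 = {b1:.1f}  (old mean minus young mean)")
```

Testing H0: β1 = 0 in this fit asks the same question as H0: μ0 = μ1 in the t-test, which is why the recap leads naturally into regression.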

Age vs. Money: Linear Regression

PREDICTOR (independent) variable X: Age in Years

RESPONSE (dependent) variable Y: dollars ($) in bank account

Age vs. Money

Population, with predictor X (Age in Years) and response Y (dollars ($) in bank account)

Population parameters: β0, β1, σ²

Hypothesis Test:

- H0: β1 = 0 (“Null” hypothesis)
- H1: β1 ≠ 0 (“Alternative” hypothesis)

Sample, n = 9: John, Paul, Mary, Lisa, Andy, Tim, Peter, Rose, Tony

X (age in years): 82, 22, 45, 71, 29, 12, 9, 18, 24

y (dollars): 71, 54, 43, 45, 21, 11, 30, 45, 10

Sample statistics: b0 = 17.7, b1 = 0.55, s = 15.5, R² = 0.49

For parameter β1:
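
The inference for β1 can be worked out from the sample statistics alone, because the standard error of the slope depends only on s and on the spread of the ages. A minimal sketch under two assumptions: the nine ages are 82, 22, 45, 71, 29, 12, 9, 18 and 24 (the digits run together in the transcript, so this split is a reading of it), and 2.365 is the usual t-table value for 7 degrees of freedom. Because b1 and s are rounded, the output is approximate.

```python
import math

# Ages (years) of the nine subjects; the split of the digits is an assumption.
ages = [82, 22, 45, 71, 29, 12, 9, 18, 24]

# Sample statistics as reported (rounded).
b1 = 0.55   # estimated slope ($ per additional year of age)
s = 15.5    # residual standard deviation ($)

# 0.975 quantile of the t distribution with n - 2 = 7 df (standard table value).
T_CRIT_7DF = 2.365

n = len(ages)
x_bar = sum(ages) / n
sxx = sum((a - x_bar) ** 2 for a in ages)

se_b1 = s / math.sqrt(sxx)   # standard error of the slope
t_stat = b1 / se_b1          # test statistic for H0: beta1 = 0
ci = (b1 - T_CRIT_7DF * se_b1, b1 + T_CRIT_7DF * se_b1)

print(f"SE(b1) = {se_b1:.3f}, t = {t_stat:.2f}")
print(f"95% C.I. for beta1 = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

An interval that excludes 0 corresponds to rejecting H0: β1 = 0 at the 5% level.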

Age vs. Money

Objective: The purpose of this observational study was to demonstrate if, and to what extent, age is associated with money.

Design and Methods: We collected a random sample of individuals and for each determined their age (recorded in years) and the amount of money (in dollars) in their accounts. Analysis of the data was done using linear regression.

Results: We obtained a random sample of n = 9 subjects. There is a statistically significant association between age and money (p-value = 0.036). For every additional year in age, an individual’s amount of money increases on average by an estimated $0.55 (95% C.I. = [$0.05, $1.05]).

Conclusions: We found that, as hypothesized, age is associated with money. In our sample, age accounted for about half of the variability observed in money (R² = 0.49). We predict that a 50-year-old will have $45.1 (95% P.I. = [$5.6, $84.5]), whereas a 40-year-old will have $39.6 (95% P.I. = [$0.8, $78.4]).

Small Print: The analysis rests on the following assumptions:

- the observations are independently and identically distributed.
- the response variable, money, is normally distributed.
- homoscedasticity of residuals, or equal variance.
- the relationship between response and predictor variables is linear.
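
The predictions and prediction intervals in the conclusions follow from the fitted line b0 + b1 × age plus an allowance for individual scatter. A minimal sketch, assuming the rounded statistics b0 = 17.7, b1 = 0.55, s = 15.5, the nine ages read off the sample (the split of the run-together digits is an assumption), and the t-table value 2.365 for 7 degrees of freedom. The helper `predict` is illustrative. With rounded inputs the results agree with the reported figures only to within a few tenths of a dollar.

```python
import math

ages = [82, 22, 45, 71, 29, 12, 9, 18, 24]  # ages (years); digit split assumed
b0, b1, s = 17.7, 0.55, 15.5                # reported sample statistics (rounded)
T_CRIT_7DF = 2.365                          # t-table value, 0.975 quantile, 7 df

n = len(ages)
x_bar = sum(ages) / n
sxx = sum((a - x_bar) ** 2 for a in ages)


def predict(age):
    """Point prediction and 95% prediction interval for one new individual."""
    y_hat = b0 + b1 * age
    # The leading 1 accounts for the new individual's own scatter about the line;
    # the other two terms account for uncertainty in the fitted line itself.
    half_width = T_CRIT_7DF * s * math.sqrt(1 + 1 / n + (age - x_bar) ** 2 / sxx)
    return y_hat, y_hat - half_width, y_hat + half_width


for age in (50, 40):
    y_hat, lo, hi = predict(age)
    print(f"age {age}: ${y_hat:.1f}, 95% P.I. = [${lo:.1f}, ${hi:.1f}]")
```

The intervals are wide relative to the predictions because s = 15.5 is large compared with the fitted values and the sample has only nine subjects.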

[Figure: Money ($) plotted against Age (years); both axes from 0 to 100]