Simple Linear Regression

Uploaded by: aprilharrison
ETM 620  09U1
Often we want to understand the relationships among variables, e.g.,
  - SAT scores and college GPA
  - car weight and gas mileage
  - amount of a certain pollutant in wastewater and bacteria growth in local streams
  - number of takeoffs and landings and degree of metal fatigue in aircraft structures

Simplest relationship:

$$Y = \beta_0 + \beta_1 x$$

[Figure: response variable, y, vs. predictor variable, x]
Example

An electric power cooperative is concerned about the cost of power outages in the winter, and the analyst has an idea that these costs are directly related to the average temperature during the outage period. A random sampling of power outages over a number of years was conducted and the cost per 100 homes (adjusted for inflation) was determined, with these results:

Temp, °F    Cost/Outage
45          $3,639
42          $4,111
44          $3,928
37          $4,252
33          $5,020
45          $3,838
35          $4,293
38          $4,244
39          $4,227
40          $4,111
30          $5,335

[Figure: scatter plot “Avg. Cost/Outage”; Cost ($3,000 to $5,500) vs. Temperature (25 to 50)]
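Estimating the regression coefficients

Method of Least Squares

Determine estimates for β0 and β1 so that the sum of the squares of the residuals is minimized, that is,

$$\min L = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

Solution to the minimization gives:

$$\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$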
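The step from the minimization to its solution is not spelled out on the slide. As a sketch using standard calculus, set both partial derivatives of L to zero at the estimates, which gives the two "normal equations" (all sums run over i = 1, ..., n):

```latex
\begin{align*}
\frac{\partial L}{\partial \beta_0}\bigg|_{\hat{\beta}_0,\hat{\beta}_1}
  &= -2 \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0 \\
\frac{\partial L}{\partial \beta_1}\bigg|_{\hat{\beta}_0,\hat{\beta}_1}
  &= -2 \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) x_i = 0 \\[4pt]
\Longrightarrow \quad
n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} y_i \\
\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 &= \sum_{i=1}^{n} x_i y_i
\end{align*}
```

Dividing the first normal equation by n gives the intercept as the mean of y minus the slope times the mean of x. Substituting that into the second and solving for the slope gives the ratio with n Σ x_i y_i − (Σ x_i)(Σ y_i) in the numerator and n Σ x_i² − (Σ x_i)² in the denominator.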
For our example,

Sample   Temp, x   Cost, y   x_i y_i    x_i^2
1        45        $3,639    163,755    2025
2        42        $4,111    172,662    1764
3        44        $3,928    172,832    1936
4        37        $4,252    157,324    1369
5        33        $5,020    165,660    1089
6        45        $3,838    172,710    2025
7        35        $4,293    150,255    1225
8        38        $4,244    161,272    1444
9        39        $4,227    164,853    1521
10       40        $4,111    164,440    1600
11       30        $5,335    160,050    900
sum =    428       46998     1805813    16898

$\hat{\beta}_1$ = ______________________________
$\hat{\beta}_0$ = ______________________________
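One way to check the hand calculation is a short Python sketch that rebuilds the column sums from the eleven samples and plugs them into the least-squares formulas. The variable names are illustrative only, and the data are the values in the table.

```python
# Outage data from the example table (temperature in deg F, cost in dollars).
temps = [45, 42, 44, 37, 33, 45, 35, 38, 39, 40, 30]
costs = [3639, 4111, 3928, 4252, 5020, 3838, 4293, 4244, 4227, 4111, 5335]

n = len(temps)
sum_x = sum(temps)
sum_y = sum(costs)
sum_xy = sum(x * y for x, y in zip(temps, costs))
sum_x2 = sum(x * x for x in temps)

# Least-squares slope and intercept from the summation formulas.
beta1_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
beta0_hat = sum_y / n - beta1_hat * sum_x / n

print(f"sums: x={sum_x}, y={sum_y}, xy={sum_xy}, x^2={sum_x2}")
print(f"beta1_hat = {beta1_hat:.2f}")
print(f"beta0_hat = {beta0_hat:.2f}")
```

The sums should match the "sum =" row of the table. The slope comes out negative (about −93.24 dollars per degree) with an intercept of about 7900.61.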
What does this mean?

We can draw the regression line that describes the relationship between temperature and outage cost:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

[Figure: “Cost vs Temperature”; Cost ($3,000 to $5,500) vs. Temperature (25 to 50)]

We can also predict the cost of outages based on expected temperatures.
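To illustrate the prediction step, here is a minimal sketch that fits the line to the example data and evaluates it at a few forecast temperatures. The helper names `fit_line` and `predict_cost` and the forecast temperatures are hypothetical, chosen only for illustration. The forecasts are deliberately kept inside the observed 30 to 45 °F range.

```python
# Outage data from the example (temperature in deg F, cost per outage in dollars).
temps = [45, 42, 44, 37, 33, 45, 35, 38, 39, 40, 30]
costs = [3639, 4111, 3928, 4252, 5020, 3838, 4293, 4244, 4227, 4111, 5335]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def predict_cost(temp_f, intercept, slope):
    """Evaluate the fitted line y_hat = intercept + slope * temp_f."""
    return intercept + slope * temp_f


b0, b1 = fit_line(temps, costs)

# Hypothetical forecast temperatures, all inside the observed 30-45 deg F range.
for forecast in (32, 36, 41):
    print(f"{forecast} F -> predicted cost ${predict_cost(forecast, b0, b1):,.0f}")
```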
Dangers of regression analysis

You can regress any variable on any other variable, e.g.,
  - hair loss and heart disease
  - hours playing video games and number of arrests for violent behavior
  - consecutive hours in class and retention of material
  - etc.

Which of these relationships can you legitimately claim reflect a causal relationship between the “predictor” and the “response”?

The regression equation is a “best fit” for the data on which it is based, but may lose validity for predictor values outside the range of the data. For example, our outage cost data imply that the cost per outage decreases as the temperature increases. Do you believe that temperatures in the 80s or 90s will result in low-cost outages?
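The extrapolation danger can be made concrete with a small sketch. It assumes the rounded least-squares estimates for the outage example (intercept 7900.61, slope −93.24) and flags any temperature outside the sampled 30 to 45 °F range. The function and constant names are illustrative.

```python
# Rounded least-squares estimates for the outage example
# (the slope is negative: fitted cost falls as temperature rises).
INTERCEPT = 7900.61
SLOPE = -93.24

# Range of temperatures actually observed in the sample.
T_MIN, T_MAX = 30, 45


def predict_cost(temp_f):
    """Return (fitted cost, extrapolated?) for a temperature in deg F."""
    cost = INTERCEPT + SLOPE * temp_f
    extrapolated = not (T_MIN <= temp_f <= T_MAX)
    return cost, extrapolated


for temp_f in (40, 85):
    cost, extrapolated = predict_cost(temp_f)
    note = " (outside the data range, do not trust)" if extrapolated else ""
    print(f"{temp_f} F: ${cost:,.2f}{note}")
```

At 85 °F the fitted line gives a negative cost, which is physically meaningless. The line describes the sampled winter temperatures only.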
How good is our prediction?

Estimating the variance:

$$\hat{\sigma}^2 = s^2 = \frac{SS_E}{n-2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}$$

= __________________

Lack of fit test
Tests the hypotheses
  H0: the model adequately fits the data
  H1: the model does not fit the data
As with our goodness-of-fit tests, a high p-value indicates that the model is adequate.

(see next page)
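The variance estimate can be sketched in a few lines of Python: fit the line, sum the squared residuals to get SS_E, and divide by n − 2 because two parameters were estimated. Variable names are illustrative, and the data are the eleven outage samples from the example.

```python
import math

# Outage data from the example (temperature in deg F, cost per outage in dollars).
temps = [45, 42, 44, 37, 33, 45, 35, 38, 39, 40, 30]
costs = [3639, 4111, 3928, 4252, 5020, 3838, 4293, 4244, 4227, 4111, 5335]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(costs) / n

# Least-squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, costs))
         / sum((x - x_bar) ** 2 for x in temps))
intercept = y_bar - slope * x_bar

# SS_E: sum of squared residuals about the fitted line.
ss_e = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(temps, costs))

s2 = ss_e / (n - 2)   # variance estimate; n - 2 degrees of freedom
s = math.sqrt(s2)     # standard error of the regression

print(f"SS_E = {ss_e:.2f}")
print(f"s^2  = {s2:.2f}")
print(f"s    = {s:.2f}")
```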
How good is our prediction?
Coefficient of determination, R²

$$R^2 = 1 - \frac{SS_E}{S_{yy}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

  - a measure of the “quality of fit,” or the proportion of the variability explained by the fitted model.
  - Use with care: increasing the number of variables will usually increase R², but this doesn’t necessarily make it a “better” model!
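As a rough check of the definition, the sketch below computes R² for the outage data directly from SS_E and S_yy. It also computes the usual adjusted R², which is not defined on the slide and is included here as an assumption. Adjusted R² penalizes extra parameters, which is one common response to the "use with care" warning.

```python
# Outage data from the example (temperature in deg F, cost per outage in dollars).
temps = [45, 42, 44, 37, 33, 45, 35, 38, 39, 40, 30]
costs = [3639, 4111, 3928, 4252, 5020, 3838, 4293, 4244, 4227, 4111, 5335]

n = len(temps)
p = 2  # parameters estimated: intercept and slope
x_bar = sum(temps) / n
y_bar = sum(costs) / n

slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, costs))
         / sum((x - x_bar) ** 2 for x in temps))
intercept = y_bar - slope * x_bar

# Residual and total sums of squares.
ss_e = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(temps, costs))
s_yy = sum((y - y_bar) ** 2 for y in costs)

r2 = 1 - ss_e / s_yy
# Adjusted R^2 (standard definition, assumed): compares mean squares, not raw sums.
r2_adj = 1 - (ss_e / (n - p)) / (s_yy / (n - 1))

print(f"R^2          = {r2:.4f}")
print(f"adjusted R^2 = {r2_adj:.4f}")
```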
Linear regression in Excel …

Step 1: Graph the data
Does it look like a straight line is the best fit?

[Figure: scatter plot “Avg. Cost/Outage”; Cost ($0 to $6,000) vs. Temperature (25 to 50)]
Step 2: Perform the analysis
  - Choose “Regression” from the Data Analysis menu (under Tools).
  - Input the Y-range (Cost, including the label) and X-range (Temp, including the label), then select “Labels” if you included those in your data range.
  - Specify your desired location for the output.
  - Select Residuals and Normal Probability Plot, as desired.
  - Choose “OK”.
Step 3: Check assumptions
Look at residuals plot and normal probability plots.

[Figure: “Temp Residual Plot”; Residuals vs. Temp]
[Figure: “Normal Probability Plot”; Avg. Cost/Outage vs. Sample Percentile]
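The numbers behind a residual plot can be sketched without a spreadsheet. The example below computes each residual for the outage data and scales it by the standard error s. Flagging standardized residuals larger than about 2 in magnitude is a common rule of thumb, assumed here rather than taken from the slides.

```python
import math

# Outage data from the example (temperature in deg F, cost per outage in dollars).
temps = [45, 42, 44, 37, 33, 45, 35, 38, 39, 40, 30]
costs = [3639, 4111, 3928, 4252, 5020, 3838, 4293, 4244, 4227, 4111, 5335]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(costs) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, costs))
         / sum((x - x_bar) ** 2 for x in temps))
intercept = y_bar - slope * x_bar

# Residual e_i = observed cost minus fitted cost.
residuals = [y - (intercept + slope * x) for x, y in zip(temps, costs)]
s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
standardized = [e / s for e in residuals]

# List residuals in temperature order, as they would appear on the residual plot.
for x, e, d in sorted(zip(temps, residuals, standardized)):
    flag = "  <-- check" if abs(d) > 2 else ""
    print(f"temp={x:2d}  residual={e:8.2f}  standardized={d:6.2f}{flag}")
```

With an intercept in the model the residuals sum to zero, so the plot is informative through its pattern (curvature, funnel shape, outliers), not its average level.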
Step 4. Evaluate the results.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9318278
R Square             0.868303
Adjusted R Square    0.85367
Standard Error       189.43373
Observations         11

ANOVA
             df    SS            MS            F        Significance F
Regression   1     2129376.47    2129376.47    59.34    0.00
Residual     9     322966.25     35885.14
Total        10    2452342.73

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   7900.61        474.43           16.65    0.00      6827.36     8973.85     6827.36       8973.85
Temp        -93.24         12.10            -7.70    0.00      -120.63     -65.86      -120.63       -65.86
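To see where the main entries of the summary output come from, the sketch below recomputes the ANOVA F statistic, the standard errors, the t statistics, and the 95% interval for the slope from the raw data. The critical value 2.262 is the tabulated t value for 9 degrees of freedom and a 0.025 upper tail, hard-coded here as an assumption so that the example needs no statistics library.

```python
import math

# Outage data from the example (temperature in deg F, cost per outage in dollars).
temps = [45, 42, 44, 37, 33, 45, 35, 38, 39, 40, 30]
costs = [3639, 4111, 3928, 4252, 5020, 3838, 4293, 4244, 4227, 4111, 5335]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(costs) / n
s_xx = sum((x - x_bar) ** 2 for x in temps)
s_yy = sum((y - y_bar) ** 2 for y in costs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, costs))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# ANOVA decomposition: total = regression + residual.
ss_r = slope * s_xy
ss_e = s_yy - ss_r
ms_e = ss_e / (n - 2)
f_stat = ss_r / ms_e  # regression has 1 degree of freedom

# Standard errors and t statistics of the coefficients.
se_slope = math.sqrt(ms_e / s_xx)
se_intercept = math.sqrt(ms_e * (1 / n + x_bar ** 2 / s_xx))
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

T_CRIT = 2.262  # t table value, 9 df, 0.025 upper tail (assumed from tables)
ci_slope = (slope - T_CRIT * se_slope, slope + T_CRIT * se_slope)

print(f"F = {f_stat:.2f}")
print(f"Intercept: {intercept:.2f}  SE {se_intercept:.2f}  t {t_intercept:.2f}")
print(f"Temp:      {slope:.2f}  SE {se_slope:.2f}  t {t_slope:.2f}")
print(f"95% CI for slope: ({ci_slope[0]:.2f}, {ci_slope[1]:.2f})")
```

The whole interval for the slope lies below zero, which is consistent with the very small p-value reported for Temp.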
Step 5. Specify and use the model.

Simple linear model:

$\hat{y}$ = _________________

Use the model to:
  - Make predictions
      - expected costs
      - budgeting
  - Recommend actions
      - identify and address sources of cost increase
In Minitab …

Step 1: Graph the data (for one or two predictor variables)! Again, do you think a simple linear relationship is the best fit?
Step 2: Select Stat > Regression > Regression …
Step 3: Choose “Response” (y) and “Predictor” (x).
Step 4: In “Options”, check the “Lack of Fit” box. (“Fit Intercept” box should be checked by default.) Click “OK”.
Step 5: In “Graphs”, select the appropriate residual plots to create.
Step 6: Click “OK”.
Step 7: Evaluate the residual plots and results.
Transformation to a straight line …

If simple linear regression is not appropriate because the underlying function is nonlinear, then we have two choices:
  - fit a more complex model
  - transform the model to a straight-line model

Simplest transformation: logarithmic transformation

Original model:

$$y = \beta_0 e^{\beta_1 x} \varepsilon$$

Transformed model:

$$\ln y = \ln \beta_0 + \beta_1 x + \ln \varepsilon$$
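A minimal sketch of the logarithmic transformation follows: take ln of the response, fit a straight line to (x, ln y), then exponentiate the intercept to recover the multiplicative constant. The data are synthetic and noise-free, generated from hypothetical parameter values (2.0 and 0.5) purely to show that the procedure recovers them. Real data would include scatter.

```python
import math

# Hypothetical parameters of an exponential model y = beta0 * exp(beta1 * x).
BETA0_TRUE = 2.0
BETA1_TRUE = 0.5

# Synthetic, noise-free data generated from that model.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [BETA0_TRUE * math.exp(BETA1_TRUE * x) for x in xs]

# Transform the response: ln y = ln(beta0) + beta1 * x is a straight line in x.
ln_ys = [math.log(y) for y in ys]

# Ordinary least squares on the transformed data.
n = len(xs)
x_bar = sum(xs) / n
ln_y_bar = sum(ln_ys) / n
slope = (sum((x - x_bar) * (v - ln_y_bar) for x, v in zip(xs, ln_ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = ln_y_bar - slope * x_bar

# Back-transform: the slope estimates beta1, exp(intercept) estimates beta0.
beta1_hat = slope
beta0_hat = math.exp(intercept)

print(f"beta0_hat = {beta0_hat:.4f}, beta1_hat = {beta1_hat:.4f}")
```

Least squares on ln y minimizes squared error on the log scale, so with noisy data the back-transformed fit is generally not the same as a direct nonlinear least-squares fit on the original scale.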