Some Terms
• $Y = \beta_0 + \beta_1 X$
Regression of Y on X
Regress Y on X
X is called the independent variable, predictor variable, covariate, or factor
Which factors are related to Y?
Multiple linear regression model
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$
Y = outcome, dependent variable
$X_i$ = predictors, independent variables
$\varepsilon$ = error (or residual), normally distributed with mean 0 and constant variance $\sigma^2$
$\varepsilon$ reflects how an individual deviates from others with the same values of the X's
$\beta_i$ = parameters: the intercept ($\beta_0$) and a slope for each predictor ($\beta_1, \dots, \beta_p$)
Evaluating Assumptions
$Y = \beta_0 + \beta_1 X_1$
Y is normally distributed for each value of X
• Can draw a histogram of Y overall; usually can't do one for each value of X (too few observations at each value)
Mean of Y changes linearly with X
• Scatterplot of X and Y (see whether the points follow a line)
• Plots of residuals versus X (or versus predicted values)
Variance $\sigma^2$ is constant for each value of X
• Scatterplot of X and Y (see whether deviations from the line are similar across levels of X)
Remember: there is no assumption about the distribution of X
Plot of SBP Versus AGE
PLOT sbp*age;
Plot of Model Residuals Versus AGE
Look for patterns; a systematic pattern means the relationship is not linear. Note that the residuals sum to 0.
PLOT residual.*age;
Plot of Model Residuals Versus Predicted Values
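Look for patterns. For simple regression this plot shows the same pattern as the previous graph (residuals versus X), only rescaled (and mirrored if the slope is negative), because each predicted value is $\hat\beta_0 + \hat\beta_1 X$.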
PLOT residual.* predicted.;
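A minimal Python sketch of what these residual plots summarize. The SBP and AGE values are not listed here, so it borrows the chol and wt columns from the lipid DATALINES in the SAS example as stand-in data; `fit_simple` is a hypothetical helper, not a SAS or library routine. It fits the simple regression with the closed-form least-squares formulas, checks that the residuals sum to 0, and prints residuals next to X and the predicted values so the two plots can be compared.

```python
# Simple linear regression of chol on wt, fitted with the closed-form
# least-squares formulas. Data: (chol, wt) pairs from the lipid DATALINES.
chol = [175, 180, 185, 190, 195, 190, 195, 200, 205, 210, 180, 185, 190, 195, 200]
wt = [140, 135, 145, 140, 155, 140, 135, 150, 155, 150, 140, 150, 155, 145, 150]


def fit_simple(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_simple(wt, chol)
predicted = [b0 + b1 * xi for xi in wt]
residuals = [yi - pi for yi, pi in zip(chol, predicted)]

# With an intercept in the model, the residuals sum to 0 (up to rounding error).
print(f"sum of residuals = {sum(residuals):.2e}")

# Predicted = b0 + b1 * X, so sorting by X also sorts by predicted value (b1 > 0 here):
# the residual-vs-predicted plot is the residual-vs-X plot with a rescaled x-axis.
for xi, pi, ri in sorted(zip(wt, predicted, residuals)):
    print(f"wt={xi:3d}  predicted={pi:7.2f}  residual={ri:6.2f}")
```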
Evaluating Assumptions: Multiple Regression
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Y is normally distributed for each combination of X's
• Can draw a histogram of Y overall; usually can't do one for each combination of X's
Mean of Y changes linearly with each X, and does so for every value of every other X
Variance $\sigma^2$ is constant for each combination of X's
• Scatterplot of Y with each X (doesn't really test the assumption)
• Scatterplot of residuals versus predicted values
• Test for interactions
Interpreting Coefficients: Simple Regression
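$Y = \beta_0 + \beta_1 X_1$
$\beta_0$ = mean of Y when $X_1 = 0$
$\beta_1$ = change in mean of Y per 1-unit increase in $X_1$
Suppose $X_1 = 5$: mean of Y $= \beta_0 + 5\beta_1$
Suppose $X_1 = 6$: mean of Y $= \beta_0 + 6\beta_1$
$\text{mean } Y_{X_1=6} - \text{mean } Y_{X_1=5} = (\beta_0 + 6\beta_1) - (\beta_0 + 5\beta_1) = \beta_1$
The difference is the same for any pair of values x and x + 1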
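The same arithmetic, checked numerically in a short Python sketch; the values of `beta0` and `beta1` are hypothetical and chosen only for illustration.

```python
# Hypothetical parameter values, used only for illustration.
beta0, beta1 = 120.0, 0.5


def mean_y(x1):
    """Mean of Y at X1 = x1 under the model Y = beta0 + beta1 * X1."""
    return beta0 + beta1 * x1


# Change in mean Y for a 1-unit increase in X1, starting from several x values.
diffs = [mean_y(x + 1) - mean_y(x) for x in (0, 5, 17, 40)]
print(diffs)  # every entry equals beta1
```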
Interpreting Coefficients: Multiple Regression
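$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
$\beta_0$ = mean of Y when $X_1 = 0$ and $X_2 = 0$
$\beta_1$ = change in mean of Y per 1-unit increase in $X_1$, for fixed $X_2$
Suppose $X_1 = 5$: mean of Y $= \beta_0 + 5\beta_1 + \beta_2 X_2$
Suppose $X_1 = 6$: mean of Y $= \beta_0 + 6\beta_1 + \beta_2 X_2$
$\text{mean } Y_{X_1=6} - \text{mean } Y_{X_1=5} = (\beta_0 + 6\beta_1 + \beta_2 X_2) - (\beta_0 + 5\beta_1 + \beta_2 X_2) = \beta_1$
The difference is $\beta_1$ for every value of $X_2$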
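A Python sketch of the same check for the two-predictor model, using exact fractions; the coefficient values are hypothetical. Holding $X_2$ at several different values, the difference between $X_1 = 6$ and $X_1 = 5$ is always $\beta_1$.

```python
from fractions import Fraction

# Hypothetical coefficients, used only for illustration (exact fractions avoid rounding).
beta0, beta1, beta2 = Fraction(100), Fraction(3, 2), Fraction(-4)


def mean_y(x1, x2):
    """Mean of Y under the model Y = beta0 + beta1 * X1 + beta2 * X2."""
    return beta0 + beta1 * x1 + beta2 * x2


# Hold X2 fixed at several values and compare X1 = 6 with X1 = 5.
diffs = {x2: mean_y(6, x2) - mean_y(5, x2) for x2 in (0, 1, 10, 25)}
for x2, d in diffs.items():
    print(f"X2 = {x2:2d}: difference = {d}")  # always 3/2, i.e. beta1
```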
Interpreting Relationships: Multiple Regression
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
$\beta_1$ measures the effect of $X_1$ “adjusting for $X_2$”, or “above and beyond” $X_2$
$\beta_2$ measures the effect of $X_2$ “adjusting for $X_1$”, or “above and beyond” $X_1$
If $X_1$ is significantly related to Y in simple regression but not after $X_2$ is included in the model, then:
1) The relationship of Y to $X_1$ was confounded by $X_2$
2) $X_1$ is not an independent predictor of Y
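A small constructed example of this situation, as a Python sketch. The toy data are hypothetical and built so that Y depends only on $X_2$ while $X_1$ tends to rise with $X_2$; `ols` is a hypothetical helper that solves the least-squares normal equations. It compares coefficients only, not significance tests: the slope for $X_1$ is clearly nonzero on its own but drops to 0 once $X_2$ is in the model.

```python
# Hypothetical toy data: Y depends only on X2, and X1 tends to rise with X2.
x1 = [0, 1, 2, 1, 2, 3]
x2 = [0, 0, 0, 1, 1, 1]
y = [0, 0, 0, 5, 5, 5]


def ols(columns, y):
    """Least-squares coefficients [b0, b1, ...] for y on an intercept plus `columns`."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


simple = ols([x1], y)          # Y on X1 alone
adjusted = ols([x1, x2], y)    # Y on X1 and X2
print(f"X1 slope alone:            {simple[1]:.4f}")
print(f"X1 slope adjusting for X2: {adjusted[1]:.4f}")
print(f"X2 slope adjusting for X1: {adjusted[2]:.4f}")
```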
Multiple Regression: $R^2$
• The coefficient of determination ($R^2$) is the proportion of the variance in Y explained by all the variables in the model
• Adding variables to the model can never decrease $R^2$
• Adding a variable that is highly correlated with variables already in the model will likely add little to $R^2$
• Always interpret $R^2$ in the context of the problem
– Laboratory conditions yield high $R^2$
– Real-world data yield lower $R^2$, but the X variables may still be important
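A Python sketch of the first point, using the diet, chol, and wt values from the lipid DATALINES in the SAS example; `ols` and `r_squared` are hypothetical helpers. Adding wt to the model with the two diet indicators raises $R^2$; it cannot lower it, because the larger model could always set the wt coefficient to 0 and reproduce the smaller model's fit.

```python
# Lipid data from the DATALINES in the SAS example.
diet = [1] * 5 + [2] * 5 + [3] * 5
chol = [175, 180, 185, 190, 195, 190, 195, 200, 205, 210, 180, 185, 190, 195, 200]
wt = [140, 135, 145, 140, 155, 140, 135, 150, 155, 150, 140, 150, 155, 145, 150]
x1 = [1 if d == 1 else 0 for d in diet]  # Diet 1 indicator (Diet 3 = reference)
x2 = [1 if d == 2 else 0 for d in diet]  # Diet 2 indicator


def ols(columns, y):
    """Least-squares coefficients [b0, b1, ...] for y on an intercept plus `columns`."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def r_squared(columns, y):
    """R^2 = 1 - SSE/SST for the least-squares fit of y on an intercept plus `columns`."""
    b = ols(columns, y)
    fitted = [b[0] + sum(bj * col[i] for bj, col in zip(b[1:], columns)) for i in range(len(y))]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


r2_diet = r_squared([x1, x2], chol)
r2_diet_wt = r_squared([x1, x2, wt], chol)
print(f"R^2, chol = x1 x2:    {r2_diet:.4f}")
print(f"R^2, chol = x1 x2 wt: {r2_diet_wt:.4f}")
```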
Categorical Predictors; 0/1 coding
Compare two groups, A and B. Let X = 0 for A and X = 1 for B
$Y = \beta_0 + \beta_1 X$
For Group A (X = 0), the mean outcome is $\beta_0 + \beta_1(0) = \beta_0$
For Group B (X = 1), the mean outcome is $\beta_0 + \beta_1(1) = \beta_0 + \beta_1$
$\text{mean } Y_{\text{group B}} - \text{mean } Y_{\text{group A}} = (\beta_0 + \beta_1) - \beta_0 = \beta_1$
$\beta_0$ is the mean response for Group A
$\beta_1$ is the difference in mean response between Group B and Group A
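A Python sketch of 0/1 coding, using the Diet 1 (Group A) and Diet 2 (Group B) cholesterol values from the lipid DATALINES in the SAS example; `fit_simple` is a hypothetical helper implementing the closed-form least-squares fit. The intercept comes out as the Group A mean and the slope as the B minus A difference.

```python
# Group A = Diet 1, Group B = Diet 2 cholesterol values from the lipid DATALINES.
group_a = [175, 180, 185, 190, 195]
group_b = [190, 195, 200, 205, 210]
x = [0] * len(group_a) + [1] * len(group_b)  # 0/1 coding: 0 = A, 1 = B
y = group_a + group_b


def fit_simple(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_simple(x, y)
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
print(f"b0 = {b0:.2f}  (mean of Group A = {mean_a:.2f})")
print(f"b1 = {b1:.2f}  (mean B - mean A = {mean_b - mean_a:.2f})")
```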
What if I use 1 and 2?
Compare two groups, A and B. Let X = 1 for A and X = 2 for B
$Y = \beta_0 + \beta_1 X$
For Group A (X = 1), the mean outcome is $\beta_0 + \beta_1(1) = \beta_0 + \beta_1$
For Group B (X = 2), the mean outcome is $\beta_0 + \beta_1(2) = \beta_0 + 2\beta_1$
$\text{mean } Y_{\text{group B}} - \text{mean } Y_{\text{group A}} = (\beta_0 + 2\beta_1) - (\beta_0 + \beta_1) = \beta_1$
$\beta_0 + \beta_1$ is the mean response for Group A
$\beta_1$ is the difference in mean response between Group B and Group A
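A Python sketch of 1/2 coding on the same Diet 1 and Diet 2 cholesterol values from the lipid DATALINES, with the same hypothetical `fit_simple` helper. The slope is unchanged, but the intercept is no longer a group mean; Group A's mean is $\beta_0 + \beta_1$.

```python
# Group A = Diet 1, Group B = Diet 2 cholesterol values from the lipid DATALINES.
group_a = [175, 180, 185, 190, 195]
group_b = [190, 195, 200, 205, 210]
x = [1] * len(group_a) + [2] * len(group_b)  # 1/2 coding: 1 = A, 2 = B
y = group_a + group_b


def fit_simple(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_simple(x, y)
print(f"b0 = {b0:.2f}       (not a group mean)")
print(f"b0 + b1 = {b0 + b1:.2f}  (mean of Group A)")
print(f"b1 = {b1:.2f}       (mean B - mean A, same as with 0/1 coding)")
```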
Categorical Predictors
• More than two groups require more dummy (indicator) variables
• Choose one group as the reference group
• Form an indicator variable for each of the other groups
• K groups require K − 1 indicator variables
Example - three groups
Diets 1, 2, and 3; choose Diet 3 as the reference group (any of the three could be chosen)
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Diet 1: $X_1 = 1$, $X_2 = 0$
Diet 2: $X_1 = 0$, $X_2 = 1$
Diet 3: $X_1 = 0$, $X_2 = 0$
$\beta_0$ is the mean response for Diet 3
$\beta_1$ is the difference in mean response between Diet 1 and Diet 3
$\beta_2$ is the difference in mean response between Diet 2 and Diet 3
ID Diet X1 X2
1 3 0 0
2 1 1 0
3 2 0 1
4 2 0 1
5 3 0 0
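A Python sketch of the K − 1 indicator coding; `indicator_columns` is a hypothetical helper. Applied to the Diet column of the table above with Diet 3 as the reference, it reproduces the X1 and X2 columns.

```python
def indicator_columns(groups, reference):
    """Return {level: 0/1 indicator list} for every level except `reference`.

    K distinct levels give K - 1 indicator columns.
    """
    levels = sorted(set(groups))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} does not occur in the data")
    return {
        level: [1 if g == level else 0 for g in groups]
        for level in levels
        if level != reference
    }


diet = [3, 1, 2, 2, 3]  # Diet column for IDs 1-5 in the table above
columns = indicator_columns(diet, reference=3)
print(columns)  # {1: [0, 1, 0, 0, 0], 2: [0, 0, 1, 1, 0]}
```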
DUMMY CODING IN SAS
* Assume variable diet with values 1-3;
x1 = 0;
x2 = 0;
if diet = 1 then x1 = 1;
if diet = 2 then x2 = 1;
PROC REG DATA = lipid;
  MODEL chol = x1 x2;
RUN;
DATA lipid;
  INFILE DATALINES;
  INPUT diet chol wt;
  * Assume variable diet with values 1-3;
  x1 = 0;
  x2 = 0;
  if diet = 1 then x1 = 1;
  if diet = 2 then x2 = 1;
  DATALINES;
1 175 140
1 180 135
1 185 145
1 190 140
1 195 155
2 190 140
2 195 135
2 200 150
2 205 155
2 210 150
3 180 140
3 185 150
3 190 155
3 195 145
3 200 150
;
PROC MEANS N MEAN STD;
  CLASS diet;
  VAR chol;
RUN;
PROC REG;
  MODEL chol = x1 x2;
RUN;
PROC MEANS OUTPUT
Analysis Variable : chol
diet Obs N Mean Std Dev
1 5 5 185.0000000 7.9056942
2 5 5 200.0000000 7.9056942
3 5 5 190.0000000 7.9056942
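A Python sketch reproducing these summaries with the standard `statistics` module, then reading off the dummy-coded regression coefficients as differences of group means (Diet 3 as reference). This works because a model containing only the group indicators fits each group's mean exactly; compare with the PROC REG output that follows.

```python
import statistics

# Cholesterol values by diet, from the lipid DATALINES.
chol_by_diet = {
    1: [175, 180, 185, 190, 195],
    2: [190, 195, 200, 205, 210],
    3: [180, 185, 190, 195, 200],
}

means = {}
for diet, values in chol_by_diet.items():
    means[diet] = statistics.mean(values)
    # statistics.stdev uses the n - 1 denominator, like the Std Dev column above.
    print(f"diet {diet}: N={len(values)}  mean={means[diet]:.4f}  sd={statistics.stdev(values):.7f}")

# A model with only the Diet 1 and Diet 2 indicators fits each diet's mean exactly,
# so with Diet 3 as reference the coefficients are differences of group means.
b0 = means[3]
b1 = means[1] - means[3]
b2 = means[2] - means[3]
print(f"Intercept = {b0}, x1 = {b1}, x2 = {b2}")
```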
PROC REG OUTPUT
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1           190.00000         3.53553    53.74    <.0001
x1          1            -5.00000         5.00000    -1.00    0.3370
x2          1            10.00000         5.00000     2.00    0.0687
PROC REG;
  MODEL chol = x1 x2;
  MODEL chol = x1 x2 wt;
RUN;
PROC REG OUTPUT
Model 1: chol = x1 x2
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1           190.00000         3.53553    53.74    <.0001
x1          1            -5.00000         5.00000    -1.00    0.3370
x2          1            10.00000         5.00000     2.00    0.0687
Model 2: chol = x1 x2 wt
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1            84.28571        36.91070     2.28    0.0433
x1          1            -1.42857         4.13890    -0.35    0.7365
x2          1            11.42857         3.97892     2.87    0.0152
wt          1             0.71429         0.24868     2.87    0.0152
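A Python sketch that reproduces both sets of parameter estimates by solving the least-squares normal equations directly from the DATALINES values; `ols` is a hypothetical helper, not a SAS routine, and it computes only the estimates, not standard errors or p-values.

```python
# Lipid data from the DATALINES above; x1 and x2 are the Diet 1 and Diet 2 indicators.
diet = [1] * 5 + [2] * 5 + [3] * 5
chol = [175, 180, 185, 190, 195, 190, 195, 200, 205, 210, 180, 185, 190, 195, 200]
wt = [140, 135, 145, 140, 155, 140, 135, 150, 155, 150, 140, 150, 155, 145, 150]
x1 = [1 if d == 1 else 0 for d in diet]
x2 = [1 if d == 2 else 0 for d in diet]


def ols(columns, y):
    """Least-squares coefficients [b0, b1, ...] for y on an intercept plus `columns`."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


model1 = ols([x1, x2], chol)       # MODEL chol = x1 x2;
model2 = ols([x1, x2, wt], chol)   # MODEL chol = x1 x2 wt;
for name, coefs in (("chol = x1 x2", model1), ("chol = x1 x2 wt", model2)):
    print(name, [round(b, 5) for b in coefs])
```

In the second model, the x1 and x2 estimates compare Diets 1 and 2 with Diet 3 at a fixed weight, which is the “adjusting for” interpretation of multiple-regression coefficients.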