Applied Statistics and DOE
Mayank
Applied Statistics
Measures of central tendency (central position of data)
Mean
Median
Mode
Measures of dispersion (spread of data)
Variance
Standard deviation
Coefficient of variation
Notation
Mean: Population: µ, Sample: x̄
Variance: Population: σ², Sample: s²
Standard deviation: Population: σ, Sample: s
Measures of central tendency
Data: 34, 43, 81, 106, 106 and 115
Mean: average, Σx/n = 485/6 = 80.83
Mode: highest frequency = 106
Median: middle score, (81 + 106)/2 = 93.5
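The three measures above can be checked with a short Python sketch. It uses only the standard library `statistics` module; the variable names are illustrative and not part of the slides.

```python
import statistics

# Data from the slide
data = [34, 43, 81, 106, 106, 115]

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # average of the two middle values (even n)
mode = statistics.mode(data)      # most frequent value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

With an even number of observations the median is the average of the two middle values, which is why it need not be one of the data points.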
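Measures of dispersion
Variance and standard deviation

x      x − x̄    (x − x̄)²
44     −0.5      0.25
50     5.5       30.25
38     −6.5      42.25
49     4.5       20.25
42     −2.5      6.25
47     2.5       6.25
40     −4.5      20.25
39     −5.5      30.25
46     1.5       2.25
50     5.5       30.25

n = 10, x̄ = 44.5
Sum of squares: $SS = \sum_{i=1}^{n} (x_i - \bar{x})^2 = 188.5$
Variance: $MS = SS/(n-1) = 188.5/9 = 20.9$
Standard deviation: $sd = \sqrt{MS} = 4.58$
Most of the data lies between 44.5 ± 4.58, i.e. about 40 to 49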
Coefficient of variation
$CV = \frac{s}{\bar{x}} \times 100\%$
CV = 4.577/44.5 × 100% = 10.28%
Standard deviation is 10.28% of the mean
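The chain SS → MS → sd → CV can be reproduced step by step. The following is a minimal sketch using the ten values from the slide; the variable names are my own.

```python
import math

# Data from the slide
data = [44, 50, 38, 49, 42, 47, 40, 39, 46, 50]

n = len(data)
mean = sum(data) / n                      # x-bar
ss = sum((x - mean) ** 2 for x in data)   # SS: sum of squared deviations
ms = ss / (n - 1)                         # MS: sample variance, divides by n - 1
sd = math.sqrt(ms)                        # sample standard deviation
cv = sd / mean * 100                      # coefficient of variation, in percent

print(f"mean={mean}, SS={ss}, MS={ms:.2f}, sd={sd:.2f}, CV={cv:.2f}%")
```

Dividing by n − 1 rather than n gives the sample variance; for a full population the divisor would be n.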
Normal Distribution
Example: IQ Score

Grade             Score
Genius            145 and above
Gifted            130-144
Above average     115-129
Higher average    100-114
Lower average     85-99
Below average     70-84
Borderline low    55-69
Low               <55

[Figure: histogram of IQ scores. X-axis: Score (<55, 55, 70, 85, 100, 115, 130, 145, >145). Y-axis: Count.]
IQ Score: Normal Distribution
[Figure: normal curve. X-axis: Score, in sd from µ (−6σ to 6σ). Y-axis: Probability.]
Probability in each band, on either side of µ:
µ to 1σ      34.13%
1σ to 2σ     13.59%
2σ to 3σ     2.14%
3σ to 4σ     0.13%
4σ to 5σ     0.0031%
5σ to 6σ     0.000028%
Normal Distribution: share of data within a given number of sd from µ
Range      Share of data
µ ± 1σ     68.2689%
µ ± 2σ     95.4499%
µ ± 3σ     99.7300%
µ ± 4σ     99.9936%
µ ± 5σ     99.999942669%
µ ± 6σ     99.999999802%
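Six Sigma
[Figure: normal curve with the lower and upper specification limits (LSL, USL) at µ ± 6σ.]
Share of output between LSL and USL: 99.999999802%
DPHO (defects per hundred opportunities): 0.000000198
DPMO (defects per million opportunities): 0.00198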
[Figure: normal curves between LSL and USL, with the mean centred and with the mean shifted by 1.5σ. X-axis: sd from µ (−6σ to 6σ).]
With a 1.5σ shift of the mean: 3.4 DPMO
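The coverage percentages and the DPMO figures follow from the tail area of the standard normal distribution. The sketch below assumes specification limits at µ ± 6σ and a mean shift of 1.5σ, as on the slides; the helper names `upper_tail` and `coverage` are hypothetical.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def coverage(k):
    """Share of a normal population lying within mean +/- k standard deviations."""
    return 1 - 2 * upper_tail(k)

for k in range(1, 7):
    print(f"within +/-{k} sigma: {coverage(k) * 100:.7f}%")

# Defects per million opportunities with limits at +/-6 sigma
dpmo_centred = 2 * upper_tail(6) * 1e6                           # mean on target
shift = 1.5                                                      # assumed mean shift, in sigma
dpmo_shifted = (upper_tail(6 - shift) + upper_tail(6 + shift)) * 1e6

print(f"DPMO, centred: {dpmo_centred:.5f}")
print(f"DPMO, shifted by 1.5 sigma: {dpmo_shifted:.2f}")
```

After the shift, almost all defects come from the nearer limit, which is then only 4.5σ away from the mean.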
Statistical significance tests
Significance tests
Z- test
t- test
F- test
ANOVA
Z-test
Z-value: how many standard deviations away from the mean?
+ve z: value is above the mean
−ve z: value is below the mean

Population
1 point compared to population: $z = \frac{x_i - \mu}{\sigma}$
Group compared to population: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Sample
$z = \frac{x_i - \bar{x}}{s}$

Z-test, sample: BMI
Mean ($\bar{x}$) = 26.20
Standard deviation (s) = 6.57
A person with a BMI of 19.2 has a z score of:
$z = \frac{19.2 - 26.20}{6.57} = -1.07$
So this person has a BMI 1.07 standard deviations below the mean

What is the probability of a person having a BMI more than 1 sd below the mean, or above that point?
[Figure: normal curve. X-axis: standard deviation (z score), marked at −1σ (z = −1, BMI 19.6) and µ (z = 0). Y-axis: Probability.]
BMI < 19.6: 16%
BMI > 19.6: 84%
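The BMI example can be reproduced in a few lines. This is a sketch that assumes BMI is normally distributed with the sample mean and standard deviation quoted on the slide; `NormalDist` comes from the Python standard library.

```python
from statistics import NormalDist

mean_bmi = 26.20   # sample mean from the slide
sd_bmi = 6.57      # sample standard deviation from the slide

# z score of one person: distance from the mean in units of standard deviation
z = (19.2 - mean_bmi) / sd_bmi

# BMI value lying exactly one standard deviation below the mean
cutoff = mean_bmi - sd_bmi

# Share of people below and above that point under the normal model
p_below = NormalDist().cdf(-1)
p_above = 1 - p_below

print(f"z = {z:.2f}")
print(f"BMI < {cutoff:.1f}: {p_below:.0%}, BMI > {cutoff:.1f}: {p_above:.0%}")
```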
Z-test, population
Test group: Employees having a two-wheeler
Test: Commuting time from home to Biocon
Claim: Average commuting time is less than 24 min
Samples: 30
18 16 23 19 25 48 13 17 20 23
16 21 18 16 29 15 8 19 20 7
15 16 24 15 6 11 14 23 18 12
At 0.01 level of significance (α = 0.01): is there enough evidence to support the research claim?
Assumption: Population is normally distributed
[Figure: normal curve of commuting time. X-axis: Score, with the sample mean x̄ and the mean of 24 marked. Y-axis: Probability.]

Hypothesis testing
Comparison of means: test vs population (µ = 24)
Null hypothesis H0: x̄ ≥ µ. No difference (claim not true)
Alternate hypothesis H1: x̄ < µ. It is different (claim is true)

Test statistic: $z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
µ = 24, n = 30, x̄ = 18.2, s = 7.7
$z = \frac{18.2 - 24}{7.7/\sqrt{30}} = -4.13$
Level of significance α = 0.01: critical value Zcritical = −2.33
[Figure: standard normal curve. X-axis: Z value, with 0, the critical value −2.33 and the test value −4.13 marked. Y-axis: Probability.]
Rejection region: Ztest < Zcritical
Acceptance region: Ztest > Zcritical

Ztest = −4.13 < Zcritical = −2.33, so the test value lies in the rejection region
H0: x̄ ≥ 24 Rejected
H1: x̄ < 24 Significantly supported
So is the test value significantly different (lower) than the mean?
Yes: there is significant evidence to reject the null hypothesis and therefore accept the claim
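The commuting-time test can be recomputed from the 30 raw values. The sketch below follows the slide in using the sample standard deviation s in place of σ, and takes the one-tailed critical value from the standard normal distribution; the variable names are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

# Commuting times (min) of the 30 sampled employees, from the slide
times = [18, 16, 23, 19, 25, 48, 13, 17, 20, 23,
         16, 21, 18, 16, 29, 15, 8, 19, 20, 7,
         15, 16, 24, 15, 6, 11, 14, 23, 18, 12]

mu0 = 24       # mean stated in the claim
alpha = 0.01   # level of significance

n = len(times)
x_bar = mean(times)
s = stdev(times)                           # sample standard deviation (n - 1)
z_test = (x_bar - mu0) / (s / math.sqrt(n))

# Lower-tail critical value, because H1 says "less than"
z_critical = NormalDist().inv_cdf(alpha)

reject_h0 = z_test < z_critical
print(f"x_bar={x_bar:.1f}, s={s:.1f}, z_test={z_test:.2f}, "
      f"z_critical={z_critical:.2f}, reject H0: {reject_h0}")
```

Because the claim is directional, only the lower tail is used, so the whole α = 0.01 sits on one side of the curve.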
t-test
Comparison of means between two groups
t = Signal/Noise = (difference between group means)/(variability of groups)
$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
ttest > tcritical: null hypothesis will be rejected
ttest < tcritical: null hypothesis will not be rejected
t-test Case 1: Effect of fertilizer on plant height
Plant height
Fertilizer:       35 40 12 15 21 14 46 10 28 48 16 30 32 48 31 22 12 39 19 25
w/o Fertilizer:   2 27 38 31 11 19 11 34 10 11 12 15 22 11 12 12 12 29 37 2

       Fertilizer   w/o Fertilizer
x̄      27.15        17.9
s²     156.45       122.61
n      20           20

$t_{test} = \frac{27.15 - 17.9}{\sqrt{\frac{156.45}{20} + \frac{122.61}{20}}} = 2.48$
df = 2n − 2 = 38
t critical with 38 df at 0.05 significance level = 2.02
ttest > tcritical
H0: x̄1 = x̄2 Rejected
H1: x̄1 ≠ x̄2
So x̄1 is significantly different from x̄2

t-test Case 2
Plant height
Fertilizer:       2 27 38 31 100 115 11 34 10 11 12 15 22 11 12 12 12 29 37 2
w/o Fertilizer:   2 27 38 31 11 19 11 34 10 11 12 15 22 11 12 12 12 29 37 2

       Fertilizer   w/o Fertilizer
x̄      27.15        17.9
s²     880.98       122.61
n      20           20

t test = 1.3
t critical = 2.02
ttest < tcritical
H0: x̄1 = x̄2 Not rejected
H1: x̄1 ≠ x̄2
So x̄1 is not significantly different from x̄2
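Both fertilizer cases use the same formula, so one small function covers them. The sketch below is an illustration under two assumptions: the data columns are my reconstruction of the slide tables, and the critical value (two-tailed, α = 0.05, 38 df, about 2.02) is taken from a standard t table rather than computed.

```python
import math
from statistics import mean, variance

def t_statistic(group1, group2):
    """Difference between group means divided by its standard error."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se

# Plant heights, reconstructed from the slide tables
no_fert = [2, 27, 38, 31, 11, 19, 11, 34, 10, 11, 12, 15, 22, 11, 12, 12, 12, 29, 37, 2]
fert_case1 = [35, 40, 12, 15, 21, 14, 46, 10, 28, 48, 16, 30, 32, 48, 31, 22, 12, 39, 19, 25]
fert_case2 = [2, 27, 38, 31, 100, 115, 11, 34, 10, 11, 12, 15, 22, 11, 12, 12, 12, 29, 37, 2]

T_CRITICAL = 2.02   # assumed table value: two-tailed, alpha = 0.05, df = 38

for label, fert in (("Case 1", fert_case1), ("Case 2", fert_case2)):
    t = t_statistic(fert, no_fert)
    verdict = "rejected" if abs(t) > T_CRITICAL else "not rejected"
    print(f"{label}: mean={mean(fert):.2f}, s2={variance(fert):.2f}, t={t:.2f}, H0 {verdict}")

t_case1 = t_statistic(fert_case1, no_fert)
t_case2 = t_statistic(fert_case2, no_fert)
```

The two cases share the same group means; only the spread differs. In Case 2 two large values inflate the variance, so the same signal is divided by more noise and the difference is no longer significant.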
F-test
Comparison of variances
$F = \frac{s_1^2}{s_2^2}$, where $s_1^2$ and $s_2^2$ are the sample variances
The F hypothesis test is defined as:
H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 < \sigma_2^2$, or $\sigma_1^2 > \sigma_2^2$, or $\sigma_1^2 \neq \sigma_2^2$
If Ftest > Fcritical (at the chosen significance level): H0 rejected
ANOVA: ANalysis Of VAriance
One way:
• Effect of one factor (variable)
Two way:
• Effect of two factors (variables)
• Effect of interaction
Strategy: compare variability within groups (MSwg) to variability between groups (MSbg)
$F = \frac{MS_{bg}}{MS_{wg}}$
[Figure: two panels, "Between groups" and "Within groups", each showing Group 1 and Group 2.]
One way ANOVA
Is there any impact of exam room temperature on student performance?
Factor (Independent Variable): Temperature (cold, optimum, hot)
Effect (Dependent Variable): Score (marks obtained)
Null hypothesis (H0): No effect (µ1 = µ2 = µ3)
Alternate hypothesis (H1): There is an effect (at least one mean differs)
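One way ANOVA
Scores x and squared deviations from the group mean, (x − x̄)², 12 students per group:

Cold             Opt              Hot
x     (x−x̄)²     x     (x−x̄)²     x     (x−x̄)²
55    77         75    11         67    36
60    14         65    46         53    64
51    163        80    68         65    16
65    2          75    11         49    144
72    68         67    23         54    49
65    2          68    14         61    0
55    77         77    28         65    16
72    68         83    127        72    121
68    18         67    23         63    4
60    14         56    248        64    9
75    127        65    46         54    49
67    11         83    127        65    16

Group mean x̄:       63.75    71.75    61
SS within group:    638      768      524
Grand mean: $\bar{X} = \sum \bar{x}/3 = 65.5$
$(\bar{x} - \bar{X})^2$: 3.06 (Cold), 39.06 (Opt), 20.25 (Hot)
[Figure: bar chart by group (C, O, H). Axis label: Number of Attendees.]

Between groups (df = 3 − 1 = 2):
$SS_{bg} = n \sum (\bar{x} - \bar{X})^2 = 12 \times (3.0625 + 39.0625 + 20.25) = 748.5$
$MS_{bg} = SS_{bg}/df = 748.5/2 = 374.25$
Within groups (df = (12 × 3) − 3 = 33):
$SS_{wg} = \sum (x - \bar{x})^2 = 638 + 768 + 524 = 1930$
$MS_{wg} = SS_{wg}/df = 1930/33 = 58.5$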
One way ANOVA
$F = \frac{MS_{bg}}{MS_{wg}} = \frac{374.25}{58.5} = 6.40$
Fcritical for
Numerator degrees of freedom: 2
Denominator degrees of freedom: 33
At significance level (α): 0.05
= 3.28
Ftest > Fcritical
So there is enough evidence to reject the null hypothesis
H0: All means are same (no effect of temperature) Rejected
At 95% confidence level we can say:
That the variation between means is not just by chance
Examination room temperature matters significantly
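The one way ANOVA on the exam scores can be recomputed from the raw data. In the sketch below the scores are the three columns of the slide table, and the critical value for (2, 33) degrees of freedom at α = 0.05, about 3.28, is an assumed table value rather than something computed here.

```python
from statistics import mean

# Scores by room temperature, from the slide table
groups = {
    "cold":    [55, 60, 51, 65, 72, 65, 55, 72, 68, 60, 75, 67],
    "optimum": [75, 65, 80, 75, 67, 68, 77, 83, 67, 56, 65, 83],
    "hot":     [67, 53, 65, 49, 54, 61, 65, 72, 63, 64, 54, 65],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Between groups: group size times squared distance of each group mean from the grand mean
ss_bg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
df_bg = len(groups) - 1

# Within groups: squared distance of each score from its own group mean
ss_wg = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
df_wg = len(all_scores) - len(groups)

ms_bg = ss_bg / df_bg
ms_wg = ss_wg / df_wg
f_test = ms_bg / ms_wg

F_CRITICAL = 3.28   # assumed table value for df = (2, 33), alpha = 0.05

print(f"SSbg={ss_bg:.2f}, SSwg={ss_wg:.2f}, MSbg={ms_bg:.2f}, MSwg={ms_wg:.2f}")
print(f"F={f_test:.2f}, reject H0: {f_test > F_CRITICAL}")
```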
Two way ANOVA
Relative impact of gender or type of sport?
Any interaction between gender and type of sport?
Factors (Independent Variables): 1) Gender 2) Type of sport
Effect (Dependent Variable): Number of participants
Null hypothesis (H0a): No effect of gender
Null hypothesis (H0b): No effect of type of sport
Null hypothesis (H0c): No interaction
Alternate hypothesis (H1): There is an effect
Two way ANOVA
Number of participants:
           Man              Woman
Indoor     30, 40, 50       60, 70, 80
Outdoor    140, 150, 160    5, 10, 15

Source    Df               SS          MS          F
Gender    g−1              SSG         MSG         MSG/MSwithin
Sports    s−1              SSS         MSS         MSS/MSwithin
G x S     (g−1)(s−1)       SSGxS       MSGxS       MSGxS/MSwithin
Within    (k−1) x g x s    SSwithin    MSwithin
g: number of gender levels (columns), s: number of sport levels (rows), k: observations per cell

Source    Df    SS       MS       F         Fcritical (α=0.01)
Gender    1     9075     9075     111.69    11.3
Sport     1     1875     1875     23.08     11.3
G x S     1     21675    21675    266.77    11.3
Within    8     650      81.25
Two way ANOVA
Mean number of participants:
       Woman   Man
Ind    70      40
Otd    10      150
[Figure: interaction plot of the means above. X-axis: Indoor, Outdoor. Y-axis: 0 to 160. Lines: Man, Woman.]
Null hypothesis (H0a): No effect of gender Rejected
Null hypothesis (H0b): No effect of type of sports Rejected
Null hypothesis (H0c): No interaction Rejected
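The sums of squares of the gender and sport example can be recomputed from the twelve observations. The sketch below assumes a balanced design with k = 3 observations per cell, as in the data table; the dictionary layout and names are my own.

```python
from itertools import product
from statistics import mean

# Number of participants per (gender, sport) cell, from the slide
cells = {
    ("man", "indoor"): [30, 40, 50],
    ("man", "outdoor"): [140, 150, 160],
    ("woman", "indoor"): [60, 70, 80],
    ("woman", "outdoor"): [5, 10, 15],
}
genders = ["man", "woman"]
sports = ["indoor", "outdoor"]
k = 3   # observations per cell

grand = mean([x for obs in cells.values() for x in obs])
gender_mean = {g: mean([x for s in sports for x in cells[(g, s)]]) for g in genders}
sport_mean = {s: mean([x for g in genders for x in cells[(g, s)]]) for s in sports}
cell_mean = {key: mean(obs) for key, obs in cells.items()}

# Main effects: squared distance of each level mean from the grand mean
ss_gender = len(sports) * k * sum((gender_mean[g] - grand) ** 2 for g in genders)
ss_sport = len(genders) * k * sum((sport_mean[s] - grand) ** 2 for s in sports)

# Interaction: what is left of each cell mean after removing both main effects
ss_inter = k * sum(
    (cell_mean[(g, s)] - gender_mean[g] - sport_mean[s] + grand) ** 2
    for g, s in product(genders, sports)
)

# Within cells: scatter of the observations around their own cell mean
ss_within = sum((x - cell_mean[key]) ** 2 for key, obs in cells.items() for x in obs)
df_within = len(genders) * len(sports) * (k - 1)
ms_within = ss_within / df_within

# Each effect has 1 degree of freedom here, so MS equals SS
f_gender = ss_gender / ms_within
f_sport = ss_sport / ms_within
f_inter = ss_inter / ms_within

print(f"Gender: SS={ss_gender}, F={f_gender:.2f}")
print(f"Sport:  SS={ss_sport}, F={f_sport:.2f}")
print(f"G x S:  SS={ss_inter}, F={f_inter:.2f}")
print(f"Within: SS={ss_within}, df={df_within}, MS={ms_within}")
```

All three F values are far above the critical value of about 11.3 for (1, 8) degrees of freedom at α = 0.01, so all three null hypotheses are rejected.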
Two way ANOVA
Factors (Independent Variables): 1) Temperature: 30, 35 °C 2) pH: 5, 7
Effect (Dependent Variable): Total product (g)
        30 °C   35 °C
pH 7    70      50
pH 5    10      150
[Figure: interaction plot of the values above. X-axis: 30 °C, 35 °C. Y-axis: 0 to 160. Lines: pH 5, pH 7.]
Regression and correlation
Regression analysis: Investigation of relationship between variables

X     Y
2     48
19    30
34    17.5
40    11
8     41
12    42
20    35
20    31
37    18
19    35
30    16
46    8.3

[Figure: scatter plot of Y against X (X-axis: 0 to 50) with the fitted line.]
Simple linear regression (one independent variable): y = ax + b
y = −0.951x + 50.49
R² = 0.955
Regression analysis:
Linear
Simple linear regression: $y = ax + b$
Multiple linear regression: $y = a_1 x_1 + a_2 x_2 + a_3 x_3 + b$
Non linear
$y = a_1 x_1 + a_2 x_2 + a_{11} x_1^2 + a_{12} x_1 x_2 + b$
Correlation analysis:
To find how well (or badly) a line fits the observations
Is the relationship we have described statistically significant? Significance tests
What is the strength of this relationship? r² (coefficient of determination) or adjusted r²

ŷ = ax + b, where a = slope and b = intercept
ŷ = predicted value
y_i = true value
ε = residual error = y − ŷ
a and b values are calculated that minimize the Sum of Squares (SS) of residuals: $\sum (y - \hat{y})^2$ : minimum
Correlation analysis:
$SS_{Total} = \sum (y_i - \bar{y})^2$ (total error)
$SS_{Error} = \sum (y_i - \hat{y}_i)^2$
r²: Coefficient of determination
$r^2 = 1 - \frac{SS_{Error}}{SS_{Total}}$
Always between 0 and 1
Increases with number of predictors
Adjusted r²
$\text{Adjusted } r^2 = 1 - \frac{SS_{Error}/(n-p-1)}{SS_{Total}/(n-1)}$
n = total observations, p = number of predictors
It can be negative also
True representative of relationship strength
Correlation analysis: Statistical significance of relationship
ANOVA: $F = \frac{MS_{bg}}{MS_{wg}}$
Regression: $F = \frac{MS_{Model}}{MS_{Error}}$
[Figure: Group 1 and Group 2 (between and within groups) shown next to Model and Error for regression.]
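The fitted line, r², adjusted r² and the regression F value can all be derived from the twelve (X, Y) pairs. This is a minimal least-squares sketch in plain Python; the variable names are illustrative, and p = 1 because there is a single predictor.

```python
# (X, Y) pairs from the slide
xs = [2, 19, 34, 40, 8, 12, 20, 20, 37, 19, 30, 46]
ys = [48, 30, 17.5, 11, 41, 42, 35, 31, 18, 35, 16, 8.3]

n = len(xs)
p = 1   # number of predictors

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope a and intercept b: these minimise the sum of squared residuals
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
a = sxy / sxx
b = y_bar - a * x_bar

ss_total = sum((y - y_bar) ** 2 for y in ys)                       # around the mean
ss_error = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))     # around the line

r2 = 1 - ss_error / ss_total
adj_r2 = 1 - (ss_error / (n - p - 1)) / (ss_total / (n - 1))

# Significance of the relationship: model mean square over error mean square
ms_model = (ss_total - ss_error) / p
ms_error = ss_error / (n - p - 1)
f_value = ms_model / ms_error

print(f"y = {a:.3f}x + {b:.2f}")
print(f"r2 = {r2:.3f}, adjusted r2 = {adj_r2:.3f}, F = {f_value:.1f}")
```

Adjusted r² is always a little below r² because it penalises each extra predictor through the n − p − 1 term.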
Design of experiment
Traditional method: One factor at a time (OFAT)
Statistical method: Multiple factors at a time (MFAT)
How to select a design?
Number of factors   Screening                                  Optimization                        Robustness
2-4                 Full or fractional factorial               Central composite or Box-Behnken    Taguchi
5 or more           Fractional factorial or Plackett-Burman    Screen first to reduce factors      Taguchi
Design of experiment: terminology
Factors: Independent variable/s
  Continuous: Numeric: any value between lower and upper value, eg. Temperature, pH, concentration
  Categorical: Numeric/non-numeric: only characters or levels, eg. Gender, operator, type, temperature
Levels: Range of a factor/s: −1 (lower), 0 (middle), +1 (higher)
Response: Dependent variable/s
Effects
  Main effect/s: Effect/s due to individual factor/s
  Interaction effect/s: Effect/s due to interaction of multiple factors
Confounding/Aliasing: When two or more effects cannot be distinguished
  eg. Main effect is confounded with interaction effects; main effects and interaction effects are aliased
Resolution of a design: power of a design
Resolution type   Order of interaction effects confounded with main effect   Experiment type
III               2 (eg. A with BC)                                          Screening
IV                3 (eg. A with BCD)                                         Optimization
V                 4 (eg. A with BCDE)                                        Optimization
Higher order interactions are usually less significant than lower order interactions
Full factorial: L^f (L = number of levels, f = number of factors)
No. of Levels   No. of Factors   Design type   Number of experiments
2               2                2^2           2x2 = 4
2               3                2^3           2x2x2 = 8
3               2                3^2           3x3 = 9
3               3                3^3           3x3x3 = 27
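A full factorial design is every combination of every level of every factor, which is why the run count is L^f. The sketch below generates the four designs of the table in coded units (−1, 0, +1); the function name `full_factorial` is hypothetical.

```python
from itertools import product

def full_factorial(levels, n_factors):
    """All combinations of the given coded levels across n_factors factors."""
    return list(product(levels, repeat=n_factors))

two_level = (-1, 1)        # lower, higher
three_level = (-1, 0, 1)   # lower, middle, higher

for levels, n_factors in ((two_level, 2), (two_level, 3), (three_level, 2), (three_level, 3)):
    runs = full_factorial(levels, n_factors)
    print(f"{len(levels)}^{n_factors}: {len(runs)} experiments")

# The four runs of the 2^2 design
print(full_factorial(two_level, 2))
```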
Factorial design
[Figure: 2^2 design: 4 experiments]
[Figure: 2^3 design, axes a, b and c: 8 experiments]
[Figure: 3^2 design: 9 experiments]
[Figure: 3^3 design: 27 experiments]
Fractional Factorial design
[Figure: 2^3 full factorial (axes a, b and c), 8 experiments, next to the 2^(3-1) fractional factorial, 4 experiments]
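One common way to obtain the 4 runs of a 2^(3-1) design is to run a full 2^2 design in two factors and set the third factor equal to their product. The generator C = A·B used below is an assumption for illustration; the slides do not say which half fraction is shown.

```python
from itertools import product

# Full 2^3 design: every combination of three two-level factors
full_design = list(product((-1, 1), repeat=3))

# Half fraction 2^(3-1): full 2^2 design in A and B, with C set by the generator C = A*B
half_fraction = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

print(f"2^3 full factorial: {len(full_design)} experiments")
print(f"2^(3-1) fractional factorial: {len(half_fraction)} experiments")
for run in half_fraction:
    print(run)
```

Because C always equals A·B in these runs, the main effect of C cannot be separated from the A x B interaction. That is the confounding that makes this a resolution III design.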
Response surface methodology
Geometry of some important response surface designs
Box-Behnken, eg. 3 factor 3 level: 12 experiments
Central composite design, eg. 2 factor 2 level
[Figure: factorial points + axial points = central composite design]
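The geometry of both designs can be written down as coded coordinates. The sketch below is an illustration under stated assumptions: the Box-Behnken function returns only the 12 edge-midpoint runs (centre points are normally added on top), and the axial distance of the central composite design is set to √2, a common choice for two factors that the slides do not specify. Function names are hypothetical.

```python
from itertools import combinations, product

def box_behnken_edges(n_factors):
    """Edge-midpoint runs: each pair of factors at +/-1 while the others stay at 0."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    return runs

def central_composite(n_factors, alpha):
    """Factorial corner points + axial points at distance alpha + one centre point."""
    factorial = list(product((-1, 1), repeat=n_factors))
    axial = []
    for i in range(n_factors):
        for value in (-alpha, alpha):
            run = [0] * n_factors
            run[i] = value
            axial.append(tuple(run))
    centre = [tuple([0] * n_factors)]
    return factorial + axial + centre

bb_runs = box_behnken_edges(3)
ccd_runs = central_composite(2, alpha=2 ** 0.5)   # assumed axial distance

print(f"Box-Behnken, 3 factors: {len(bb_runs)} experiments (before centre points)")
print(f"Central composite, 2 factors: {len(ccd_runs)} experiments (4 factorial + 4 axial + 1 centre)")
```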
Taguchi design
Inner array: Controllable variables during production (Signal), eg. Media, pH, feed rate
Outer array: Uncontrollable variables during production (Noise), eg. Temp, DO