Applied Statistics And Doe Mayank


Applied statistics required to understand basics of design of experiments


Page 1: Applied Statistics And Doe Mayank

Applied Statistics and DOE

Mayank

Page 2: Applied Statistics And Doe Mayank

Applied Statistics

Measures of central tendency (central position of data): Mean, Median, Mode

Measures of dispersion (spread of data): Variance, Standard deviation, Coefficient of variation

Notation:

Mean: Population: µ, Sample: x̄
Variance: Population: σ², Sample: s²
Standard deviation: Population: σ, Sample: s

Page 3: Applied Statistics And Doe Mayank

Measures of central tendency

Data: 34, 43, 81, 106, 106 and 115

Mean: average, Σx/n = 80.83

Mode: highest frequency = 106

Median: middle score, (81 + 106)/2 = 93.5
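As a quick check of the three values on this slide, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative and not part of the original deck.

```python
import statistics

data = [34, 43, 81, 106, 106, 115]

mean = statistics.mean(data)      # sum of values divided by their count
median = statistics.median(data)  # for an even count: average of the two middle values
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

With an even number of observations there is no single middle score, which is why the median is taken as the average of 81 and 106.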

Page 4: Applied Statistics And Doe Mayank

Measures of dispersion

Variance and standard deviation

x       (x − x̄)    (x − x̄)²
44      −0.5       0.3
50      5.5        30.3
38      −6.5       42.3
49      4.5        20.3
42      −2.5       6.3
47      2.5        6.3
40      −4.5       20.3
39      −5.5       30.3
46      1.5        2.3
50      5.5        30.3

Mean: x̄ = Σxᵢ/n = 44.5

Sum of squares: SS = Σ(xᵢ − x̄)² = 188.5

Variance: MS = SS/(n − 1) = 20.9

Standard deviation: sd = √MS = 4.57

Most of the data lies within 44.5 ± 4.57, i.e. roughly between 40 and 49

Page 5: Applied Statistics And Doe Mayank

Coefficient of variation: CV = (s / x̄) × 100%

4.57/44.5*100% = 10.28%

Standard deviation is 10.28% of the mean
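The dispersion measures can be reproduced step by step. The sketch below assumes the ten data values are 44, 50, 38, 49, 42, 47, 40, 39, 46 and 50 (as read from the variance slide) and follows the same SS, MS, sd and CV sequence. Variable names are illustrative.

```python
import math

x = [44, 50, 38, 49, 42, 47, 40, 39, 46, 50]
n = len(x)

x_bar = sum(x) / n                       # sample mean
ss = sum((xi - x_bar) ** 2 for xi in x)  # sum of squared deviations (SS)
ms = ss / (n - 1)                        # sample variance (MS), n - 1 degrees of freedom
sd = math.sqrt(ms)                       # sample standard deviation
cv = sd / x_bar * 100                    # coefficient of variation, in percent

print(f"mean={x_bar}, SS={ss}, MS={ms:.2f}, sd={sd:.3f}, CV={cv:.2f}%")
```

Dividing by n − 1 rather than n gives the sample variance, matching the SS/(n − 1) step on the slide.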

Measures of dispersion

Page 6: Applied Statistics And Doe Mayank

Normal Distribution

Example: IQ Score

Grade            Score
Genius           ≥145
Gifted           130-144
Above average    115-129
Higher average   100-114
Lower average    85-99
Below average    70-84
Borderline low   55-69
Low              <55

Measures of dispersion

Page 7: Applied Statistics And Doe Mayank

IQ Score: Normal Distribution

[Figure: bell-shaped histogram of IQ scores; x-axis: Score (<55, 55, 70, 85, 100, 115, 130, 145, >145); y-axis: Count]

Measures of dispersion

Page 8: Applied Statistics And Doe Mayank

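Measures of dispersion

Normal Distribution

[Figure: normal curve; x-axis: Sd from µ (−6σ to +6σ); y-axis: Probability]

Area in each band, on either side of µ:

0 to 1σ: 34.13%
1σ to 2σ: 13.59%
2σ to 3σ: 2.14%
3σ to 4σ: 0.13%
4σ to 5σ: 0.0031%
5σ to 6σ: 0.000028%

Area within ± kσ of µ:

±1σ: 68.2689%
±2σ: 95.4499%
±3σ: 99.7300%
±4σ: 99.9936%
±5σ: 99.999942669%
±6σ: 99.999999802%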
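The "within ± kσ" percentages follow from the normal cumulative distribution. A minimal sketch, using the identity P(|Z| < k) = erf(k/√2); the function name `within_k_sigma` is an illustrative choice, not from the deck:

```python
import math

def within_k_sigma(k: float) -> float:
    """Fraction of a normal distribution lying within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in range(1, 7):
    print(f"±{k}σ: {100 * within_k_sigma(k):.9f}%")
```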

Page 9: Applied Statistics And Doe Mayank

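Measures of dispersion

Normal Distribution: Six Sigma

[Figure: normal curve; x-axis: Sd from µ (−6σ to +6σ), with LSL at −6σ and USL at +6σ]

Within specification (±6σ): 99.999999802%

DPHO (defects per hundred opportunities): 0.000000198

DPMO (defects per million opportunities): 0.00198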

Page 10: Applied Statistics And Doe Mayank

Measures of dispersion

Normal Distribution

[Figure: two normal curves, each with LSL and USL marked]

Page 11: Applied Statistics And Doe Mayank

Measures of dispersion

Normal Distribution

[Figure: normal curve between LSL (−6σ) and USL (+6σ), with the mean shifted by 1.5σ]

With a 1.5σ shift of the mean: 3.4 DPMO
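Both DPMO figures (centred process and 1.5σ-shifted process) can be checked from normal tail areas. This is a sketch under the usual Six Sigma assumptions: specification limits at ±6σ, with the mean allowed to drift by 1.5σ towards one limit. The function names are illustrative.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable (erfc avoids rounding loss in the far tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def dpmo(spec_sigma: float = 6.0, shift_sigma: float = 0.0) -> float:
    """Defects per million opportunities for limits at ±spec_sigma and a mean shift of shift_sigma."""
    above_usl = upper_tail(spec_sigma - shift_sigma)  # tail beyond the nearer limit
    below_lsl = upper_tail(spec_sigma + shift_sigma)  # tail beyond the farther limit
    return 1e6 * (above_usl + below_lsl)

print(f"centred process:    {dpmo(6.0, 0.0):.5f} DPMO")
print(f"1.5 sigma shifted:  {dpmo(6.0, 1.5):.2f} DPMO")
```

After the shift, almost all defects come from the nearer limit, which is then only 4.5σ away from the mean.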

Page 12: Applied Statistics And Doe Mayank

Statistical significance tests

Significance tests

Z- test

t- test

F- test

ANOVA

Page 13: Applied Statistics And Doe Mayank

Statistical significance tests

Z-test

Z-value: how many standard deviations away from the mean?

Sample: z = (x − x̄) / s

Population, one point compared to population: zᵢ = (xᵢ − µ) / σ

Population, group compared to population: z = (x̄ − µ) / (σ / √n)

+ve z: values are above the mean
−ve z: values are below the mean

Page 14: Applied Statistics And Doe Mayank

Statistical significance tests

Z-test, Sample: BMI

Mean (x̄) = 26.20, Standard deviation (s) = 6.57

A person with a BMI of 19.2 has a z score of:

z = (x − x̄) / s = (19.2 − 26.20) / 6.57 = −1.07

So this person has a BMI 1.07 standard deviations below the mean

What is the probability of a person having a BMI below 19.2, or above 19.2?

Page 15: Applied Statistics And Doe Mayank

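Statistical significance tests

Z-test, Sample:

[Figure: normal curve; x-axis: standard deviation / z score, with −1σ (z = −1, BMI 19.6) and µ (z = 0) marked; y-axis: Probability]

Below 19.6 (z < −1): 16%
Above 19.6 (z > −1): 84%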
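The z score and the areas under the curve can be computed with `statistics.NormalDist` from the Python standard library (3.8+). This is a sketch that assumes, as the slides do, that BMI is normally distributed; variable names are illustrative.

```python
from statistics import NormalDist

mean_bmi, sd_bmi = 26.20, 6.57
bmi = 19.2

z = (bmi - mean_bmi) / sd_bmi           # standard deviations from the mean
p_below = NormalDist().cdf(z)           # probability of a BMI below 19.2
p_above = 1 - p_below                   # probability of a BMI above 19.2
p_below_minus_1sd = NormalDist().cdf(-1)  # area below z = -1, as in the figure

print(f"z = {z:.2f}, P(below) = {p_below:.3f}, P(above) = {p_above:.3f}")
print(f"P(z < -1) = {p_below_minus_1sd:.2f}")
```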

Page 16: Applied Statistics And Doe Mayank

Test group: Employees having a two-wheeler
Test: Commuting time from home to Biocon
Claim: Average commuting time is less than 24 min

Samples : 30

18 16 23 19 25 48 13 17 20 23

16 21 18 16 29 15 8 19 20 7

15 16 24 15 6 11 14 23 18 12

At the 0.01 level of significance (α = 0.01): is there enough evidence to support the research claim?

Statistical significance tests

Z - test Population :

Page 17: Applied Statistics And Doe Mayank

Statistical significance tests

Z - test Population :

Assumption: Population is normally distributed

[Figure: normal curve; x-axis: Score, with the sample mean x̄ and the population mean 24 marked; y-axis: Probability]

Page 18: Applied Statistics And Doe Mayank

Statistical significance tests

Z-test, Population:

Hypothesis testing

Comparison of means: Test vs Population (µ = 24)

Null hypothesis, H0 : x̄ ≥ µ. No difference (claim not true)

Alternate hypothesis, H1 : x̄ < µ. It is different (claim is true)

Page 19: Applied Statistics And Doe Mayank

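Statistical significance tests

Z-test, Population:

[Figure: two normal curves; first x-axis: Score, with x̄ and the mean 24 marked; second x-axis: Z value, with 0 and the critical value marked; y-axis: Probability]

Level of significance: α = 0.01

Critical value: Z = −2.33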

Page 20: Applied Statistics And Doe Mayank

Statistical significance tests

Z-test, Population:

z = (x̄ − µ) / (s / √n)

µ = 24, n = 30, x̄ = 18.2, s = 7.7

Z = −4.13

Critical value: Z = −2.33

Rejection region: Ztest < Zcritical
Acceptance region: Ztest > Zcritical

Page 21: Applied Statistics And Doe Mayank

Statistical significance tests

Z-test, Population:

[Figure: Z axis with the rejection region below −2.33; the test value −4.13 falls inside it]

Is the test value significantly different from (lower than) the mean?

Yes: there is significant evidence to reject the null hypothesis and therefore accept the claim

H0 : x̄ ≥ 24 Rejected

H1 : x̄ < 24 Significantly supported
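The whole commuting-time test can be reproduced from the 30 samples. The sketch below is a one-sided (left-tailed) z-test using the sample standard deviation in place of σ, as the slides do. Variable names are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

times = [
    18, 16, 23, 19, 25, 48, 13, 17, 20, 23,
    16, 21, 18, 16, 29, 15, 8, 19, 20, 7,
    15, 16, 24, 15, 6, 11, 14, 23, 18, 12,
]
mu0 = 24      # claimed upper bound on the average commuting time (min)
alpha = 0.01  # level of significance

n = len(times)
x_bar = mean(times)
s = stdev(times)                            # sample standard deviation (n - 1)
z_test = (x_bar - mu0) / (s / math.sqrt(n))
z_critical = NormalDist().inv_cdf(alpha)    # left-tail critical value

reject_h0 = z_test < z_critical
print(f"x_bar={x_bar:.1f}, s={s:.1f}, z_test={z_test:.2f}, z_critical={z_critical:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Because H1 is x̄ < µ, only the lower tail is used, so the critical value is negative and H0 is rejected when the test value falls below it.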

Page 22: Applied Statistics And Doe Mayank

Statistical significance tests

t-test: Comparison of means between two groups

H0: x̄1 = x̄2

H1: x̄1 ≠ x̄2

ttest > tcritical: Null hypothesis will be rejected

ttest < tcritical: Null hypothesis will not be rejected

Page 23: Applied Statistics And Doe Mayank

Statistical significance tests

t-test: Comparison of means between two groups

t = Signal / Noise = (Difference between group means) / (Variability of groups)

t = (x̄1 − x̄2) / s(x̄1 − x̄2)

s(x̄1 − x̄2) = √(s1²/n1 + s2²/n2)

Page 24: Applied Statistics And Doe Mayank

Statistical significance tests

t-test, Case 1: Effect of fertilizer on plant height

Plant height:

Fertilizer: 35, 40, 12, 15, 21, 14, 46, 10, 28, 48, 16, 30, 32, 48, 31, 22, 12, 39, 19, 25
w/o Fertilizer: 2, 27, 38, 31, 11, 19, 11, 34, 10, 11, 12, 15, 22, 11, 12, 12, 12, 29, 37, 2

       Fertilizer   w/o Fertilizer
x̄      27.15        17.9
s²     156.45       122.61

t test = (27.15 − 17.9) / √(156.45/20 + 122.61/20) ≈ 2.48

df = 2n − 2 = 38

t critical with 38 df at 0.05 significance level = 2.03

ttest > tcritical

H0: x̄1 = x̄2 Rejected

H1: x̄1 ≠ x̄2

So x̄1 is significantly different from x̄2
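The Case 1 statistic can be recomputed from the raw heights. The sketch below assumes the two columns of the slide's table read as listed in the code (20 plants per group) and uses the same unpooled standard error as the formula above. The critical value 2.03 is taken from the slide rather than computed.

```python
import math
from statistics import mean, variance

fertilizer = [35, 40, 12, 15, 21, 14, 46, 10, 28, 48,
              16, 30, 32, 48, 31, 22, 12, 39, 19, 25]
no_fertilizer = [2, 27, 38, 31, 11, 19, 11, 34, 10, 11,
                 12, 15, 22, 11, 12, 12, 12, 29, 37, 2]

n1, n2 = len(fertilizer), len(no_fertilizer)
# Noise: standard error of the difference between the two group means
se_diff = math.sqrt(variance(fertilizer) / n1 + variance(no_fertilizer) / n2)
# Signal: difference between the group means
t_test = (mean(fertilizer) - mean(no_fertilizer)) / se_diff
df = n1 + n2 - 2

t_critical = 2.03  # value quoted on the slide for 38 df at the 0.05 level
print(f"t_test={t_test:.2f}, df={df}, reject H0: {abs(t_test) > t_critical}")
```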

Page 25: Applied Statistics And Doe Mayank

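Statistical significance tests

t-test, Case 2

Plant height:

Fertilizer: 2, 27, 38, 31, 100, 115, 11, 34, 10, 11, 12, 15, 22, 11, 12, 12, 12, 29, 37, 2
w/o Fertilizer: 2, 27, 38, 31, 11, 19, 11, 34, 10, 11, 12, 15, 22, 11, 12, 12, 12, 29, 37, 2

       Fertilizer   w/o Fertilizer
x̄      27.15        17.9
s²     880.98       122.61

t test = 1.3

t critical = 2.03

ttest < tcritical

H0: x̄1 = x̄2 Not rejected

H1: x̄1 ≠ x̄2

So x̄1 is not significantly different from x̄2: the difference in means is the same as in Case 1, but the larger variability hides it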

Page 26: Applied Statistics And Doe Mayank

Statistical significance tests

t - test Overview

Page 27: Applied Statistics And Doe Mayank

Statistical significance tests

F-test

Comparison of variances

F = s1² / s2², where s1² and s2² are the sample variances

The F hypothesis test is defined as:

H0: σ1² = σ2²

Ha: σ1² < σ2² or σ1² > σ2²

If Ftest > Fcritical (at the chosen significance level): H0 is rejected

Page 28: Applied Statistics And Doe Mayank

Statistical significance tests

ANOVA: ANalysis Of VAriance

One way:
• Effect of one factor (variable)

Two way:
• Effect of two factors (variables)
• Effect of interaction

Page 29: Applied Statistics And Doe Mayank

Strategy: compare the variability between groups (MSbg) to the variability within groups (MSwg)

F = MSbg / MSwg

[Figure: between-groups and within-groups variability, each illustrated for Group 1 and Group 2]

Statistical significance tests

One way ANOVA

Page 30: Applied Statistics And Doe Mayank

Factor ( Independent Variable): Temperature (cold, optimum, hot)

Effect ( Dependent Variable): Score (marks obtained)

Null hypothesis (H0) : No effect (µ1= µ2 = µ3)

Alternate hypothesis (H1) : There is an effect (at least one of µ1, µ2, µ3 differs from the others)

Is there any impact of exam room temperature on student performance?

Statistical significance tests

One way ANOVA

Page 31: Applied Statistics And Doe Mayank

Statistical significance tests

One way ANOVA

Scores (n = 12 per group):

Cold: 55, 60, 51, 65, 72, 65, 55, 72, 68, 60, 75, 67
Opt: 75, 65, 80, 75, 67, 68, 77, 83, 67, 56, 65, 83
Hot: 67, 53, 65, 49, 54, 61, 65, 72, 63, 64, 54, 65

Squared deviations from the group mean, (x − x̄)², rounded:

Cold: 77, 14, 163, 2, 68, 2, 77, 68, 18, 14, 127, 11
Opt: 11, 46, 68, 11, 23, 14, 28, 127, 23, 248, 46, 127
Hot: 36, 64, 16, 144, 49, 0, 16, 121, 4, 9, 49, 16

                    Cold     Opt      Hot
Group mean x̄        63.75    71.75    61
SS = Σ(x − x̄)²      638      768      524
(x̄ − X̄)²            3.06     39.06    20.25

Grand mean: X̄ = Σx̄/3 = 65.5

SSwg = 638 + 768 + 524 = 1930
MSwg = SSwg/df = 58.5 (df = (12×3) − 3 = 33)

SSbg = n × (3.06 + 39.06 + 20.25) ≈ 748.5
MSbg = SSbg/df = 374.25 (df = 3 − 1 = 2)

Page 32: Applied Statistics And Doe Mayank

F = MSbg / MSwg = 374.25 / 58.5 = 6.40

Fcritical for
Numerator degrees of freedom : 2
Denominator degrees of freedom : 33
At significance level (α) : 0.05
= 3.28

Ftest > Fcritical

So there is enough evidence to reject the null hypothesis

H0: All means are the same (no effect of temperature): Rejected

At the 95% confidence level we can say that the variation between the means is not just by chance

Examination room temperature matters significantly

Statistical significance tests

One way ANOVA
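The one-way ANOVA arithmetic can be reproduced from the raw scores. The sketch below assumes the 36 scores split into the three temperature groups in the order listed (12 per group) and computes F as MSbg/MSwg. Variable names are illustrative.

```python
from statistics import mean

groups = {
    "Cold": [55, 60, 51, 65, 72, 65, 55, 72, 68, 60, 75, 67],
    "Opt":  [75, 65, 80, 75, 67, 68, 77, 83, 67, 56, 65, 83],
    "Hot":  [67, 53, 65, 49, 54, 61, 65, 72, 63, 64, 54, 65],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Between groups: how far each group mean lies from the grand mean
ss_bg = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Within groups: how far each score lies from its own group mean
ss_wg = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

df_bg = len(groups) - 1                # 3 - 1 = 2
df_wg = len(all_scores) - len(groups)  # 36 - 3 = 33

f_test = (ss_bg / df_bg) / (ss_wg / df_wg)
print(f"SSbg={ss_bg:.2f}, SSwg={ss_wg:.2f}, F={f_test:.2f}")
```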

Page 33: Applied Statistics And Doe Mayank

Statistical significance tests

Two way ANOVA

Factors (Independent Variables): 1) Gender: Man, Woman
2) Type of sport: Indoor, Outdoor

Effect (Dependent Variable): Number of participants

Relative impact of gender or type of sport?

Any interaction between gender and type of sport?

Null hypothesis (H0a) : No effect of gender
Null hypothesis (H0b) : No effect of type of sport
Null hypothesis (H0c) : No interaction

Alternate hypothesis (H1) : There is an effect

Page 34: Applied Statistics And Doe Mayank

Statistical significance tests

Two way ANOVA

Number of participants (gender g →, type of sport s ↓):

           Man              Woman
Indoor     30, 40, 50       60, 70, 80
Outdoor    140, 150, 160    5, 10, 15

Source   Df                 SS          MS          F
Gender   g − 1              SSG         MSG         MSG / MSwithin
Sports   s − 1              SSS         MSS         MSS / MSwithin
G x S    (g − 1)(s − 1)     SSGxS       MSGxS       MSGxS / MSwithin
Within   (k − 1) × g × s    SSwithin    MSwithin

(k = number of observations per cell)

Source   Df   SS       MS       F         Fcritical (α=0.01)
Gender   1    9075     9075     111.69    11.3
Sport    1    1875     1875     23.08     11.3
G x S    1    21675    21675    266.77    11.3
Within   8    650      81.25
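The sums of squares for a balanced two-way design can be computed directly from the four cells of observations. The sketch below assumes the cell layout Man/Woman across and Indoor/Outdoor down, with three observations per cell. Names such as `cells` and `ss_inter` are illustrative.

```python
from statistics import mean

cells = {
    ("Man", "Indoor"): [30, 40, 50],
    ("Woman", "Indoor"): [60, 70, 80],
    ("Man", "Outdoor"): [140, 150, 160],
    ("Woman", "Outdoor"): [5, 10, 15],
}
genders = ["Man", "Woman"]
sports = ["Indoor", "Outdoor"]
k = 3  # observations per cell

grand = mean([v for vals in cells.values() for v in vals])
gender_mean = {g: mean([v for s in sports for v in cells[(g, s)]]) for g in genders}
sport_mean = {s: mean([v for g in genders for v in cells[(g, s)]]) for s in sports}

# Main effects: marginal means against the grand mean
ss_gender = k * len(sports) * sum((gender_mean[g] - grand) ** 2 for g in genders)
ss_sport = k * len(genders) * sum((sport_mean[s] - grand) ** 2 for s in sports)
# Interaction: what the cell means show beyond the two main effects
ss_inter = k * sum(
    (mean(cells[(g, s)]) - gender_mean[g] - sport_mean[s] + grand) ** 2
    for g in genders for s in sports
)
# Within: scatter of observations around their own cell mean
ss_within = sum((v - mean(vals)) ** 2 for vals in cells.values() for v in vals)
ms_within = ss_within / ((k - 1) * len(genders) * len(sports))

# Each effect has 1 degree of freedom here, so MS equals SS
for name, ss in [("Gender", ss_gender), ("Sport", ss_sport), ("G x S", ss_inter)]:
    print(f"{name}: SS={ss:.2f}, F={ss / ms_within:.2f}")
```

Each F value is then compared with Fcritical for 1 and 8 degrees of freedom at the chosen α.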

Page 35: Applied Statistics And Doe Mayank

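Mean number of participants:

           Woman    Man
Indoor     70       40
Outdoor    10       150

[Figure: interaction plot of the mean number of participants for Man and Woman, Indoor vs Outdoor; y-axis from 0 to 160]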

Null hypothesis (H0a) : No effect of gender Rejected

Null hypothesis (H0b) : No effect of type of sports Rejected

Null hypothesis (H0c) : No interaction Rejected

Statistical significance tests

Two way ANOVA

Page 36: Applied Statistics And Doe Mayank

Statistical significance tests

Two way ANOVA

Factors (Independent Variables): 1) Temperature: 30 °C, 35 °C
2) pH: 5, 7

Effect (Dependent Variable): Total product (g)

         30 °C    35 °C
pH 7     70       50
pH 5     10       150

[Figure: interaction plot of total product at 30 °C and 35 °C for pH 5 and pH 7; y-axis from 0 to 160]

Page 37: Applied Statistics And Doe Mayank

Regression and correlation

Regression analysis: investigation of relationship between variables

X: 2, 19, 34, 40, 8, 12, 20, 20, 37, 19, 30, 46
Y: 48, 30, 17.5, 11, 41, 42, 35, 31, 18, 35, 16, 8.3

[Figure: scatter plot of Y against X; x-axis from 0 to 50]

Page 38: Applied Statistics And Doe Mayank

Regression and correlation

Regression analysis: investigation of relationship between variables

Simple linear regression: one independent variable

[Figure: scatter plot of the same X and Y data with the fitted straight line; x-axis from 0 to 50]

y = ax + b

y = −0.951x + 50.49

R² = 0.955
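The fitted line and R² can be reproduced with ordinary least squares. The sketch below assumes the twelve (X, Y) pairs read from the slide's data table and uses the closed-form slope and intercept for simple linear regression. Variable names are illustrative.

```python
from statistics import mean

x = [2, 19, 34, 40, 8, 12, 20, 20, 37, 19, 30, 46]
y = [48, 30, 17.5, 11, 41, 42, 35, 31, 18, 35, 16, 8.3]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

a = s_xy / s_xx          # slope
b = y_bar - a * x_bar    # intercept

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_error = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ss_error / ss_total

print(f"y = {a:.3f}x + {b:.2f}, R^2 = {r_squared:.3f}")
```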

Page 39: Applied Statistics And Doe Mayank

Regression and correlation

Regression analysis:

Linear

Simple linear regression: y = ax + b

Multiple linear regression: y = a1x1 + a2x2 + a3x3 + b

Non linear

y = a1x1 + a2x2 + a11x1² + a12x1x2 + b

Page 40: Applied Statistics And Doe Mayank

Regression and correlation

Correlation analysis:

To find how well (or badly) a line fits the observations

What is the strength of this relationship? r² (coefficient of determination) or adjusted r²

Is the relationship we have described statistically significant? Significance tests

Page 41: Applied Statistics And Doe Mayank

Regression and correlation

Correlation analysis:

ŷ = ax + b, where a is the slope and b is the intercept

ŷ = predicted value

yi = true value

ε = residual error = y − ŷ

The values of a and b are calculated so that the sum of squares (SS) of residuals, Σ(y − ŷ)², is a minimum

Page 42: Applied Statistics And Doe Mayank

Regression and correlation

Correlation analysis:

SSTotal = Σ(yi − ȳ)² (total), SSError = Σ(y − ŷ)² (error)

r² : Coefficient of determination

r² = 1 − SSError / SSTotal

Always between 0 and 1
Increases with the number of predictors

Adjusted r² = 1 − [SSError/(n − p − 1)] / [SSTotal/(n − 1)]

n = total number of observations
p = number of predictors

It can also be negative
A truer representation of relationship strength
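The two formulas can be written as small functions. This is a sketch only: the function names and the input numbers in the example are hypothetical, chosen to show how the adjustment penalises extra predictors.

```python
def r_squared(ss_error: float, ss_total: float) -> float:
    """Coefficient of determination: 1 - SSError / SSTotal."""
    return 1 - ss_error / ss_total

def adjusted_r_squared(ss_error: float, ss_total: float, n: int, p: int) -> float:
    """Adjusted r2 for n observations and p predictors."""
    return 1 - (ss_error / (n - p - 1)) / (ss_total / (n - 1))

# Hypothetical sums of squares, for illustration only
good_fit = adjusted_r_squared(ss_error=10, ss_total=100, n=12, p=1)
poor_fit = adjusted_r_squared(ss_error=95, ss_total=100, n=12, p=5)

print(f"r2 = {r_squared(10, 100):.2f}, adjusted r2 = {good_fit:.2f}")
print(f"r2 = {r_squared(95, 100):.2f}, adjusted r2 = {poor_fit:.2f}")
```

In the second case the model explains very little while using five predictors, so the adjusted value drops below zero even though r² itself stays positive.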

Page 43: Applied Statistics And Doe Mayank

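Regression and correlation

Correlation analysis: Statistical significance of relationship

ANOVA: F = MSbg / MSwg

Regression: F = MSModel / MSError

[Figure: between-groups and within-groups variability for Group 1 and Group 2, alongside model and error variability for a fitted line]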

Page 44: Applied Statistics And Doe Mayank

Design of experiment

Traditional method: One factor at a time (OFAT)

Statistical method: Multiple factors at a time (MFAT)

Page 45: Applied Statistics And Doe Mayank

Design of experiment

Page 46: Applied Statistics And Doe Mayank

Design of experiment

How to select a design?

Number of factors   Screening                                 Optimization                       Robustness
2-4                 Full or fractional factorial              Central composite or Box-Behnken   Taguchi
5 or more           Fractional factorial or Plackett-Burman   Screen first to reduce factors     Taguchi

Page 47: Applied Statistics And Doe Mayank

Design of experiment: terminology

Factors: independent variable/s
  Continuous: numeric, any value between lower and upper value, e.g. temperature, pH, concentration
  Categorical: numeric/non-numeric, only characters or levels, e.g. gender, operator, type, temperature

Levels: range of a factor/s: −1 (lower), 0 (middle), +1 (higher)

Dependent variable/s: response

Effects
  Main effect/s: effect/s due to individual factor/s
  Interaction effect/s: effect/s due to interaction of multiple factors

Confounding/Aliasing: when two or more effects cannot be distinguished
  e.g. a main effect is confounded with interaction effects; main effects and interaction effects are aliased

Page 48: Applied Statistics And Doe Mayank

Design of experiment

Resolution of a design (power of a design)

Resolution type   Order of interaction effects confounded with main effect   Experiment type
III               2 (e.g. A with AB or AC or BC etc.)                         Screening
IV                3 (e.g. A with ABC)                                         Optimization
V                 4 (e.g. A with ABCD)                                        Optimization

Higher-order interactions are less significant than lower-order interactions

Page 49: Applied Statistics And Doe Mayank

Design of experiment

Factorial design

Full factorial: L^f, where L = number of levels and f = number of factors

No. of Levels   No. of Factors   Design type   Number of experiments
2               2                2²            2x2 = 4
2               3                2³            2x2x2 = 8
3               2                3²            3x3 = 9
3               3                3³            3x3x3 = 27
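The run counts in the table can be generated by enumerating every combination of levels. A minimal sketch using `itertools.product`; the function name `full_factorial` and the integer level coding (0, 1, 2, ...) are illustrative choices, not from the deck.

```python
from itertools import product

def full_factorial(levels: int, factors: int) -> list[tuple[int, ...]]:
    """All level combinations for `factors` factors, each with `levels` levels."""
    return list(product(range(levels), repeat=factors))

for levels, factors in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    runs = full_factorial(levels, factors)
    print(f"{levels}^{factors}: {len(runs)} experiments")

# Each run is one experiment, e.g. the four runs of the 2^2 design:
print(full_factorial(2, 2))
```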

Page 50: Applied Statistics And Doe Mayank

Design of experiment

Factorial design

2² design: 4 experiments

[Figure: square of design points with factor axes a and b]

Page 51: Applied Statistics And Doe Mayank

Design of experiment

Factorial design

2³ design: 8 experiments

[Figure: cube of design points with factor axes a, b and c]

Page 52: Applied Statistics And Doe Mayank

Design of experiment

Factorial design

3² design: 9 experiments

[Figure: 3 x 3 grid of design points with factor axes a and b]

Page 53: Applied Statistics And Doe Mayank

Design of experiment

Factorial design

3³ design: 27 experiments

[Figure: 3 x 3 x 3 grid of design points; axes labelled b and c]

Page 54: Applied Statistics And Doe Mayank

Design of experiment

Fractional Factorial design

2³ design: 8 experiments

2³⁻¹ design: 4 experiments
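One common way to build the 4-run half fraction is to run a full 2² design in factors a and b and set the third factor from their product. The sketch below assumes that generator, c = a·b (defining relation I = abc), which the slide does not specify, and uses the −1/+1 level coding.

```python
from itertools import product

# Full 2^3 design: every combination of -1/+1 for factors a, b, c
full_design = list(product((-1, 1), repeat=3))

# Half fraction 2^(3-1): full 2^2 design in a and b, with c set by the
# assumed generator c = a * b
half_fraction = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

print(f"2^3: {len(full_design)} experiments")
print(f"2^(3-1): {len(half_fraction)} experiments")
for run in half_fraction:
    print(run)
```

With this choice each main effect is aliased with a two-factor interaction (for example c with ab), which is the confounding that makes the fraction a resolution III design.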

Page 55: Applied Statistics And Doe Mayank

Design of experiment

Response surface methodology

Page 56: Applied Statistics And Doe Mayank

Design of experiment

Geometry of some important response surface designs

Box-Behnken, e.g. 3 factors, 3 levels: 12 experiments

[Figure: cube with design points at the midpoints of its edges]

Page 57: Applied Statistics And Doe Mayank

Design of experiment

Geometry of some important response surface designs

Central composite design, e.g. 2 factors, 2 levels

[Figure: two sets of design points added together (+) to give (=) the central composite design]

Page 58: Applied Statistics And Doe Mayank

Taguchi design

Inner array: controllable variables during production (Signal), e.g. media, pH, feed rate

Outer array: uncontrollable variables during production (Noise), e.g. temp, DO

Design of experiment

Geometry of some important response surface designs