Applied Statistics And Doe Mayank

Applied Statistics and DOE

Mayank

Applied Statistics

Measures of central tendency (central position of data): mean, median, mode

Measures of dispersion (spread of data): variance, standard deviation, coefficient of variation

Notation:

Mean: population µ, sample x̄

Variance: population σ², sample s²

Standard deviation: population σ, sample s

Measures of central tendency

Data: 34, 43, 81, 106, 106 and 115

Mean: average, Σx/n = 80.83

Mode: highest frequency = 106

Median: middle score, (81+106)/2 = 93.5
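The three measures can be checked with a minimal sketch using Python's standard `statistics` module. The variable name `scores` is only illustrative; the numbers are the six data points listed above.

```python
from statistics import mean, median, mode

# Data from the slide
scores = [34, 43, 81, 106, 106, 115]

average = mean(scores)    # sum of values divided by n
middle = median(scores)   # mean of the two middle values for an even n
most_common = mode(scores)  # value with the highest frequency

print(round(average, 2), middle, most_common)
```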

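Measures of dispersion

Data: 44, 50, 38, 49, 42, 47, 40, 39, 46, 50 (mean x̄ = 44.5)

x           44     50     38     49     42     47     40     39     46     50
x - x̄      -0.5    5.5   -6.5    4.5   -2.5    2.5   -4.5   -5.5    1.5    5.5
(x - x̄)²   0.25  30.25  42.25  20.25   6.25   6.25  20.25  30.25   2.25  30.25

Sum of squares: SS = Σ(x - x̄)² = 188.5

Variance (mean square): MS = SS/(n-1) = 188.5/9 = 20.94

Standard deviation: sd = √MS = 4.58

Most of the data lies between 44.5 ± 4.58, i.e. about 40 to 49

Coefficient of variation: CV = s/x̄ × 100%

4.58/44.5 × 100% ≈ 10.3%

Standard deviation is about 10.3% of the mean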
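The SS, MS, sd and CV steps can be traced with a short sketch, assuming the ten values on the slide are a sample (so the divisor is n - 1). The variable names are illustrative only.

```python
from statistics import mean, variance

# Data from the slide
data = [44, 50, 38, 49, 42, 47, 40, 39, 46, 50]

x_bar = mean(data)                        # 44.5
ss = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations
ms = ss / (len(data) - 1)                 # sample variance (mean square)
sd = ms ** 0.5                            # standard deviation
cv = sd / x_bar * 100                     # coefficient of variation, in %

print(f"SS={ss}, MS={ms:.2f}, sd={sd:.2f}, CV={cv:.1f}%")
```

The hand-computed mean square agrees with `statistics.variance(data)`, which uses the same n - 1 divisor.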

Grade             Score
Genius            145 and above
Gifted            130-144
Above average     115-129
Higher average    100-114
Lower average     85-99
Below average     70-84
Borderline low    55-69
Low               <55

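Normal Distribution

Example: IQ Score

[Figure: normal distribution of IQ scores. IQ axis: <55, 55, 70, 85, 100, 115, 130, 145, >145, i.e. μ = 100 and 1σ = 15 points. Lower axis: sd from μ, -6σ to 6σ.]

Share of data in each band, on either side of μ:

μ to 1σ: 34.13%
1σ to 2σ: 13.59%
2σ to 3σ: 2.14%
3σ to 4σ: 0.13%
4σ to 5σ: 0.0031%
5σ to 6σ: 0.000028%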

Normal Distribution

Share of data within a given distance (sd from μ):

μ ± 1σ: 68.2689%
μ ± 2σ: 95.4499%
μ ± 3σ: 99.7300%
μ ± 4σ: 99.9936%
μ ± 5σ: 99.999942669%
μ ± 6σ: 99.999999802%
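These coverage figures follow from the standard normal distribution: the share within ±kσ is erf(k/√2). A minimal sketch using only `math.erf` from the standard library (the helper name `coverage` is an illustrative choice):

```python
import math

def coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in range(1, 7):
    print(f"within ±{k}σ: {coverage(k) * 100:.9f}%")
```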

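Normal Distribution and Six Sigma

[Figure: normal curve with the lower and upper specification limits (LSL, USL) at μ ± 6σ.]

Within LSL to USL: 99.999999802%

Outside the limits: 0.000000198 DPHO (defects per hundred opportunities) = 0.00198 DPMO (defects per million opportunities)

[Figure: the same curve between LSL and USL with the mean shifted by 1.5σ.]

With a 1.5σ shift of the mean: 3.4 DPMO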
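The 3.4 figure can be reproduced from the normal tail areas. The sketch below assumes specification limits at ±6σ from the target and a process mean that has drifted 1.5σ toward one limit; the function names `upper_tail` and `dpmo` are illustrative.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def dpmo(spec_sigma=6.0, shift_sigma=0.0):
    """Defects per million opportunities outside ±spec_sigma when the mean is shifted."""
    beyond_usl = upper_tail(spec_sigma - shift_sigma)  # nearer limit
    beyond_lsl = upper_tail(spec_sigma + shift_sigma)  # farther limit
    return (beyond_usl + beyond_lsl) * 1_000_000

print(dpmo())                 # centred process: roughly 0.002 DPMO
print(dpmo(shift_sigma=1.5))  # shifted process: roughly 3.4 DPMO
```

Almost all of the 3.4 DPMO comes from the tail beyond the nearer limit (4.5σ away); the far tail at 7.5σ is negligible.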

Statistical significance tests

Z-test

t-test

F-test

Z-test

Z-value: how many standard deviations away from the mean?

Population: z = (x - µ)/σ

Sample: z = (x - x̄)/s

+ve z: values are above the mean

-ve z: values are below the mean

Uses: 1 point compared to population; group compared to population

Z-test, sample:

Mean (x̄) = 26.20, standard deviation (s) = 6.57

A person with a BMI of 19.2 has a z score of:

z = (19.2 - 26.20)/6.57 = -1.07

So this person has a BMI 1.07 standard deviations below the mean

What is the probability of a person having a BMI this far below, or above, the mean?

[Figure: normal curve marked at -1σ and μ, with the standard deviation and z score axes.]

At -1σ (BMI 19.6): 16% of people have a BMI < 19.6 and 84% have a BMI > 19.6
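A small sketch of the z score and the matching tail probability, assuming BMI is normally distributed with the sample mean and standard deviation given above. The helper names `z_score` and `prob_below` are illustrative.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def prob_below(z):
    """P(Z < z) for a standard normal variable."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

z = z_score(19.2, 26.20, 6.57)
p_below = prob_below(z)           # share of people with a lower BMI
p_at_minus_1sd = prob_below(-1)   # the 16% / 84% split at -1σ

print(f"z = {z:.2f}, P(below) = {p_below:.3f}, P(Z < -1) = {p_at_minus_1sd:.3f}")
```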

Z-test, population:

Test group: employees having a two-wheeler

Test: commuting time from home to Biocon

Claim: average commuting time is less than 24 min

Samples: 30

18 16 23 19 25 48 13 17 20 23

16 21 18 16 29 15 8 19 20 7

15 16 24 15 6 11 14 23 18 12

At the 0.01 level of significance (α = 0.01): is there enough evidence to support the research claim?

Assumption: the population is normally distributed

Hypothesis testing

Comparison of means (µ = 24):

Null hypothesis, H0: x̄ ≥ µ. No difference (claim not true)

Alternate hypothesis, H1: x̄ < µ. It is different (claim is true)

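Test vs population

Level of significance: α = 0.01

Critical value: Zcritical = -2.33

x̄ = 18.2, s = 7.7, µ = 24, n = 30

Z = (x̄ - µ)/(s/√n) = (18.2 - 24)/(7.7/√30) = -4.13

[Figure: normal curve of the mean x̄ centred on 24 (Z = 0), with the rejection region to the left of Zcritical = -2.33 and the acceptance region to the right.]

Rejection region: Ztest < Zcritical

Acceptance region: Ztest > Zcritical

Ztest = -4.13 < Zcritical = -2.33, so Ztest lies in the rejection region

H0: x̄ ≥ 24 Rejected

So is the test value significantly different from (lower than) the mean?

Yes: there is significant evidence to reject the null hypothesis and therefore accept the claim

H1: x̄ < 24 Significantly supported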
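The commuting-time test can be rerun from the 30 raw samples. This is a sketch that uses the sample standard deviation in place of σ and takes the one-tailed critical value -2.33 for α = 0.01 from the slide. The variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Commuting times (min) for the 30 sampled employees
times = [18, 16, 23, 19, 25, 48, 13, 17, 20, 23,
         16, 21, 18, 16, 29, 15, 8, 19, 20, 7,
         15, 16, 24, 15, 6, 11, 14, 23, 18, 12]

mu0 = 24            # claimed upper bound on the mean commuting time
z_critical = -2.33  # one-tailed critical value at alpha = 0.01 (from the slide)

x_bar = mean(times)
s = stdev(times)    # sample standard deviation (n - 1 divisor)
z_test = (x_bar - mu0) / (s / math.sqrt(len(times)))
reject_h0 = z_test < z_critical

print(f"mean={x_bar:.1f}, s={s:.1f}, z={z_test:.2f}, reject H0: {reject_h0}")
```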

t-test: comparison of means between two groups

t = signal/noise = (difference between group means)/(variability of groups)

ttest > tcritical: null hypothesis will be rejected

ttest < tcritical: null hypothesis will not be rejected

t-test: Case 1

Effect of fertilizer on plant height

Fertilizer:      35, 40, 12, 15, 21, 14, 46, 10, 28, 48, 16, 30, 32, 48, 31, 22, 12, 39, 19, 25
w/o Fertilizer:  2, 27, 38, 31, 11, 19, 11, 34, 10, 11, 12, 15, 22, 11, 12, 12, 12, 29, 37, 2

               Fertilizer   w/o Fertilizer
Mean x̄        27.15        17.9
Variance s²    156.45       122.61

ttest = (27.15 - 17.9)/√(156.45/20 + 122.61/20) = 2.48

tcritical with 38 df (df = 2n-2) at 0.05 significance level = 2.03

ttest > tcritical

H0: x̄1 = x̄2 Rejected

So x̄1 is significantly different from x̄2

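t-test: Case 2

Fertilizer:      2, 27, 38, 31, 100, 115, 11, 34, 10, 11, 12, 15, 22, 11, 12, 12, 12, 29, 37, 2
w/o Fertilizer:  2, 27, 38, 31, 11, 19, 11, 34, 10, 11, 12, 15, 22, 11, 12, 12, 12, 29, 37, 2

               Fertilizer   w/o Fertilizer
Mean x̄        27.15        17.9
Variance s²    880.98       122.61

ttest = 1.3

tcritical = 2.03

ttest < tcritical

H0: x̄1 = x̄2 Not rejected

H1: x̄1 ≠ x̄2 Rejected

So x̄1 is not significantly different from x̄2: the group means are the same as in Case 1, but the larger variability in the fertilizer group hides the difference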
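Both fertilizer cases can be compared with one small function. This is a sketch that assumes the unpooled form t = (x̄1 - x̄2)/√(s1²/n1 + s2²/n2), which matches the t values on the slides. The Case 2 fertilizer column is read from the flattened table as the w/o fertilizer column with two values replaced by 100 and 115; that reading is an assumption, although it does reproduce the stated mean of 27.15.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Signal-to-noise ratio: difference in means over the combined standard error."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

without = [2, 27, 38, 31, 11, 19, 11, 34, 10, 11,
           12, 15, 22, 11, 12, 12, 12, 29, 37, 2]
case1 = [35, 40, 12, 15, 21, 14, 46, 10, 28, 48,
         16, 30, 32, 48, 31, 22, 12, 39, 19, 25]
# Assumed reading of the Case 2 column (see lead-in)
case2 = [2, 27, 38, 31, 100, 115, 11, 34, 10, 11,
         12, 15, 22, 11, 12, 12, 12, 29, 37, 2]

t_critical = 2.03  # 38 df at the 0.05 level, taken from the slide

t1 = t_statistic(case1, without)
t2 = t_statistic(case2, without)
print(f"Case 1: t = {t1:.2f}, reject H0: {t1 > t_critical}")
print(f"Case 2: t = {t2:.2f}, reject H0: {t2 > t_critical}")
```

The difference between the means is 9.25 in both cases; only the noise term changes, which is why the same signal is significant in Case 1 and not in Case 2.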

t - test Overview

F-test

Comparison of variances

F = s1²/s2², where s1² and s2² are the sample variances

The F hypothesis test is defined as:

H0: σ1² = σ2²

If Ftest > Fcritical (at the chosen significance level): H0 rejected

ANalysis Of VAriance (ANOVA)

One way:

• Effect of one factor (variable)

Two way:

• Effect of two factors (variables)

• Effect of interaction

Strategy: compare variability within groups (MSwg) to variability between groups (MSbg)

F = MSbg/MSwg

[Figure: between-groups and within-groups variability for Group 1 and Group 2.]

One way ANOVA

Is there any impact of exam room temperature on student performance?

Factor (independent variable): temperature (cold, optimum, hot)

Effect (dependent variable): score (marks obtained)

Null hypothesis (H0): no effect (µ1 = µ2 = µ3)

Alternate hypothesis (H1): there is an effect (at least one mean differs)

One way ANOVA

Scores by room temperature, n = 12 students per group:

                          Cold     Opt      Hot
Group mean x̄             63.75    71.75    61
Within group Σ(x - x̄)²   638      768      524
(x̄ - X̄)²                3.06     39.06    20.25

Grand mean: X̄ = Σx̄/3 = 65.5

SSbg = n × (3.06 + 39.06 + 20.25) = 748.5 (df = 3-1 = 2)

MSbg = SSbg/df = 374.25

SSwg = 638 + 768 + 524 = 1930 (df = (12x3)-3 = 33)

MSwg = SSwg/df = 58.5

One way ANOVA

F = MSbg/MSwg = 374.25/58.5 = 6.40

Fcritical for

Numerator degrees of freedom: 2
Denominator degrees of freedom: 33
At significance level (α): 0.05

≈ 3.28

Ftest > Fcritical

So there is enough evidence to reject the null hypothesis

H0: All means are the same (no effect of temperature) Rejected

At the 95% confidence level we can say that the variation between means is not just by chance

Examination room temperature matters significantly
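The F ratio for the exam-room example can be rebuilt from the summary figures on the slides (three group means, the within-group sums of squares 638, 768 and 524, and 12 students per group). This is a sketch from those summaries only, since the raw scores are not legible in the transcript; the dictionary names are illustrative.

```python
# Summary statistics read from the slide
group_means = {"cold": 63.75, "opt": 71.75, "hot": 61.0}
within_ss = {"cold": 638, "opt": 768, "hot": 524}
n = 12  # students per group

k = len(group_means)
grand_mean = sum(group_means.values()) / k

# Between-groups variability
ss_bg = n * sum((m - grand_mean) ** 2 for m in group_means.values())
ms_bg = ss_bg / (k - 1)

# Within-groups variability
ss_wg = sum(within_ss.values())
ms_wg = ss_wg / (n * k - k)

f_test = ms_bg / ms_wg
print(f"SSbg={ss_bg}, MSbg={ms_bg}, SSwg={ss_wg}, MSwg={ms_wg:.1f}, F={f_test:.2f}")
```

The resulting F is then compared with the tabulated critical value for 2 and 33 degrees of freedom.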

Two way ANOVA

Relative impact of gender or type of sport?

Any interaction between gender and type of sport?

Factors (independent variables): 1) Gender 2) Type of sport

Effect (dependent variable): number of participants

Null hypothesis (H0a): No effect of gender

Null hypothesis (H0b): No effect of type of sport

Null hypothesis (H0c): No interaction

Alternate hypothesis (H1): There is an effect

           Man              Woman
Indoor     30, 40, 50       60, 70, 80
Outdoor    140, 150, 160    5, 10, 15

Source    Df               SS         MS         F
Gender    g-1              SSG        MSG        MSG/MSwithin
Sports    s-1              SSS        MSS        MSS/MSwithin
G x S     (g-1)(s-1)       SSGxS      MSGxS      MSGxS/MSwithin
Within    (k-1) x g x s    SSwithin   MSwithin

(g = levels of gender, s = levels of sport, k = observations per cell)

Source    Df    SS       MS       F         Fcritical (α=0.01)
Gender    1     9075     9075     111.69    11.3
Sport     1     1875     1875     23.08     11.3
G x S     1     21675    21675    266.77    11.3
Within    8     650      81.25

Cell means (g→, s↓):

       Woman   Man
Ind    70      40
Otd    10      150

[Figure: interaction plot of mean number of participants for man and woman, indoor vs outdoor.]

Null hypothesis (H0a): No effect of gender. Rejected

Null hypothesis (H0b): No effect of type of sports. Rejected

Null hypothesis (H0c): No interaction. Rejected
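The sums of squares for the gender and sport example can be recomputed from the twelve raw observations. The sketch below assumes the cell layout indoor/man = 30, 40, 50; indoor/woman = 60, 70, 80; outdoor/man = 140, 150, 160; outdoor/woman = 5, 10, 15, with k = 3 observations per cell. For that layout the gender sum of squares works out to 9075. The helper `level_mean` is illustrative.

```python
from statistics import mean

# Assumed cell layout (see lead-in): (gender, sport) -> participants
cells = {
    ("man", "indoor"): [30, 40, 50],
    ("woman", "indoor"): [60, 70, 80],
    ("man", "outdoor"): [140, 150, 160],
    ("woman", "outdoor"): [5, 10, 15],
}
genders = ["man", "woman"]
sports = ["indoor", "outdoor"]
k = 3  # observations per cell

grand = mean(v for values in cells.values() for v in values)

def level_mean(index, level):
    """Mean of all observations at one level of a factor (index 0 = gender, 1 = sport)."""
    return mean(v for key, values in cells.items() if key[index] == level for v in values)

ss_gender = k * len(sports) * sum((level_mean(0, g) - grand) ** 2 for g in genders)
ss_sport = k * len(genders) * sum((level_mean(1, s) - grand) ** 2 for s in sports)
ss_within = sum((v - mean(values)) ** 2 for values in cells.values() for v in values)
ss_total = sum((v - grand) ** 2 for values in cells.values() for v in values)
ss_inter = ss_total - ss_gender - ss_sport - ss_within  # interaction by subtraction

ms_within = ss_within / ((k - 1) * len(genders) * len(sports))
for name, ss in [("Gender", ss_gender), ("Sport", ss_sport), ("G x S", ss_inter)]:
    print(f"{name}: SS={ss}, F={ss / ms_within:.2f}")  # each effect has 1 df, so MS = SS
```

All three F values are far above the critical value of 11.3, with the interaction term the largest of the three.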

Two way ANOVA

Factors (independent variables): 1) Temperature 2) pH

Effect (dependent variable): total product (g)

        30 °C   35 °C
pH 7    70      50
pH 5    10      150

[Figure: interaction plot of total product for pH 5 and pH 7 at the two temperatures.]

Regression and correlation

Regression analysis: investigation of relationship between variables

X     Y
2     48
19    30
34    17.5
40    11
8     41
12    42
20    35
20    31
37    18
19    35
30    16
46    8.3

[Figure: scatter plot of Y against X (X axis 0 to 50) with the fitted straight line.]

y = ax + b: y = -0.951x + 50.49, R² = 0.955

Linear

Simple linear regression: one independent variable

y = ax + b

Multiple linear regression

y = a1x1 + a2x2 + a3x3 + b

Non linear

y = a1x1 + a2x2 + a11x1² + a12x1x2 + b

To find how well (or badly) a line fits the observations

Correlation analysis:

What is the strength of this relationship? r² (coefficient of determination) or adjusted r²

Is the relationship we have described statistically significant? Significance tests

ŷ = ax + b

a = slope, b = intercept

ŷ = predicted value

yi = true value

ε = residual error = y - ŷ

The a and b values are calculated so as to minimize the sum of squares (SS) of the residuals: Σ(y - ŷ)² → minimum
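For a single predictor, the a and b that minimize Σ(y - ŷ)² have a closed form: a = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)² and b = ȳ - a·x̄. A minimal sketch applying it to the twelve X, Y pairs from the regression slide (the pairing is read from the flattened table, and the variable names are illustrative):

```python
from statistics import mean

# X, Y pairs from the regression slide
x = [2, 19, 34, 40, 8, 12, 20, 20, 37, 19, 30, 46]
y = [48, 30, 17.5, 11, 41, 42, 35, 31, 18, 35, 16, 8.3]

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

a = sxy / sxx          # slope
b = y_bar - a * x_bar  # intercept

print(f"y = {a:.3f}x + {b:.2f}")
```

The fitted slope and intercept agree with the line reported on the slide to within rounding.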

r²: coefficient of determination

SSTotal = Σ(yi - ȳ)² (total error)

SSError = Σ(y - ŷ)²

r² = 1 - SSError/SSTotal

Adjusted r² = 1 - [SSError/(n-p-1)]/[SSTotal/(n-1)]

n = total observations, p = number of predictors

r²: always between 0 and 1; increases with the number of predictors

Adjusted r²: can also be negative; a truer representative of relationship strength

Correlation analysis: statistical significance of relationship

ANOVA: F = MSbg/MSwg (between groups vs within groups, Group 1 and Group 2)

Regression: F = MSModel/MSError (model vs error)
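The strength and significance measures can be evaluated for the fitted line from the regression slide. This sketch takes the rounded coefficients a = -0.951 and b = 50.49 as given, uses p = 1 predictor, and approximates SSModel as SSTotal - SSError (exact only for the unrounded least-squares line). The variable names are illustrative.

```python
# X, Y pairs and the fitted line reported on the regression slide
x = [2, 19, 34, 40, 8, 12, 20, 20, 37, 19, 30, 46]
y = [48, 30, 17.5, 11, 41, 42, 35, 31, 18, 35, 16, 8.3]
a, b = -0.951, 50.49  # rounded slope and intercept from the slide

n, p = len(x), 1
y_bar = sum(y) / n

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_error = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))

r2 = 1 - ss_error / ss_total
adj_r2 = 1 - (ss_error / (n - p - 1)) / (ss_total / (n - 1))

# F = MSModel / MSError; SSModel approximated as SSTotal - SSError
ms_model = (ss_total - ss_error) / p
ms_error = ss_error / (n - p - 1)
f_value = ms_model / ms_error

print(f"r2={r2:.3f}, adjusted r2={adj_r2:.3f}, F={f_value:.0f}")
```

Adjusted r² comes out slightly below r² because it penalizes the degree of freedom used by the predictor.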

Design of experiment

Traditional method: one factor at a time (OFAT)

Statistical method: multiple factors at a time (MFAT)

How to select a design?

Number of factors   Screening                                  Optimization                        Robustness
2-4                 Full or fractional factorial               Central composite or Box-Behnken    Taguchi
5 or more           Fractional factorial or Plackett-Burman    Screen first to reduce factors      Taguchi

Design of experiment: terminology

Factors: independent variable/s

Continuous: numeric, any value between lower and upper value, eg. temperature, pH, concentration

Categorical: numeric/non-numeric, only characters or levels, eg. gender, operator, type, temperature

Levels: range of a factor/s, -1 (lower), 0 (middle), +1 (higher)

Response: dependent variable/s

Effects:

Main effect/s: effect/s due to individual factor/s

Interaction effect/s: effect/s due to interaction of multiple factors

Confounding/Aliasing: when two or more effects cannot be distinguished, eg. a main effect is confounded with interaction effects; main effects and interaction effects are aliased

Resolution of a design: power of a design

Resolution type   Order of interaction effects confounded with main effect   Experiment type
III               2 (eg. A with BC)                                          Screening
IV                3 (eg. A with BCD)                                         Optimization
V                 4 (eg. A with BCDE)                                        Optimization

Higher-order interactions are less significant than lower-order interactions

Full factorial: L^f (L = no. of levels, f = no. of factors)

No. of levels   No. of factors   Design type   Number of experiments
2               2                2²            2x2=4
2               3                2³            2x2x2=8
3               2                3²            3x3=9
3               3                3³            3x3x3=27
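The run counts in the table are simply every combination of levels across the factors. A minimal sketch using `itertools.product`, where the function name `full_factorial` and the 0-based level coding are illustrative choices:

```python
from itertools import product

def full_factorial(levels, factors):
    """All level combinations for `factors` factors with `levels` levels each."""
    return list(product(range(levels), repeat=factors))

for levels, factors in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    runs = full_factorial(levels, factors)
    print(f"{levels}^{factors}: {len(runs)} experiments")

# The four runs of the 2^2 design
print(full_factorial(2, 2))
```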

Factorial design

[Figures: factorial designs with 4 experiments (2²), 8 experiments (2³), 9 experiments (3²) and 27 experiments (3³).]

Fractional factorial design

[Figure: a factorial design with 8 experiments reduced to a fraction with 4 experiments.]
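One common way to cut the 8-run design to 4 runs is a half fraction in which the third factor is set by a generator. The sketch below assumes the generator C = A·B in -1/+1 coding. That choice is an illustration, not necessarily the fraction drawn on the slide.

```python
from itertools import product

# Full 2^3 design in coded units: 8 runs
full = list(product([-1, 1], repeat=3))

# Half fraction 2^(3-1): choose A and B freely, set C = A * B (assumed generator)
half = [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]

print(len(full), "runs in the full design")
print(len(half), "runs in the half fraction:")
for run in half:
    print(run)
```

Because the C column equals the A·B column in every run, the main effect of C cannot be separated from the A×B interaction, which is the confounding that a resolution III design accepts in exchange for fewer experiments.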

Response surface methodology

Geometry of some important response surface designs

Box-Behnken: eg. 3 factors, 3 levels; 12 experiments

Central composite design: eg. 2 factors, 2 levels

Taguchi design

Inner array: controllable variables during production (signal), eg. media, pH, feed rate

Outer array: uncontrollable variables during production, eg. temp, DO
