Numerical Methods for Civil Engineers, Lecture 7: Curve Fitting


- Linear Regression

- Polynomial Regression

- Multiple Linear Regression

Numerical Methods for Civil Engineers

Mongkol JIRAVACHARADET

SURANAREE UNIVERSITY OF TECHNOLOGY
INSTITUTE OF ENGINEERING, SCHOOL OF CIVIL ENGINEERING

Lecture 7: Curve Fitting

LINEAR REGRESSION

[Figure: scattered (x, y) data with several candidate lines for the curve fit]

y = α x + β

No exact solution exists, but many approximate solutions do.

[Figure: observations (xi, yi) scattered about the fitted line in the x-y plane]

Error Between Model and Observation

Model: y = α x + β

Error: ei = yi − (α xi + β) = yi − α xi − β

Criteria for a “Best” Fit

Find the best line: the one that minimizes the total error over all data points.

Least-Square Fit of a Straight Line

Minimize sum of the square of the errors

Sr = Σ ei² = Σ (yi − β − α xi)²   (sum over i = 1, …, n)

Differentiate with respect to each coefficient:

∂Sr/∂β = −2 Σ (yi − β − α xi)
∂Sr/∂α = −2 Σ [ (yi − β − α xi) xi ]

Setting derivatives = 0 :

0 = Σ yi − Σ β − α Σ xi
0 = Σ xi yi − β Σ xi − α Σ xi²

Since Σβ = nβ, the equations can be expressed as two linear equations in the unknowns (β, α):

n β + (Σ xi) α = Σ yi
(Σ xi) β + (Σ xi²) α = Σ xi yi

Solve equations simultaneously:

α = [ n Σ xi yi − (Σ xi)(Σ yi) ] / [ n Σ xi² − (Σ xi)² ]

β = ȳ − α x̄

where x̄ and ȳ are the means of x and y.

Define:

Sxy = Σ xi yi − (1/n)(Σ xi)(Σ yi)
Sxx = Σ xi² − (1/n)(Σ xi)²
Syy = Σ yi² − (1/n)(Σ yi)²

so that α = Sxy / Sxx.

The approximated y for any x is then ŷ = β + α x.

Example: Fit a straight line to x and y values

 xi    yi    xi²   xi·yi    yi²
  1    0.5     1     0.5    0.25
  2    2.5     4     5.0    6.25
  3    2.0     9     6.0    4.00
  4    4.0    16    16.0   16.00
  5    3.5    25    17.5   12.25
  6    6.0    36    36.0   36.00
  7    5.5    49    38.5   30.25
  Σ   28     24    140    119.5   105

n = 7,  x̄ = 28/7 = 4,  ȳ = 24/7 = 3.4286

α = (119.5 − (28)(24)/7) / (140 − 28²/7) = 23.5/28 = 0.8393
β = 3.4286 − 0.8393(4) = 0.0714

Least-squares fit:  y = 0.8393 x + 0.0714
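The computation above can be sketched in Python. This is a minimal illustration of the Sxy/Sxx formulas from the derivation; the function name linear_fit is my own, not from the lecture:

```python
def linear_fit(xs, ys):
    """Least-squares straight line y = alpha*x + beta via Sxy/Sxx."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    Sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n  # Σxy − (Σx)(Σy)/n
    Sxx = sum(x * x for x in xs) - sx * sx / n              # Σx² − (Σx)²/n
    alpha = Sxy / Sxx                # slope
    beta = sy / n - alpha * sx / n   # intercept: ȳ − α x̄
    return alpha, beta

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a, b = linear_fit(x, y)   # a ≈ 0.8393, b ≈ 0.0714
```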

How good is our fit?

Sum of the squares of the errors:

Sr = Σ ei² = Σ (yi − β − α xi)²

Standard error of the estimate:

sy/x = √( Sr / (n − 2) )

Sum of the squares around the mean:

St = Σ (yi − ȳ)²

Standard deviation:

sy = √( St / (n − 1) )

Note that St = Syy, and Sr can be computed directly as Sr = (Sxx Syy − Sxy²) / Sxx = Syy − Sxy²/Sxx.

Coefficient of determination:

r² = (St − Sr) / St

For a perfect fit, Sr = 0 and r = r² = 1.

[Figure: spread of the data about the mean (sy) versus spread about the regression line (sy/x); the linear regression has merit when sy > sy/x]

Equivalently,

r² = Sxy² / (Sxx Syy)

r² measures the fraction of the variation in y that is explained by the linear relationship with x.

Example: error analysis of the linear fit

 xi    yi    (yi − ȳ)²   (yi − β − α xi)²
  1    0.5    8.5765        0.1687
  2    2.5    0.8622        0.5626
  3    2.0    2.0408        0.3473
  4    4.0    0.3265        0.3265
  5    3.5    0.0051        0.5896
  6    6.0    6.6122        0.7972
  7    5.5    4.2908        0.1993
  Σ   28     24    St = 22.7143   Sr = 2.9911

With α = 0.8393, β = 0.0714, and ȳ = 3.4286:

sy = √(22.7143 / 6) = 1.946
sy/x = √(2.9911 / 5) = 0.773

Since sy/x < sy, the linear regression has merit.

r² = (22.7143 − 2.9911) / 22.7143 = 0.868,   r = 0.932

The linear model explains 86.8% of the original uncertainty.
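The error statistics can be checked the same way. A sketch using the rounded α and β from the fit (fit_stats is my own helper name):

```python
import math

def fit_stats(xs, ys, alpha, beta):
    """St, Sr, sy, sy/x and r^2 for a fitted line y = alpha*x + beta."""
    n = len(xs)
    ybar = sum(ys) / n
    St = sum((yv - ybar) ** 2 for yv in ys)  # spread about the mean
    Sr = sum((yv - beta - alpha * xv) ** 2
             for xv, yv in zip(xs, ys))      # spread about the line
    sy = math.sqrt(St / (n - 1))             # standard deviation
    syx = math.sqrt(Sr / (n - 2))            # standard error of the estimate
    r2 = (St - Sr) / St                      # coefficient of determination
    return St, Sr, sy, syx, r2

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
St, Sr, sy, syx, r2 = fit_stats(x, y, 0.8393, 0.0714)
```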

OR Example: error analysis of the linear fit using Sxx, Syy, and Sxy (with the sums Σxi = 28, Σyi = 24, Σxi² = 140, Σxi yi = 119.5, Σyi² = 105 from the earlier table):

Sxx = 140 − 28²/7 = 28
Syy = 105 − 24²/7 = 22.7
Sxy = 119.5 − (28)(24)/7 = 23.5

Sr = (Sxx Syy − Sxy²) / Sxx = (28 × 22.7 − 23.5²) / 28 = 2.977

sy = √(22.7 / 6) = 1.945
sy/x = √(2.977 / 5) = 0.772

Since sy/x < sy, the linear regression has merit.

r² = Sxy² / (Sxx Syy) = 23.5² / (28 × 22.7) = 0.869

The linear model explains 86.9% of the original uncertainty.
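The shortcut formulas can be checked in Python. Note that carrying full precision for Syy gives Sr = 2.9911, matching the first example; the 2.977 above comes from rounding Syy to 22.7:

```python
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
n = len(x)

Sxx = sum(v * v for v in x) - sum(x) ** 2 / n                  # 140 − 28²/7 = 28
Syy = sum(v * v for v in y) - sum(y) ** 2 / n                  # 105 − 24²/7 ≈ 22.7143
Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n   # 119.5 − 96 = 23.5

Sr = (Sxx * Syy - Sxy ** 2) / Sxx   # ≈ 2.9911 at full precision
r2 = Sxy ** 2 / (Sxx * Syy)         # ≈ 0.868
```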

Confidence Interval (CI)

A confidence interval is an interval in which a measurement or trial

falls corresponding to a given probability.

[Figure: regression line with a confidence band ŷi ± Δ about the predicted value at each xi]

For CI 95%, you can be 95% confident that the two curved

confidence bands enclose the true best-fit linear regression line,

leaving a 5% chance that the true line is outside those boundaries.

A 100(1 − α)% confidence interval for yi is given by

ŷi ± t(α/2) · sy/x · √( 1/n + (xi − x̄)² / Sxx )

Confidence interval 95% → α = 0.05

Example: estimate y at x = 3.4 using a 95% confidence interval (same data as the straight-line fit above):

ŷ = α x + β = 0.8393(3.4) + 0.0714 = 2.925

95% confidence → α = 0.05 → t(α/2) = t(0.025)(df = n − 2 = 5) = 2.571

Interval: 2.925 ± (2.571)(0.772) √( 1/7 + (3.4 − 4)²/28 ) = 2.925 ± 0.783
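A sketch of the interval computation in Python, plugging in the rounded values from the slides (sy/x = 0.772, t = 2.571):

```python
import math

alpha_c, beta_c = 0.8393, 0.0714   # fitted slope and intercept
n, xbar, Sxx = 7, 4.0, 28.0
t_crit = 2.571                     # t(0.025) with df = 5
syx = 0.772                        # standard error of the estimate

x0 = 3.4
yhat = alpha_c * x0 + beta_c       # predicted value at x0
delta = t_crit * syx * math.sqrt(1 / n + (x0 - xbar) ** 2 / Sxx)  # CI half-width
# 95% interval: yhat ± delta  →  2.925 ± 0.783
```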

T-Distribution

[Figure: t density with two-sided critical values t(0.025) (95%) and t(0.005) (99%)]

Probability density function of the t distribution:

f(x) = (1 + x²/ν)^(−(ν+1)/2) / ( √ν · B(0.5, 0.5ν) )

where B is the beta function and ν is a positive integer shape parameter (ν = df, the degrees of freedom).

The formula for the beta function is

B(α, β) = ∫₀¹ t^(α−1) (1 − t)^(β−1) dt

[Figure: t probability density function for four different values of the shape parameter ν]

In fact, the t distribution with ν equal to 1 is a Cauchy distribution. The t distribution approaches a normal distribution as ν becomes large; the approximation is quite good for values of ν > 30.
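The density formula above can be written directly with Python's standard library, using log-gamma for the beta function (B(a, b) = Γ(a)Γ(b)/Γ(a+b)) so large ν does not overflow. For ν = 1 it reproduces the Cauchy value 1/π at x = 0, and for large ν it approaches the normal value 1/√(2π) ≈ 0.3989:

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b) = Γ(a)Γ(b)/Γ(a+b), via log-gamma to avoid overflow."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def t_pdf(x, nu):
    """Probability density of the t distribution with nu degrees of freedom."""
    return (1 + x * x / nu) ** (-(nu + 1) / 2) / (math.sqrt(nu) * beta_fn(0.5, 0.5 * nu))

# nu = 1 is the Cauchy density: t_pdf(0, 1) = 1/pi
# large nu approaches the standard normal density at 0, 1/sqrt(2*pi)
```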

Critical Values of t

Confidence Interval:  80%    90%    95%    98%    99%    99.8%
df (one-tail α):      0.10   0.05   0.025  0.01   0.005  0.001

  1   3.078  6.314  12.706  31.821  63.657  318.313
  2   1.886  2.920   4.303   6.965   9.925   22.327
  3   1.638  2.353   3.182   4.541   5.841   10.215
  4   1.533  2.132   2.776   3.747   4.604    7.173
  5   1.476  2.015   2.571   3.365   4.032    5.893
  6   1.440  1.943   2.447   3.143   3.707    5.208
  7   1.415  1.895   2.365   2.998   3.499    4.782
  8   1.397  1.860   2.306   2.896   3.355    4.499
  9   1.383  1.833   2.262   2.821   3.250    4.296
 10   1.372  1.812   2.228   2.764   3.169    4.143
 11   1.363  1.796   2.201   2.718   3.106    4.024
 12   1.356  1.782   2.179   2.681   3.055    3.929
 13   1.350  1.771   2.160   2.650   3.012    3.852
 14   1.345  1.761   2.145   2.624   2.977    3.787
 15   1.341  1.753   2.131   2.602   2.947    3.733
 16   1.337  1.746   2.120   2.583   2.921    3.686
 17   1.333  1.740   2.110   2.567   2.898    3.646
 18   1.330  1.734   2.101   2.552   2.878    3.610
 19   1.328  1.729   2.093   2.539   2.861    3.579
 20   1.325  1.725   2.086   2.528   2.845    3.552
 21   1.323  1.721   2.080   2.518   2.831    3.527
 22   1.321  1.717   2.074   2.508   2.819    3.505
 23   1.319  1.714   2.069   2.500   2.807    3.485
 24   1.318  1.711   2.064   2.492   2.797    3.467
 25   1.316  1.708   2.060   2.485   2.787    3.450
 26   1.315  1.706   2.056   2.479   2.779    3.435
 27   1.314  1.703   2.052   2.473   2.771    3.421
 28   1.313  1.701   2.048   2.467   2.763    3.408
 29   1.311  1.699   2.045   2.462   2.756    3.396
 30   1.310  1.697   2.042   2.457   2.750    3.385
 31   1.309  1.696   2.040   2.453   2.744    3.375
 32   1.309  1.694   2.037   2.449   2.738    3.365
 33   1.308  1.692   2.035   2.445   2.733    3.356
 34   1.307  1.691   2.032   2.441   2.728    3.348
 35   1.306  1.690   2.030   2.438   2.724    3.340
 36   1.306  1.688   2.028   2.434   2.719    3.333
 37   1.305  1.687   2.026   2.431   2.715    3.326
 38   1.304  1.686   2.024   2.429   2.712    3.319
 39   1.304  1.685   2.023   2.426   2.708    3.313
 40   1.303  1.684   2.021   2.423   2.704    3.307
 41   1.303  1.683   2.020   2.421   2.701    3.301
 42   1.302  1.682   2.018   2.418   2.698    3.296
 43   1.302  1.681   2.017   2.416   2.695    3.291
 44   1.301  1.680   2.015   2.414   2.692    3.286
 45   1.301  1.679   2.014   2.412   2.690    3.281
 46   1.300  1.679   2.013   2.410   2.687    3.277
 47   1.300  1.678   2.012   2.408   2.685    3.273
 48   1.299  1.677   2.011   2.407   2.682    3.269
 49   1.299  1.677   2.010   2.405   2.680    3.265
 50   1.299  1.676   2.009   2.403   2.678    3.261
 51   1.298  1.675   2.008   2.402   2.676    3.258
 52   1.298  1.675   2.007   2.400   2.674    3.255
 53   1.298  1.674   2.006   2.399   2.672    3.251
 54   1.297  1.674   2.005   2.397   2.670    3.248
 55   1.297  1.673   2.004   2.396   2.668    3.245
 56   1.297  1.673   2.003   2.395   2.667    3.242
 57   1.297  1.672   2.002   2.394   2.665    3.239
 58   1.296  1.672   2.002   2.392   2.663    3.237
 59   1.296  1.671   2.001   2.391   2.662    3.234
 60   1.296  1.671   2.000   2.390   2.660    3.232
 61   1.296  1.670   2.000   2.389   2.659    3.229
 62   1.295  1.670   1.999   2.388   2.657    3.227
 63   1.295  1.669   1.998   2.387   2.656    3.225
 64   1.295  1.669   1.998   2.386   2.655    3.223
 65   1.295  1.669   1.997   2.385   2.654    3.220
 66   1.295  1.668   1.997   2.384   2.652    3.218
 67   1.294  1.668   1.996   2.383   2.651    3.216
 68   1.294  1.668   1.995   2.382   2.650    3.214
 69   1.294  1.667   1.995   2.382   2.649    3.213
 70   1.294  1.667   1.994   2.381   2.648    3.211
 71   1.294  1.667   1.994   2.380   2.647    3.209
 72   1.293  1.666   1.993   2.379   2.646    3.207
 73   1.293  1.666   1.993   2.379   2.645    3.206
 74   1.293  1.666   1.993   2.378   2.644    3.204
 75   1.293  1.665   1.992   2.377   2.643    3.202
 76   1.293  1.665   1.992   2.376   2.642    3.201
 77   1.293  1.665   1.991   2.376   2.641    3.199
 78   1.292  1.665   1.991   2.375   2.640    3.198
 79   1.292  1.664   1.990   2.374   2.640    3.197
 80   1.292  1.664   1.990   2.374   2.639    3.195
 81   1.292  1.664   1.990   2.373   2.638    3.194
 82   1.292  1.664   1.989   2.373   2.637    3.193
 83   1.292  1.663   1.989   2.372   2.636    3.191
 84   1.292  1.663   1.989   2.372   2.636    3.190
 85   1.292  1.663   1.988   2.371   2.635    3.189
 86   1.291  1.663   1.988   2.370   2.634    3.188
 87   1.291  1.663   1.988   2.370   2.634    3.187
 88   1.291  1.662   1.987   2.369   2.633    3.185
 89   1.291  1.662   1.987   2.369   2.632    3.184
 90   1.291  1.662   1.987   2.368   2.632    3.183
 91   1.291  1.662   1.986   2.368   2.631    3.182
 92   1.291  1.662   1.986   2.368   2.630    3.181
 93   1.291  1.661   1.986   2.367   2.630    3.180
 94   1.291  1.661   1.986   2.367   2.629    3.179
 95   1.291  1.661   1.985   2.366   2.629    3.178
 96   1.290  1.661   1.985   2.366   2.628    3.177
 97   1.290  1.661   1.985   2.365   2.627    3.176
 98   1.290  1.661   1.984   2.365   2.627    3.175
 99   1.290  1.660   1.984   2.365   2.626    3.175
100   1.290  1.660   1.984   2.364   2.626    3.174
  ∞   1.282  1.645   1.960   2.326   2.576    3.090

Polynomial Regression

Second-order polynomial:

y = a0 + a1 x + a2 x²

Sum of the squares of the residuals:

Sr = Σ (yi − a0 − a1 xi − a2 xi²)²

Take the derivative with respect to each coefficient:

∂Sr/∂a0 = −2 Σ (yi − a0 − a1 xi − a2 xi²)
∂Sr/∂a1 = −2 Σ xi (yi − a0 − a1 xi − a2 xi²)
∂Sr/∂a2 = −2 Σ xi² (yi − a0 − a1 xi − a2 xi²)

Normal equations:

n a0 + (Σ xi) a1 + (Σ xi²) a2 = Σ yi
(Σ xi) a0 + (Σ xi²) a1 + (Σ xi³) a2 = Σ xi yi
(Σ xi²) a0 + (Σ xi³) a1 + (Σ xi⁴) a2 = Σ xi² yi

For an mth-order polynomial: y = a0 + a1 x + a2 x² + … + am x^m

We have to solve m+1 simultaneous linear equations.

MATLAB polyfit Function

For a second-order polynomial, we can define

A = | x1²  x1  1 |        Y = | y1 |        C = | c1 |
    | x2²  x2  1 |            | y2 |            | c2 |
    |  ⋮   ⋮   ⋮ |            |  ⋮ |            | c3 |
    | xm²  xm  1 |            | ym |

and show that C = (A'A)⁻¹ (A'Y)  (fit by normal equations),  or C = A\Y  (fit by QR).

>> C = polyfit(x, y, n)

>> [C, S] = polyfit(x, y, n)

x = independent variable

y = dependent variable

n = degree of polynomial

C = coeff. of polynomial in

descending power

S = data structure for polyval

function


Example: Fit a second-order polynomial to the data.

 xi    yi     (yi − ȳ)²   (yi − a0 − a1 xi − a2 xi²)²
  0    2.1     544.44        0.14332
  1    7.7     314.47        1.00286
  2   13.6     140.03        1.08158
  3   27.2       3.12        0.80491
  4   40.9     239.22        0.61951
  5   61.1    1272.11        0.09439
  Σ   15    152.6    2513.39      3.74657

From the given data:

m = 2        Σ xi  = 15       Σ xi⁴    = 979
n = 6        Σ yi  = 152.6    Σ xi yi  = 585.6
x̄ = 2.5      Σ xi² = 55       Σ xi² yi = 2488.8
ȳ = 25.433   Σ xi³ = 225

Simultaneous linear equations

| 6    15   55  | | a0 |   |  152.6 |
| 15   55   225 | | a1 | = |  585.6 |
| 55   225  979 | | a2 |   | 2488.8 |

Solving these equations gives a0 = 2.47857, a1 = 2.35929, and a2 = 1.86071.

Least-squares quadratic equation:

y = 2.47857 + 2.35929 x + 1.86071 x²

Coefficient of determination:

r² = (2513.39 − 3.74657) / 2513.39 = 0.99851,   r = 0.99925

Solving by MATLAB polyfit Function

>> x = [0 1 2 3 4 5];

>> y = [2.1 7.7 13.6 27.2 40.9 61.1];

>> c = polyfit(x, y, 2)

>> [c, s] = polyfit(x, y, 2)

>> st = sum((y - mean(y)).^2)

>> sr = sum((y - polyval(c, x)).^2)

>> r = sqrt((st - sr) / st)

MATLAB polyval Function

Evaluate polynomial at the points defined by the input vector

Example: y = 1.86071 x² + 2.35929 x + 2.47857

Y = c(1)*x^n + c(2)*x^(n-1) + ... + c(n)*x + c(n+1)

>> y = polyval(c, x)

where x = Input vector

y = Value of polynomial evaluated at x

c = vector of coefficient in descending order

>> c = [1.86071 2.35929 2.47857]
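The descending-power evaluation that polyval performs can be sketched with Horner's rule (the helper name polyval here is my own Python stand-in for the MATLAB function):

```python
def polyval(c, x):
    """Evaluate a polynomial with coefficients c in descending powers at x."""
    result = 0.0
    for coeff in c:
        result = result * x + coeff   # Horner's rule
    return result

c = [1.86071, 2.35929, 2.47857]       # 1.86071*x^2 + 2.35929*x + 2.47857
# polyval(c, 0) returns the constant term, 2.47857
```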

[Figure: data points and the fitted second-order polynomial plotted over 0 ≤ x ≤ 5]

>> y2 = polyval(c,x)

>> plot(x, y, 'o', x, y2)

Error Bounds

By passing the optional second output from polyfit as an input to polyval, you also get an error estimate delta. An interval of ±2∆ corresponds to a 95% confidence interval.

>> [c,s] = polyfit(x,y,2)
>> [y2,delta] = polyval(c,x,s)
>> plot(x,y,'o',x,y2,'g-',x,y2+2*delta,'r:',x,y2-2*delta,'r:')

Linear Regression Example:

>> [c,s] = polyfit(x,y,1)
>> [y2,delta] = polyval(c,x,s)
>> plot(x,y,'o',x,y2,'g-',x,y2+2*delta,'r:',x,y2-2*delta,'r:')

(using the same x-y data as the straight-line fit example)

Multiple Linear Regression

y = c0 + c1x1+ c2x2+ . . .+ cpxp

Example case: two independent variables y = c0 + c1x1+ c2x2

Sum of the squares of the residuals:

Sr = Σ (yi − c0 − c1 x1i − c2 x2i)²

Differentiate with respect to the unknowns:

∂Sr/∂c0 = −2 Σ (yi − c0 − c1 x1i − c2 x2i)
∂Sr/∂c1 = −2 Σ x1i (yi − c0 − c1 x1i − c2 x2i)
∂Sr/∂c2 = −2 Σ x2i (yi − c0 − c1 x1i − c2 x2i)

Setting partial derivatives = 0 and expressing result in matrix form:

| n        Σ x1i       Σ x2i     | | c0 |   | Σ yi     |
| Σ x1i    Σ x1i²      Σ x1i x2i | | c1 | = | Σ x1i yi |
| Σ x2i    Σ x1i x2i   Σ x2i²    | | c2 |   | Σ x2i yi |

Example:

 x1    x2    y
 0     0     5
 2     1    10
 2.5   2     9
 1     3     0
 4     6     3
 7     2    27

| 6     16.5   14 | | c0 |   |  54   |
| 16.5  76.25  48 | | c1 | = | 243.5 |
| 14    48     54 | | c2 |   | 100   |

Solving gives c0 = 5, c1 = 4, c2 = −3, i.e.  y = 5 + 4 x1 − 3 x2
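The solution can be checked against the data directly; the fitted plane y = 5 + 4 x1 − 3 x2 reproduces every observation exactly, so Sr = 0:

```python
x1 = [0, 2, 2.5, 1, 4, 7]
x2 = [0, 1, 2, 3, 6, 2]
y  = [5, 10, 9, 0, 3, 27]

c0, c1, c2 = 5, 4, -3
# residual of each observation against the fitted plane
residuals = [yv - (c0 + c1 * a + c2 * b) for a, b, yv in zip(x1, x2, y)]
Sr = sum(e * e for e in residuals)   # sum of squared residuals
```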

Multivariate Fit in MATLAB

c0 + c1 x11 + c2 x12 + … + cp x1p = y1
c0 + c1 x21 + c2 x22 + … + cp x2p = y2
⋮
c0 + c1 xm1 + c2 xm2 + … + cp xmp = ym

In matrix form:

A = | 1  x11  x12  …  x1p |      c = | c0 |      y = | y1 |
    | 1  x21  x22  …  x2p |          | c1 |          | y2 |
    | ⋮   ⋮    ⋮       ⋮  |          |  ⋮ |          |  ⋮ |
    | 1  xm1  xm2  …  xmp |          | cp |          | ym |

Overdetermined system of equations: A c = y

Fit by normal equations:  >> c = (A'*A)\(A'*y)
Fit by QR:                >> c = A\y

(In the example below the ones column is placed last, A = [x1 x2 ones(...)], so the returned coefficients are ordered [c1; c2; c0].)

Example (same x1, x2, y data as above):

>> x1=[0 2 2.5 1 4 7]';

>> x2=[0 1 2 3 6 2]';

>> y=[5 10 9 0 3 27]';

>> A=[x1 x2 ones(size(x1))];

>> c=A\y
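A Python analogue of the MATLAB session, solving the normal equations (A'A)c = A'y for the same data. The small solver is my own sketch, not a library call; with the ones column last, the result is ordered [c1, c2, c0]:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [bv] for row, bv in zip(A, b)]          # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

x1 = [0, 2, 2.5, 1, 4, 7]
x2 = [0, 1, 2, 3, 6, 2]
y  = [5, 10, 9, 0, 3, 27]

# columns of A, in the same order as A = [x1 x2 ones(size(x1))] in the MATLAB code
cols = [x1, x2, [1.0] * len(y)]
AtA = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]  # A'A
Aty = [sum(u * yv for u, yv in zip(ci, y)) for ci in cols]                  # A'y
c = solve(AtA, Aty)   # ≈ [4.0, -3.0, 5.0], i.e. [c1, c2, c0]
```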