SYSTEMS Identification - UM

SYSTEMSSYSTEMSIdentificationIdentification

Ali KarimpourAssistant Professor

Ferdowsi University of Mashhad

Reference: “System Identification Theory For The User” Lennart Ljung

2

lecture 10

Ali Karimpour Nov 2009

Lecture 10

Computing the estimateComputing the estimate

Topics to be covered include:� Linear Regression and Least Squares.

� Numerical Solution by Iterative Search Method.

� Computing Gradients.

� Two-Stage and Multistage Method.

� Local Solutions and Initial Values.

� Subspace Methods for Estimating State Space Models.

3

lecture 10


IntroductionIn chapter 7 three basic parameter estimation method considered

1- The Prediction-Error Approach in which a certain function VN(θ,ZN) is minimized with respect to θ.

2- The Correlation Approach in which a certain equation fN(θ,ZN)=0 is solved for θ.

3- The Subspace Approach to estimating state space models.

In this chapter we shall discuss how these problems are best solved numerically.

4

lecture 10


Linear Regression and Least Squares.







5

lecture 10



θϕθ )()|(ˆ tty T=

For linear regression we have:

Least-squares criterion leads to

∑∑=

−

=⎥⎦

⎤⎢⎣

⎡==

N

t

N

t

TNLSN tyt

Ntt

NZV

1

1

1)()(1)()(1),(minargˆ ϕϕϕθθ

θ

)(NR )(Nf

)()(ˆ 1 NfNRLSN

−=θAn alternative form is:

)(ˆ)( NfNR LSN =θ Normal equations

Note that the basic equation for IV method is quite analogous so most of what is said in this section about LS method also applied to IVmethod.

1×p

1×d

pd ×

1×ddd ×

6

lecture 10



)(ˆ)( NfNR LSN =θ Normal equations

R(N) may be ill-conditioned specially when its dimension is high.The underlying idea in these methods is that the matrix R(N) should not be formed, instead a matrix R is constructed with the property

)(NRRRT =

This class of methods is commonly known as “square-root algorithm”

But the term “quadratic methods” is more appropriate.

How to derive R?• Householder• Gram-Schmidt procedure• Bjorck decomposition

• Cholesky decomposition• QR decomposition

⎥⎦

⎤⎢⎣

⎡= ∑

=

N

t

T ttN

NR1

)()(1)( ϕϕdd ×

dd ×

7

lecture 10



ngularupper tria,, RIQQQRA T ==

Solving for the LS estimates by QR factorization.The QR-factorization of an n d matrix A is defined as:×

Here Q is an unitary n n and R is n d.× ×

⎥⎦

⎤⎢⎣

⎡−⎥⎦

⎤⎢⎣

⎡−

−−=⎥

⎦

⎤⎢⎣

⎡=

0.6325-04.4272-162.3

3162.09487.09487.03162.0

4321

:1 Example A

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡ −−

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

−−−−

−=

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡=

008281.004374.79161.5

4082.03450.08452.08165.02760.05071.0

4082.08971.01690.0

654321

:2 Example A

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡−−

−−

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

−−−−−−−−−−

=

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

−

−

=

00028.10015.016.20

15.189.273.1

63.050.015.058.042.033.062.058.063.078.00021.017.077.058.0

021131

100101

:3 Example A

8

lecture 10




Solving for the LS estimates by QR factorization.

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡−−

−−

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

−−−−−−−−−−

=

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

−

−

=

00028.10015.016.20

15.189.273.1

63.050.015.058.042.033.062.058.063.078.00021.017.077.058.0

021131

100101

:3 Example A

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡−−−

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

−−−−−−−−−−

=

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

=

000016.2089.273.1

72.035.015.058.048.023.062.058.044.090.00024.012.077.058.0

21310001

:4 Example A

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡−

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

−−−−−

−−−

=

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

=

00073.1

79.021.0058.021.078.000.058.0

001057.058.0058.0

1101

:5 Example A

9

lecture 10




Solving for the LS estimates by QR factorization.

1is,

)(..

)1(

×

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

= NpY

Ny

y

Y

2θΦ−= Y

Let define

Let Q as an unitary matrix, then

( ) 22),( θθθ Φ−=Φ−= YQYZV TNN

∑=

−=N

t

TNN ttyZV

1

2)()(),( θϕθ

dNp

T

T

×Φ

⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢

⎣

⎡

=Φ is,

(1)..(1)

ϕ

ϕ

10

lecture 10



ngularupper tria,, RIQQQRA T ==Solving for the LS estimates by QR factorization.

( ) 22),( θθθ Φ−=Φ−= YQYZV TNN

Now, introduce QR-factorization

[ ]⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡==Φ

0.....,

0RRQRY

1+d

Np

NpNp× )1( +× dNp)1()1( +×+ dd ⎥

⎦

⎤⎢⎣

⎡=

3

210 0 R

RRR

dd × 1×d

scalar

This means that

( ) 2),( θθ Φ−= YQZV TN

N2

32

12 RRR +−= θ

which clearly is minimized for2

321 ),ˆ( giving,ˆ RZVRR NNNN == θθ

21

3

2

0 ⎥⎦

⎤⎢⎣

⎡−⎥

⎦

⎤⎢⎣

⎡=

θRRR

11

lecture 10


)1()2()1()( 21 −=−+−+ tbutyatyaty

Consider the simple model for system

Exercise: Suppose for t=1 to 11 the value of u and y are:[ ][ ]T

T

y

u

2656.45313.60625.3125.225.65.13441424

21521735232

−−−−=

−=

1) Derive from eq. (I) and find the condition number of R(N)Nθ

)()()(ˆ 1 INfNRN−=θ

)(ˆ2

11 IIRRN−=θ

2) Derive from eq. (II) and find the condition number of R1Nθ

[ ]105.0ˆ :Answer N =θ


12

lecture 10



Solving for the LS estimates by QR factorization.( ) 22),( θθθ Φ−=Φ−= YQYZV N

N

There are three important advantages with this way of solving the LS estimate:

2321 ),ˆ( giving,ˆ RZVRR N

NNN == θθ

2- R1 is a triangular matrix, so the equation is easy to solve.

3- If the QR-factorization is performed for a regressor size d*, then the solutionsfor all models with fewer parameter are easily obtained from R0.

Note that the big matrix Q is never required to find. All the information are contained in the “small” matrix R0

11)( RRNR TT =ΦΦ=Therefore R1 is much better conditioned than R(N).

1- The condition number of R1 is the square root of R(N).

13

lecture 10



Initial condition: “Windowed” Data

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

−

−

=

)(..

)1(

)(

ntz

tz

tϕ

The regression vector φ(t) is:

Here z(t-1) is an r-dimensional vector. For example, the for ARX model

⎥⎦

⎤⎢⎣

⎡−=

)()(

)(tuty

tz

For example, the for AR model)()( tytz −=

R(N) will be:

∑=

−−=N

t

Tij jtzitz

NNR

1

)()(1)(

14

lecture 10



Initial condition: “Windowed” Data

R(N) will be:

∑=

−−=N

t

Tij jtzitz

NNR

1

)()(1)(

If we have knowledge only of z(t) for 1 ≤ t ≤ N the question arises of how to deal with the unknown initial condition

1 - Start the summation at t=n+1 rather than t=1.

2 - Replace the unknown initial condition by zeros.

15

lecture 10


Numerical Solution by Iterative Search Method







16

lecture 10



Numerical minimizationMethods for numerical minimization of a function V(θ) update the minimizing point iteratively by:

In general neither the function

nor

cannot be minimized or solved by analytical methods.

∑=

=N

t

NN tl

NZV

1)),,((1),( θθεθ ∑

=

==N

t

NN tt

NZf

1)),((),(1),(0 θεαθξθ

)()()1( ˆˆ iii fαθθ +=+

f (i) is a search direction based on information about V(θ)α is a positive constant

Depending on the information to determine f (i) there is 3 groups

1- Methods using function values only.2- Methods using values of the function as well as of its gradient.3- Methods using values of the function, its gradient and of its Hessian..

17

lecture 10



Depending on the information to determine f (i) there is 3 groups

• Methods using function values only.

• Methods using values of the function V as well as of its gradient.

• Methods using values of the function, its gradient and of its Hessian..

Newton algorithms

Quasi Newton algorithms

An estimate of Hessian is find and then:

An estimate of gradient is used then Quasi Newton algorithm applied.

[ ] )ˆ()ˆ( )(1)()( iii VVf θθ ′′′−=−

[ ] )ˆ()ˆ( )(1)()( iii VVf θθ ′′′−=−

18

lecture 10



In general consider the function

∑=

=N

t

NN tl

NZV

1)),,((1),( θθεθ

The gradient is:

∑=

⎟⎠⎞

⎜⎝⎛ ′−′

∂∂

−−=′N

t

NN tltlt

NZV

1)),,(()),,((),(1),( θθεθθε

θθεθ θε

Here, Ψ(t,θ) is:

)|(ˆ),(),( θθ

θεθ

θψ tytt∂∂

=∂∂

−=

qd ×

1×d

1×q

( )∑=

′−′−=′N

t

NN tltlt

NZV

1)),,(()),,((),(1),( θθεθθεθψθ θε

19

lecture 10



Some explicit search schemes

∑=

=N

t

NN t

NZV

1

2),(1),( θεθConsider the special case

∑=

−=′N

t

NN tt

NZV

1),(),(1),( θεθψθ

The gradient is:

A general family of search routines is given by

[ ] ),ˆ(ˆˆ )(1)()()()1( NiN

iN

iN

iN

iN ZVR θµθθ ′−=

−+

iterate th denoted ˆ )( iiNθ

directionsearch themodifiesat matrix th a is )( ddR iN ×

),ˆ(),ˆ( )()1( NiN

NiN ZVZV θθ ≤+

atchosen th is and sizee step thedenotes )(iNµ

20

lecture 10




∑=

=N

t

NN t

NZV

1

2),(1),( θεθ

Consider the special case

[ ] ),ˆ(ˆˆ )(1)()()()1( NiN

iN

iN

iN


−+

atchosen th is and sizee step thedenotes )(iNµ

),ˆ(),ˆ( )()1( NiN

NiN ZVZV θθ ≤+

21

lecture 10




∑=

=N

t

NN t

NZV

1

2),(1),( θεθ


[ ] ),ˆ(ˆˆ )(1)()()()1( NiN

iN

iN

iN


−+

IR iN =)(Let then we have

),ˆ(ˆˆ )()()()1( NiN

iN

iN

iN ZV θµθθ ′−=+

This is the gradient or steepest-descent method.

This method is fairly inefficient close to the minimum.

22

lecture 10



Gradient or steepest-descent method for solving f(x)=0.

This method is fairly inefficient close to the minimum..

• Make an initial guess: x0.

x0

• Draw the tangent line.

Its equation is:))(()( 000 xxxfxfy −′+=

• Let x1 be x-intercept of the tangent line.

x1

• This intercept is given by the formula:)()(

0

001 xf

xfxx′

−=

• Now repeat x1 as the initial guess.

x2

23

lecture 10



Gradient or steepest-descent method for solving f(x)=0.

)()(

0

001 xf

xfxx′

−=Some difficulties of steepest-descent method.

• Zero derivatives. • Diverging.

x0

x1x2x2

24

lecture 10



Gradient or steepest-descent method for finding minimum of f(x)

25

lecture 10



Gradient or steepest-descent method for finding minimum of f(x)

),ˆ(ˆˆ )()()()1( NiN

iN

iN

iN ZV θµθθ ′−=+

26

lecture 10




∑=

=N

t

NN t

NZV

1

2),(1),( θεθ


[ ] ),ˆ(ˆˆ )(1)()()()1( NiN

iN

iN

iN


−+

The gradient or steepest-descent method is fairly inefficient close to the minimum.

∑=

−=′N

t

NN tt

NZV

1

),(),(1),( θεθψθ

The gradient and the Hessian of V is:

∑∑==

′−=′′N

t

N

t

TNN tt

Ntt

NZV

11

),(),(1),(),(1),( θεθψθψθψθ

),( )()( NiNN

iN ZVR θ″=Let then we have

[ ] ),ˆ(),ˆ(ˆˆ )(1)()()()1( NiN

NiN

iN

iN

iN ZVZV θθµθθ ′′′−=

−+

This is the Newton method.

But it is not an easy task to compute Hessian since of .),( θψ t′

27

lecture 10




∑=

=N

t

NN t

NZV

1

2),(1),( θεθ


[ ] ),ˆ(),ˆ(ˆˆ )(1)()()()1( NiN

NiN

iN

iN


−+

This is the Newton method.

But it is not an easy task to compute Hessian since of .),( θψ t′

∑∑==

′−=′′N

t

N

t

TNN tt

Ntt

NZV

11

),(),(1),(),(1),( θεθψθψθψθ

Suppose that there is a value θ0 s.t. ε(t, θ0) = e0(t) 0),( =′⇒ θψ t

∑=

≈N

t

T ttN 1

),(),(1 θψθψ )(θNH∆

=

28

lecture 10



Newton method

∑∑==

′−=′′N

t

N

t

TNN tt

Ntt

NZV

11

),(),(1),(),(1),( θεθψθψθψθ ∑=

≈N

t

T ttN 1

),(),(1 θψθψ )(θNH∆

=

∑=

=N

t

NN t

NZV

1

2),(1),( θεθ [ ] ),ˆ(ˆˆ )(1)()()()1( NiN

iN

iN

iN


−+

)( )()( iNN

iN HR θ=

So choose of

in the vicinity of minimum is a good estimate of Hessian.

This is known as the Gauss-Newton Method.

In the statistical literature it is called the “Method of scoring”.

In the control literature the terms “modified Newton-Raphson” and “quasi linearization” have also been used.

29

lecture 10



Newton method

∑∑==

′−=′′N

t

N

t

TNN tt

Ntt

NZV

11

),(),(1),(),(1),( θεθψθψθψθ ∑=

≈N

t

T ttN 1

),(),(1 θψθψ )(θNH∆

=

∑=

=N

t

NN t

NZV

1

2),(1),( θεθ [ ] ),ˆ(ˆˆ )(1)()()()1( NiN

iN

iN

iN


−+

1)( )()()( == iN

iNN

iN HR µθ

and for

the term “damped Guess-Newton” has been used.

Dennis and Schnabel reserve the term “Guess-Newton” for

)()()( adjustable)( iN

iNN

iN HR µθ=

30

lecture 10



Newton method

∑=

==N

t

TiNN

iN tt

NHR

1

)()( ),(),(1)( θψθψθ[ ] ),ˆ(ˆˆ )(1)()()()1( NiN

iN

iN

iN


−+

Even though RN is assured to be positive semi definite, it may be singular or close to singular. (for example, if the model is over-parameterized or the data are not informative enough) Various ways to overcome this problem exist and are known as “regularization techniques”

Goldfeld, Quandt and Trotter suggest

scalar positive a is where),(),(11

1)(1

)()()( λλθψθψλ

λ ⎥⎦

⎤⎢⎣

⎡+

+= ∑

=N

N

t

iN

TiN

iN Itt

NR

Levenberg and Marquardt suggest

scalar positive a is where),(),(1)(1

)()()( λλθψθψλ N

N

t

iN

TiN

iN Itt

NR += ∑

=

With λ = 0 we have the Guess-Newton case, increasing λ means that the step size is decreased and the search direction is turned towards the gradient.

31

lecture 10



Remember that we want to

or(I) ),(minimize NN ZV θ (II) )),((),(1),(0

1∑=

==N

t

NN tt

NZf θεαθξθ

Newton method to solve (I)

[ ] ),ˆ(),ˆ(ˆˆ )(1)()1()()1( NiN

NiN

iN

iN


−++

This leads to 0),ˆ( )( =′ Ni

N ZV θ

Newton-Raphson method to solve (II)

[ ] ),ˆ(),ˆ(ˆˆ )(1)()1()()1( NiNN

NiNN

iN

iN

iN ZfZf θθµθθ

−++ ′−=

Correlation EquationSolving equation (II) is quite analogous to the minimization of (I)

Substitution method to solve (II)

),ˆ(ˆˆ )()1()()1( NiNN

iN

iN

iN Zf θµθθ ++ −=

32

lecture 10


Computing Gradients







33

lecture 10


Computing Gradients

The amount of work required to compute ψ(t,θ) highly dependent on model structure, and sometimes one may have to resort to numerical differentiation.

Example 10.1 Consider the ARMAX model)()()()()()( teqCtuqBtyqA +=

the predictor is: [ ] )()()()()()|(ˆ)( tyqAqCtuqBtyqC −+=θ

Differentiation with respect to ak is:

)()|(ˆ)( tyqtya

qC k

k

−−=∂∂ θ

similarly

)()|(ˆ)( tuqtyb

qC k

k

−=∂∂ θ )()|(ˆ)()|(ˆ tyqty

cqCtyq k

k

k −− =∂∂

+ θθ

now( ))|(ˆ)()|(ˆ)( θθ tytyqty

cqC k

k

−=∂∂ −

34

lecture 10


Computing Gradients

)()|(ˆ)( tuqtyb

qC k

k

−=∂∂ θ

)|(ˆ),(),( θθ

θεθ

θψ tytt∂∂

=∂∂

−=now

),(),()( θϕθψ ttqC =

)()|(ˆ)( tyqtya

qC k

k

−−=∂∂ θ ( ))|(ˆ)()|(ˆ)( θθ tytyqty

cqC k

k

−=∂∂ −

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎣

⎡

∂∂

∂∂

∂∂

∂∂

∂∂

∂∂

=

)|(ˆ...

)|(ˆ

)|(ˆ...

)|(ˆ

)|(ˆ...

)|(ˆ

)(),()(

1

1

1

θ

θ

θ

θ

θ

θ

θψ

tyc

tyc

tyb

tyb

tya

tya

qCtqC

nc

nb

na

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎣

⎡

−

−−

−−−

−−

=

)(...

)1()(

...)1(

)(...

)1(

c

b

a

nt

tntu

tunty

ty

ε

ε

),( θϕ t=

35

lecture 10


Computing Gradients

SISO black box model

)()()()(

)()()()( te

qDqCtu

qFqBtyqA +=

General model structure and its predictor is:

)()(

)()(1)()()()()()|(ˆ ty

qCqAqDtu

qFqCqBqDty ⎥

⎦

⎤⎢⎣

⎡−+=θ

so we have

)()()()|(ˆ kty

qCqDty

ak

−−=∂∂ θ )(

)()()()|(ˆ ktuqFqC

qDtybk

−=∂∂ θ

)()|(ˆ)()|(ˆ tyqtyc

qCtyq k

k

k −− =∂∂

+ θθ )()()()()()(

)()()()()()|(ˆ kty

qCqCqAqDktu

qFqCqCqBqDty

ck

−+−−=∂∂

⇒ θ

)()()()(

)()()()|(ˆ kty

qCqAktu

qFqCqBty

dk

−−−=∂∂ θ ),(

)(1)|(ˆ θθ ktvqC

tydk

−−=∂∂

⇒

)()(

)()(1)|(ˆ)()|(ˆ tyqqC

qAqDtyf

qFtyq k

k

k −−⎥⎦

⎤⎢⎣

⎡−=

∂∂

+ θθ )()()()(

)()()|(ˆ ktuqFqFqC

qBqDtyfk

−−=∂∂

⇒ θ

),()()(

)()|(ˆ θθ ktwqFqC

qDtyfk

−=∂∂

⇒

36

lecture 10


Computing Gradients


)()()()(

)()()()( te

qDqCtu

qFqBtyqA +=

General model structure and its predictor is:

)()(

)()(1)()()()()()|(ˆ ty

qCqAqDtu

qFqCqBqDty ⎥

⎦

⎤⎢⎣

⎡−+=θ

)()()()|(ˆ kty

qCqDty

ak

−−=∂∂ θ )(

)()()()|(ˆ ktuqFqC

qDtybk

−=∂∂ θ

)()()()()()(

)()()()()()|(ˆ kty

qCqCqAqDktu

qFqCqCqBqDty

ck

−+−−=∂∂ θ

),()(

1)|(ˆ θθ ktvqC

tydk

−−=∂∂

),()()(

)()|(ˆ θθ ktwqFqC

qDtyfk

−=∂∂

1)()()( === qDqCqA

As an special case consider OE model

)|(ˆ),(),( θθ

θεθ

θψ tytt∂∂

=∂∂

−=now

37

lecture 10


Computing Gradients


)()(

1)|(ˆ ktuqF

tybk

−=∂∂ θ ),(

)(1)|(ˆ θθ ktwqF

tyfk

−=∂∂

1)()()( === qDqCqAAs an special case consider OE model

)|(ˆ),(),( θθ

θεθ

θψ tytt∂∂

=∂∂

−=now

),(),()( θϕθψ ttqF =

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎣

⎡

∂∂

∂∂

∂∂

∂∂

=

)|(ˆ...

)|(ˆ

)|(ˆ...

)|(ˆ

)(),()(

1

1

θ

θ

θ

θ

θψ

tyf

tyf

tyb

tyb

qFtqF

nf

nb

⎥⎥⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢⎢⎢

⎣

⎡

−

−−

−

=

),(...

),1()(

...)1(

θ

θ

f

b

ntw

twntu

tu

),( θϕ t=

38

lecture 10


Two-Stage and Multistage Method







39

lecture 10




Linear Regression and Least Squares

• Efficient methods with analytic solution.

• Guaranteed convergence to a local minimum.

• Efficiently.

• Applicability to general model structure.Com

bine

d

40

lecture 10



Why we interest in this topic:

• It helps to understand the identification literature.

• It is useful to providing initial estimates to use in iterative methods .

Som

e im

port

ant T

wo-

Stag

e or

M

ultis

tage

Met

hod

1- Bootstrap Methods.

2- Bilinear Parameterization.

3- Separate Least Squares.

4- High Order AR(X) Models.

5- Separating Dynamics And Noise Models.

6- Determining ARMA Models.

7- Subspace Methods For Estimating State Space Models.

41

lecture 10



Bootstrap MethodsConsider the correlation formulation

This formulation contains a number of common situation

[ ]Tba ntutuntxtxqKt )(...)1(),(...),1()(),( −−−−−−= θθθξ

)(),( tt ϕθϕ =

• IV methods with:

• PLR methods:

),(),( θϕθξ tt =

• Minimizing the quadratic criterion:),(),( θψθξ tt =

[ ]⎭⎬⎫

⎩⎨⎧

=−= ∑=

N

t

TPLRN ttyt

Nsol

1

0),()(),(1ˆ θθϕθϕθ

∑=

=N

t

NN t

NZV

1

2),(1),( θεθ ∑=

−=′N

t

NN tt

NZV

1),(),(1),( θεθψθ

( )∑=

−==N

t

TNN ttyt

NZf

1),()(),(1),(0 θθϕθξθ

IV: instrument variable

PLR: Pseudo linear regression

42

lecture 10



Bootstrap Methods( )∑

=

−==N

t

TNN ttyt

NZf

1),()(),(1),(0 θθϕθξθConsider the correlation formulation

( ) 0)ˆ,()()ˆ,(11

)1()1( =−∑=

−−N

t

iN

TiN ttyt

Nθθϕθξ

⎥⎦

⎤⎢⎣

⎡⎥⎦

⎤⎢⎣

⎡= ∑∑

=

−−

=

−−N

t

iN

N

t

iN

TiN

iN tyt

Ntt

N 1

)1(1

1

)1()1()( )()ˆ,(1)ˆ,()ˆ,(1ˆ θξθϕθξθ

),(,),( θϕθξθ tt T↔It is called Bootstrap Method since it alternate between:

It does not necessarily converge to a solution. A convergence analysis is given by:

43

lecture 10



Bilinear Parameterization.For some models, the predictor is bilinear in the parameters, for example consider ARARX model

)()(

1)()()()( teqD

tuqBtyqA +=

Now the estimator is

( ) )()()(1)()()()|(ˆ tyqDqAtuqDqBty −+=θ

[ ]Tnnn dbadddbbbaaa ......... 212121=θ

Let

[ ]Tηρθ =

Bilinear means that is linear in ρ for fixed η and linear in η for fixed ρ.)|(ˆ θty

44

lecture 10



Bilinear Parameterization.In ARARX model

With this situation, a natural way of minimizing would be to treat it as a sequence of LS problems. Let

),|(ˆ)|(ˆ ηρθ tyty =[ ]Tηρθ =

Exercise 10T.3 Show that this minimization problem is an special case of 10.40.

According to exercise 10T.3 Bilinear parameterization is thus

indeed a descent method. It converges to a local minimum.

45

lecture 10



Separate Least Squares.

),()|(ˆ θϕθθ tty T= ),(),|(ˆ ηϕθηθ tty T=

The identification criterion then becomes

For given η this criterion is an LS criterion and minimized w.r.t. θ by

We can thus insert it to VN and define the problem as

A more general situation than the bilinear case is when one set of parameters enter linearly and another set nonlinearly in the predictor:

46

lecture 10



Separate Least Squares.The identification criterion then becomes

2-

3-

1-

The method is called separate least squares since the LS-part has been separated out, and the problem reduced to a minimization problem of lower dimensions.

47

lecture 10



High Order AR(X) Models.

)()()()()( 000 teqHtuqGty +=

Suppose the true system is:

An order M, ARX structure is used

)()()()()( tetuqBtyqA MM +=

Hannan and Kavalieris and Ljung and Wahlberg show that

So high-order ARX model is capable of approximating any linear system arbitrary well.

48

lecture 10



High Order AR(X) Models.So high-order ARX model is capable of approximating any linear system arbitrary well.It is of course desirable to reduce this high-order to more tractable versions:

49

lecture 10



Separating Dynamics And Noise Models.

)()()()(

)()()()( te

qDqCtu

qFqBtyqA +=

General model structure is:

)(ˆ)(ˆ)(ˆ qFqBqA

)()(ˆ)(ˆ

)()(ˆ)(ˆ tuqFqBtyqAtv −=

)()()()(ˆ te

qDqCtv =

50

lecture 10



Determining ARMA Models.

)()()()(ˆ te

qDqCtv =

51

lecture 10



Subspace Methods For Estimating State Space Models.

The Subspace methods can also be regarded as a two-stage method, being built up from two LS-steps.

52

lecture 10


Local Solutions and Initial Values







53

lecture 10



Local MinimaThe general numerical schemes in Section 10.2 typically have the property that, with suitably chosen step length µ, they will converge to a solution i.e.

[ ] ),ˆ(),ˆ(ˆˆ )(1)()1()()1( NiNN

NiNN

iN

iN

iN ZfZf θθµθθ

−++ ′−=

),ˆ(ˆˆ )()1()()1( NiNN

iN

iN

iN Zf θµθθ ++ −=

0),( s.t. )( =→ ∗∗ NNNN

iN Zf θθθ

While for positive definite R, we have local minimum of VN(θ,Z)

[ ] ),ˆ(ˆˆ )(1)()()()1( NiN

iN

iN

iN


−+ minimum local a is ),( s.t. )( NNNN

iN ZV ∗∗→ θθθ

The global minimum interests us.

May have several solutions.

To find the global solution one must start at different feasible initial values.

An important possibility is to use some preliminary estimation procedure to produce a good initial value.

Local minima do not necessary create problem in practice, if a model passes the validation tests. (Sec 16.5 and 16.6)

54

lecture 10



Remember fromchapter 8

),( NNN ZV ∗θ

55

lecture 10



Results from SISO Black-box Models

)()()()(

)()()()( te

qDqCtu

qFqBtyqA +=

General model structure is:

Consider the assumption that the system can be described within the model set: SεM

The results are listed below for the general SISO model set and refer to

),(21)( 2 θεθ tEV =

56

lecture 10



Results from SISO Black-box Models

57

lecture 10



Initial parameter valuesDuo to the possible occurrence of undesired local minima in the criterion function, it is worthwhile to put some effort on producing good initial values.Also Newton-type method has good local convergence rate, it is again worthwhile to put some effort on producing good initial values.

1- For a physical parameterized model structure:• Use your physical insight.

2- For a linear black-box model structure: )()()()(

)()()()( te

qDqCtu

qFqBtyqA +=

58

lecture 10



Initial filter condition

In some configuration we need initial values φ(0,θ).

1 - Start the summation at t=n+1 rather than t=1.

2 - Consider initial condition by:

59

lecture 10


Subspace Methods for Estimating State Space Models







60

lecture 10



Let us now consider how to estimate the system matrices A, B, C and D in the ss model

)()()()()()()()1(

tvtDutCxtytwtButAxtx

++=++=+

Let the output y(t) is a p-dimensional column vector, the input u(t) is a m-dimensional column vector. Also the order of system is n.

We also assume that this ss representation is a minimal realization.

We know that many different representation can also described the system. They are:

)(~)()(~)()(~)()(~)1(~ 11

tvtDutxCTtytwtBuTtxATTtx

++=++=+ −−

Where T is any invertible matrix. We also have

)()(~ 1 txTtx −=

61

lecture 10



Let the ss as:)()()()(

)()()()1(tvtDutCxty

twtButAxtx++=++=+

known. are ˆ and ˆ If I) CA

.knownnot are ˆ and ˆBut CA

known. is system for the , matrix,ity observabil (extended) theIf II) rO

. and find easy to isIt DB

. and find easy to isIt CA

.knownnot is But rO

estimated.ly consistent becan matrix ity observabil extended The III)

d.constructe becan state theestimated,been hasmatrix ity observabil theOnce IV)

► Estimating B and D

► Finding A and C from Observability matrix

► Estimating the Extended Observability matrix

► Finding the States and Estimating the noise Statistics.

62

lecture 10



Let the ss as:)()()()(

)()()()1(tvtDutCxty

twtButAxtx++=++=+

DBCA and find Try toknown are ˆ and ˆ Suppose →

► Estimating B and D

► Finding A and C from Observability matrix

CAOr and find Try toknown is Suppose →

► Estimating the Extended Observability matrix

rO find Try to available are outputs andinput Suppose →

► Finding the States and Estimating the noise Statistics.

Subs

pace

pro

cedu

re.

63

lecture 10



DBCA and find Try toknown are ˆ and ˆ Suppose →► Estimating B and D

For given and fixed the model structure: ˆ ˆand A C

It is clearly linear in B and D.

If the system operates in open loop. We can thus consistently estimate B and Daccording to theorem 8.4 even if the noise sequence is non-white.

)()()(ˆ)(

)()()(ˆ)1(

tvtDutxCty

twtButxAtx

++=

++=+

64

lecture 10



Let us write the predictor in the standard linear regression form


[ ]

[ ] mm

m

m

mm

m

m

ududtxCu

uddtxCtDutxCty

ububtxAu

ubbtxAtButxAtx

+++=⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡+=+=

+++=⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡+=+=+

...)(ˆ......)(ˆ)()(ˆ)(

...)(ˆ......)(ˆ)()(ˆ)1(

11

1

1

11

1

1[ ]

[ ]⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡+=+=

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡+=+=+

m

m

m

m

u

uddtxCtDutxCty

u

ubbtxAtButxAtx

......)(ˆ)()(ˆ)(

......)(ˆ)()(ˆ)1(

1

1

1

1[ ]

[ ]⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡+=+=

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡+=+=+

m

m

m

m

u

uddtxCtDutxCty

u

ubbtxAtButxAtx

......)(ˆ)()(ˆ)(

......)(ˆ)()(ˆ)1(

1

1

1

1

( ) ( ) mmmm ududububAsICty +++++−=−

......ˆˆ)(ˆ 1111

1

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

mb

b...

1

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

md

d...

1

mpmn +mpmnp +×

65

lecture 10




( ) ( ))......ˆˆ)(ˆ 1111

1

mmmm ududububAsICty +++++−=−

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

mb

b...

1

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

md

d...

1

mpmn +mpmnp +×

66

lecture 10




If desired, also the initial state x0=x(0) can be estimated in an analogous way, since the predictor with initial values taken into account is

Which is linear also in x0. Here is the unit pulse at time 0. ( )tδ

67

lecture 10


► Finding A and C from Observability matrix CAOr and find Try to Suppose →

Suppose that a dimensional matrix G is given. That is related to the extended observability matrix Or . We have to determine A and C from G.

*pr n×

Known System Order. Suppose first we know that

So that n*=n. To find C is then immediate:

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

=

−1

...r

r

CA

CAC

O


• Unknown System Order.

• Known System Order.There is two situation:

68

lecture 10




Similarly, we can find from the equationA

⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢

⎣

⎡

=

−

−

1

2

...

r

rr

CACA

CAC

O

An ˆin unknown 2 equations )1( −rnp

69

lecture 10




downO upO

downup OAO =ˆ ( ) downT

upupT

up OOOOA1ˆ −

=

Role of the State Space Basis

The extended obsevability matrix is depends on the choice of basis in the state-space representation. It is easy to verify that the observability matrix would be

Note: Large r leads to numerical problem.

70

lecture 10


Unknown system order.

Suppose now the true orders of the system is unknown. And that n*-the number of columns of G is just an upper bound for the order.


71

lecture 10



72

lecture 10



T

G⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

−−−−

−

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

−−−−−−−−−−

=

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

=41.013.091.0

41.086.031.082.049.030.0

000000081.100039.24

99.000.001.014.011.034.035.087.007.028.085.044.003.090.040.019.0

31119761024421

1U1V

T

G⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

−−−

−

⎥⎦

⎤⎢⎣

⎡

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

−−−

−−−

=

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

=13.091.086.031.0

49.030.0

81.10039.24

01.014.035.087.0

85.044.040.019.0

31119761024421

1S

73

lecture 10



Now multiplying this by V1 from right.

Now multiplying this by S1-1 from right.

Or for some invertible matrix R:

74

lecture 10



Using a Noisy Estimate of the Extended Observability Matrix

Let us now assume that the given matrix G is a noisy estimate of the true obsevability matrix

*pr n×

Due to the noise, S will typically have all singular non-zero values

It is reasonable to proceed as above and perform an SVD on G:

Where EN is small and tends to zero as . The rank of Or is not known. While the noise matrix EN is likely to be full rank.

∞→N

75

lecture 10



The first n will be supported by Or , while the remaining ones will stem from EN .

So this system of equations should be solved in a least-squares sense.

rO ˆrOThen use to determine , as before. However in the noisy case, will

not be exactly subject to the shift structureA C

If the noise is small, one should expected that the latter are significantly smaller than the former.

Therefore determine n as the number of singular values that are significantly larger than 0.

76

lecture 10


Subspace Methods for Estimating State Space Models Using Weighting Matrices in the SVDFor more flexibility we could pre- and post- multiply G as before performing the SVD

1 2G W GW=

Here R is an arbitrary matrix, that will the coordinate basis for the state representation.

In the noiseless case E=0, these weightings are without consequence. However, when noise is present, they have an important influence on the space spanned by U1.and hence on the quality of the estimates and .A C

Remark. The post-multiplying W2 by an orthogonal matrix does not effect the U1-matrix in the decomposition.

And then use the below equation to determine andA C

The post-multiplication by W2 just corresponds to a change of basis in the state-space and the pre-multiplication by W1 is eliminated.

Exercise: Proof the mentioned remark.(10.E10).

77

lecture 10



► Estimating the Extended Observability Matrix.

Remember

Now,

78

lecture 10




Now, form the vectors

The scalar r, is the maximum prediction horizon.

79

lecture 10




And the Kth block component of V(t)

80

lecture 10



► Esimating the Extended Observability Matrix.

?

We must eliminate the U term and make the noise influence disappear asymptotically.

? ?

81

lecture 10




We must eliminate the U term and make the noise influence disappear asymptotically.

Removing the U-term. Form the matrixN N×

Multiplying from the right by will leads to:∏⊥TU

Now

?Since this term is made up of noise contributions, the idea is to correlate is away with a suitable matrix.

82

lecture 10




Removing the Noise Term. Since the last term is made up of noise contributions. The idea is to correlate it away with a suitable matrix. Define matrix .s N× ( )s n≥

G NrTO ~NV

Here acts as an instrument and we must define it such thatΦ

83

lecture 10




GNrTO ~

NVHere acts as an instrument and we must define it such thatΦ

then

The matrix G can thus be seen as a noisy estimate of the extended observabilitymatrix.

pr s×

But we need to define .Φ

so

84

lecture 10



Finding Good Instruments. The only remaining question is how to achieve to the following equations

Remember instrument variable:

Remember:

The law of large numbers states that the sample sums converges to their respective expected values, so

85

lecture 10




Assume the input u is generated in open loop, so that it is independent of the noise V. 0

Now let

The k;th block component of V(t) is:

ttttV

s )()(

ϕ

86

lecture 10




A formal proof that has full rank is not immediate and will involve properties of the input.

T~

Similarly we have:

See problem 10G.6 and Van Overschee and DeMoor(1996).

87

lecture 10


Finding the States and Estimating the Noise statistics


Some part of chapter 7

Let a system given by the impulse response representation

)()()()()(0

jtejhjtujhty ej

u −+−=∑∞

=

Let the formal k-step ahead predictors be defined by just deleting the last terms so:

)()()()()|(ˆ jtejhjtujhktty ekj

u −+−=− ∑∞

=

Define

[ ])(ˆ...)1(ˆˆ

)1|1(ˆ...

)1|(ˆ)(ˆ NYYY

trty

ttytY rrr =

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

−−+

−=

88

lecture 10




Some part of chapter 7

)()()()()()(0

Ijtejhjtujhty ej

u −+−=∑∞

=


u −+−=− ∑∞

=

[ ])(ˆ...)1(ˆˆ

)1|1(ˆ...

)1|(ˆ)(ˆ NYYY

trty

ttytY rrr =

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

−−+

−=

Then the following is true as (see chapter 4 appendix A) ∞→N

1- The system (I) has an nth order minimal state space description if and only if the

rank is equal to n for all r ≥ nY

2- The state vector of any minimal realizations form can be chosen as linear

)(ˆ)( tYLtx r=

89

lecture 10




Let a system given by the impulse response representation


u −+−=− ∑∞

=

)()()()()1|1(ˆ1

jtejhjtujhtkty ej

u −+−=−−+ ∑∞

=

)(...)1()(...)1()1|1(ˆ 22111 1stutustytytkty ss −++−+−++−=−−+ ββαα

For practical reason we have

This predictor can be determined effectively by

)1()()()1( −+++=−+ kttUtkty lTks

Tk εγϕθ

or, dealing with all r predictors simultaneously

90

lecture 10




According to chapter 7

By LS we have

By inverse lemma

91

lecture 10




According to chapter 7

Predicted output is

RUOr 1=

)(...)1()(...)1()1|1(ˆ 22111 1stutustytytkty ss −++−+−++−=−−+ ββαα

So we have

TVSRX 111ˆ −=

92

lecture 10




So let:

TVSRX 111ˆ −=

93

lecture 10


With the states given, we can estimate the process and measurement noises as


94

lecture 10


Putting It All Together

The family of subspace algotithm1. From the input-output data form


Remember:

The scalar r, is the maximal prediction horizon and in many algorithms use r = s

Many algorithms choose φs(t) to consist of past inputs and outputs with s1=s2=s. So scalar s is a design variable.

95

lecture 10



The family of subspace algotithm

2. Select weighting matrices W1 and W2 and perform SVD


The weighting matrices W1 and W2. This is the perhaps most important choice. Existing algorithms employ the following choices:

96

lecture 10





3. Select a full rank matrix R and define the matrix solverp n× 11 1

ˆrO W U R−=

For and . The latter equation should be solved in a least square sense.C A

Typical choices for R, are R=I, R=S1 or 1/21R S=

4. Estimate , and from the linear regression problem:B D 0x

97

lecture 10





5. If a noise model is sought, form as inX

And estimate the noise contributions as in

SYSTEMS Identification - UM

Documents

Transcript of SYSTEMS Identification - UM