SYSTEMS Identification - UM


SYSTEMS Identification

Ali Karimpour, Assistant Professor

Ferdowsi University of Mashhad

Reference: "System Identification: Theory for the User", Lennart Ljung


Lecture 10

Computing the Estimate

Topics to be covered include:
• Linear Regression and Least Squares.
• Numerical Solution by Iterative Search Method.
• Computing Gradients.
• Two-Stage and Multistage Method.
• Local Solutions and Initial Values.
• Subspace Methods for Estimating State Space Models.


Introduction

In Chapter 7, three basic parameter estimation methods were considered:

1- The prediction-error approach, in which a certain function V_N(θ, Z^N) is minimized with respect to θ.

2- The correlation approach, in which a certain equation f_N(θ, Z^N) = 0 is solved for θ.

3- The subspace approach to estimating state space models.

In this chapter we discuss how these problems are best solved numerically.


Linear Regression and Least Squares.


For linear regression we have
ŷ(t|θ) = φᵀ(t) θ

The least-squares criterion leads to
θ̂_N^LS = arg min_θ V_N(θ, Z^N) = [ (1/N) Σ_{t=1}^N φ(t) φᵀ(t) ]^{-1} (1/N) Σ_{t=1}^N φ(t) y(t)

Denoting the two factors R(N) (a d×d matrix) and f(N) (a d×1 vector), an alternative form is
θ̂_N^LS = R^{-1}(N) f(N)
or, equivalently, the normal equations
R(N) θ̂_N^LS = f(N)

Note that the basic equation for the IV method is quite analogous, so most of what is said in this section about the LS method also applies to the IV method.
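As an added illustration (not from the original slides), a minimal NumPy sketch of the estimate above; it assumes the regressors are already stacked row-wise in a matrix Phi and the outputs in a vector y:

import numpy as np

def ls_estimate(Phi, y):
    # Phi: N x d matrix whose rows are phi(t)^T; y: length-N output vector
    N = Phi.shape[0]
    R_N = Phi.T @ Phi / N        # R(N) = (1/N) sum phi(t) phi(t)^T
    f_N = Phi.T @ y / N          # f(N) = (1/N) sum phi(t) y(t)
    return np.linalg.solve(R_N, f_N)   # solves the normal equations R(N) theta = f(N)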


R(N) θ̂_N^LS = f(N)    (the normal equations)

R(N) may be ill-conditioned, especially when its dimension is high. The underlying idea in the methods below is that the matrix R(N) should not be formed explicitly; instead a matrix R̄ is constructed with the property
R̄ᵀ R̄ = R(N)
This class of methods is commonly known as "square-root algorithms", but the term "quadratic methods" is more appropriate.

How to derive R̄?
• Householder transformations
• The Gram-Schmidt procedure
• Björck decomposition
• Cholesky decomposition
• QR decomposition

where, as before,
R(N) = (1/N) Σ_{t=1}^N φ(t) φᵀ(t)    (a d×d matrix).


Solving for the LS estimate by QR factorization.

The QR-factorization of an n×d matrix A is defined as
A = QR,  QᵀQ = I,  R upper triangular
where Q is a unitary n×n matrix and R is n×d.

Example 1: A = [1 2; 3 4] = QR with
Q = [0.3162 0.9487; 0.9487 −0.3162],  R = [3.1623 4.4272; 0 0.6325].

Example 2: A = [1 2; 3 4; 5 6] = QR with
R = [5.9161 7.4374; 0 0.8281; 0 0]
and Q the corresponding 3×3 orthogonal matrix, whose first two columns are [0.1690 0.8971; 0.5071 0.2760; 0.8452 −0.3450].

Example 3: A = [1 0 1; 0 0 1; 1 3 1; 1 2 0] = QR with
R = [1.73 2.89 1.15; 0 2.16 −0.15; 0 0 1.28; 0 0 0].

(Values are rounded; the signs of individual columns of Q and rows of R depend on the convention of the QR routine used.)

Example 4: A = [1 0; 0 0; 1 3; 1 2] (the first two columns of the matrix in Example 3) has
R = [1.73 2.89; 0 2.16; 0 0; 0 0],
i.e. the first two columns of the R obtained in Example 3.

Example 5: A = [1; 0; 1; 1] (the first column of the matrix in Example 3) has
R = [1.73; 0; 0; 0],
i.e. the first column of the R obtained in Example 3.

Thus, once the QR-factorization has been computed for the full regressor, the factorizations of all "smaller" regressors formed from its leading columns can be read off directly.

Now apply the QR-factorization to the LS problem. Define
Y = [y(1); y(2); ...; y(N)]   (an Np×1 vector)
Φ = [φᵀ(1); φᵀ(2); ...; φᵀ(N)]   (an Np×d matrix)
so that the criterion (dropping, in this derivation, the 1/N normalization, which does not affect the minimizing argument) is
V_N(θ, Z^N) = Σ_{t=1}^N | y(t) − φᵀ(t) θ |² = ‖Y − Φθ‖².
Since Q is unitary,
V_N(θ, Z^N) = ‖Y − Φθ‖² = ‖Qᵀ(Y − Φθ)‖².

Now introduce the QR-factorization of the Np×(d+1) matrix [Φ Y]:
[Φ Y] = Q R̄,  R̄ = [R̄0; 0],  R̄0 = [R1 R2; 0 R3]
where Q is Np×Np, R̄ is Np×(d+1), R̄0 is (d+1)×(d+1), R1 is d×d, R2 is d×1 and R3 is a scalar. This means that
V_N(θ, Z^N) = ‖Qᵀ(Y − Φθ)‖² = ‖[R2; R3; 0] − [R1; 0; 0] θ‖² = ‖R1 θ − R2‖² + R3²
which is clearly minimized for
R1 θ̂_N = R2,  giving  V_N(θ̂_N, Z^N) = R3².
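A minimal sketch of this QR route, added for illustration; the variable names R0, R1, R2, R3 mirror the derivation above and are not taken from the slides themselves:

import numpy as np

def ls_by_qr(Phi, y):
    # QR-factorize the augmented matrix [Phi  Y]; the big Q is never formed
    R0 = np.linalg.qr(np.column_stack([Phi, y]), mode='r')   # (d+1) x (d+1) upper-triangular factor
    d = Phi.shape[1]
    R1, R2, R3 = R0[:d, :d], R0[:d, d], R0[d, d]
    theta = np.linalg.solve(R1, R2)   # R1 theta = R2 (triangular system)
    return theta, R3**2               # R3^2 is the minimal value of the criterion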


Consider the simple model for a system
y(t) + a1 y(t−1) + a2 y(t−2) = b u(t−1)

Exercise: Suppose that for t = 1 to 11 the values of u(t) and y(t) are given.

1) Derive θ̂_N from eq. (I) and find the condition number of R(N):
θ̂_N = R^{-1}(N) f(N)    (I)

2) Derive θ̂_N from eq. (II) and find the condition number of R1:
θ̂_N = R1^{-1} R2    (II)
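A sketch of how the exercise can be carried out numerically; the data generated below are illustrative placeholders with hypothetical parameter values, not the values from the slide:

import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(12)
y = np.zeros(12)
for t in range(2, 12):
    # hypothetical system: a1 = -1.5, a2 = 0.7, b = 1.0
    y[t] = 1.5*y[t-1] - 0.7*y[t-2] + u[t-1]

# regressors phi(t) = [-y(t-1), -y(t-2), u(t-1)], parameters theta = [a1, a2, b]
Phi = np.array([[-y[t-1], -y[t-2], u[t-1]] for t in range(2, 12)])
Y = y[2:12]

R_N = Phi.T @ Phi / len(Y)                  # as used in eq. (I)
R1 = np.linalg.qr(Phi, mode='r')            # as used in eq. (II)
print(np.linalg.cond(R_N), np.linalg.cond(R1)**2)   # the two values essentially coincide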


There are three important advantages with this way of solving for the LS estimate:

1- The condition number of R1 is the square root of that of R(N): since R(N) = ΦᵀΦ = R1ᵀR1, R1 is much better conditioned than R(N).
2- R1 is a triangular matrix, so the equation R1 θ̂_N = R2 is easy to solve.
3- If the QR-factorization is performed for a regressor of size d*, then the solutions for all models with fewer parameters are easily obtained from R̄0.

Note that the big matrix Q never needs to be formed; all the information is contained in the "small" matrix R̄0.


Initial condition: "Windowed" data

The regression vector φ(t) is
φ(t) = [z(t−1); z(t−2); ...; z(t−n)]
Here z(t−1) is an r-dimensional vector. For example, for the ARX model
z(t) = [−y(t); u(t)]
and for the AR model
z(t) = −y(t).

R(N) is then built up from the blocks
R_ij(N) = (1/N) Σ_{t=1}^N z(t−i) zᵀ(t−j).

If we have knowledge of z(t) only for 1 ≤ t ≤ N, the question arises of how to deal with the unknown initial conditions. Two common choices are:

1 - Start the summation at t = n+1 rather than t = 1.
2 - Replace the unknown initial conditions by zeros.

Numerical Solution by Iterative Search Method


Numerical minimization

In general, neither the function
V_N(θ, Z^N) = (1/N) Σ_{t=1}^N l(ε(t,θ), θ, t)
nor the equation
0 = f_N(θ, Z^N) = (1/N) Σ_{t=1}^N ζ(t,θ) α(ε(t,θ))
can be minimized or solved by analytical methods.

Methods for numerical minimization of a function V(θ) update the minimizing point iteratively by
θ̂^(i+1) = θ̂^(i) + α f^(i)
where f^(i) is a search direction based on information about V(θ) and α is a positive step size. Depending on the information used to determine f^(i), the methods fall into three groups:

• Methods using values of the function, its gradient, and its Hessian: Newton algorithms, with search direction
f^(i) = −[V''(θ̂^(i))]^{-1} V'(θ̂^(i)).
• Methods using values of the function as well as of its gradient: an estimate of the Hessian is formed and used in the same update (quasi-Newton algorithms).
• Methods using function values only: an estimate of the gradient is formed, and a quasi-Newton algorithm is then applied with it.

In general, consider the function
V_N(θ, Z^N) = (1/N) Σ_{t=1}^N l(ε(t,θ), θ, t).
Its gradient is
V'_N(θ, Z^N) = −(1/N) Σ_{t=1}^N [ ψ(t,θ) l'_ε(ε(t,θ), θ, t) − l'_θ(ε(t,θ), θ, t) ].
Here ψ(t,θ) is
ψ(t,θ) = −(∂/∂θ) ε(t,θ) = (∂/∂θ) ŷ(t|θ),
a d×q matrix (θ is d×1 and ε(t,θ) is q×1).

Some explicit search schemes

Consider the special case
V_N(θ, Z^N) = (1/N) Σ_{t=1}^N ½ ε²(t,θ)
with gradient
V'_N(θ, Z^N) = −(1/N) Σ_{t=1}^N ψ(t,θ) ε(t,θ).

A general family of search routines is given by
θ̂^(i+1)_N = θ̂^(i)_N − μ^(i)_N [R^(i)_N]^{-1} V'_N(θ̂^(i)_N, Z^N)
where θ̂^(i)_N denotes the i-th iterate, R^(i)_N is a d×d matrix that modifies the search direction, and μ^(i)_N denotes the step size, chosen so that
V_N(θ̂^(i+1)_N, Z^N) ≤ V_N(θ̂^(i)_N, Z^N).

Let R^(i)_N = I; then we have
θ̂^(i+1)_N = θ̂^(i)_N − μ^(i)_N V'_N(θ̂^(i)_N, Z^N).
This is the gradient or steepest-descent method. It is fairly inefficient close to the minimum.

Solving f(x) = 0 by the tangent-line (Newton-Raphson) construction:

• Make an initial guess x0.
• Draw the tangent line at x0; its equation is y = f(x0) + f'(x0)(x − x0).
• Let x1 be the x-intercept of the tangent line; it is given by
x1 = x0 − f(x0)/f'(x0).
• Now repeat with x1 as the initial guess, producing x2, and so on.
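A small sketch of the tangent-line iteration just described (an added illustration; f and fprime stand for a user-supplied function and its derivative):

def newton_raphson(f, fprime, x0, tol=1e-10, max_iter=100):
    # repeatedly replace x by the x-intercept of the tangent line at x
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / fprime(x)    # breaks down if fprime(x) is (close to) zero
    return x

root = newton_raphson(lambda x: x**2 - 2.0, lambda x: 2.0*x, 1.0)   # converges to sqrt(2)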

Some difficulties of this iteration:
• Zero derivatives: if f'(x_k) is close to zero, the tangent is nearly horizontal and the next iterate is thrown far away.
• Divergence: depending on the starting point, the iterates may fail to converge to a root.

Gradient (steepest-descent) method for finding the minimum of f(x): the iterates move downhill along the negative gradient,
θ̂^(i+1)_N = θ̂^(i)_N − μ^(i)_N V'_N(θ̂^(i)_N, Z^N).

The gradient and the Hessian of V_N(θ, Z^N) = (1/N) Σ_{t=1}^N ½ ε²(t,θ) are
V'_N(θ, Z^N) = −(1/N) Σ_{t=1}^N ψ(t,θ) ε(t,θ)
V''_N(θ, Z^N) = (1/N) Σ_{t=1}^N ψ(t,θ) ψᵀ(t,θ) − (1/N) Σ_{t=1}^N ψ'(t,θ) ε(t,θ).

Let R^(i)_N = V''_N(θ̂^(i)_N, Z^N); then the search scheme becomes
θ̂^(i+1)_N = θ̂^(i)_N − μ^(i)_N [V''_N(θ̂^(i)_N, Z^N)]^{-1} V'_N(θ̂^(i)_N, Z^N).
This is the Newton method. However, computing the Hessian is not an easy task, because of the term ψ'(t,θ).

Suppose that there is a value θ0 such that ε(t, θ0) = e0(t) is white noise. Then, near this value, the term involving ψ'(t,θ) averages out to zero, and
V''_N(θ, Z^N) ≈ (1/N) Σ_{t=1}^N ψ(t,θ) ψᵀ(t,θ) ≜ H_N(θ).

So the choice
R^(i)_N = H_N(θ̂^(i)_N)
is, in the vicinity of the minimum, a good estimate of the Hessian. This choice is known as the Gauss-Newton method. In the statistical literature it is called the "method of scoring". In the control literature the terms "modified Newton-Raphson" and "quasi-linearization" have also been used.

For R^(i)_N = H_N(θ̂^(i)_N) with an adjustable step size μ^(i)_N, the term "damped Gauss-Newton" has been used. Dennis and Schnabel reserve the term "Gauss-Newton" for the case μ^(i)_N = 1.

With
R^(i)_N = H_N(θ̂^(i)_N) = (1/N) Σ_{t=1}^N ψ(t, θ̂^(i)_N) ψᵀ(t, θ̂^(i)_N)
in the search scheme
θ̂^(i+1)_N = θ̂^(i)_N − μ^(i)_N [R^(i)_N]^{-1} V'_N(θ̂^(i)_N, Z^N),
R^(i)_N is assured to be positive semidefinite, but it may still be singular or close to singular (for example, if the model is over-parameterized or the data are not informative enough). Various ways to overcome this problem exist and are known as "regularization techniques".

Goldfeld, Quandt and Trotter suggest
R^(i)_N = 1/(1+λ) [ (1/N) Σ_{t=1}^N ψ(t, θ̂^(i)_N) ψᵀ(t, θ̂^(i)_N) + λI ],  where λ is a positive scalar.

Levenberg and Marquardt suggest
R^(i)_N = (1/N) Σ_{t=1}^N ψ(t, θ̂^(i)_N) ψᵀ(t, θ̂^(i)_N) + λI,  where λ is a positive scalar.

With λ = 0 we have the Gauss-Newton case; increasing λ means that the step size is decreased and the search direction is turned towards the gradient.
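A minimal sketch of one Levenberg-Marquardt-regularized Gauss-Newton iteration, added for illustration. It assumes a user-supplied routine residuals_and_jacobian(theta) that returns the stacked prediction errors ε(t,θ) and gradients ψ(t,θ); that helper, and all names below, are assumptions of the sketch, not something prescribed by the slides:

import numpy as np

def gauss_newton_lm(theta0, residuals_and_jacobian, lam=1e-3, max_iter=50, tol=1e-8):
    # residuals_and_jacobian(theta) -> (eps, Psi): eps is (N,), Psi is (N, d) with rows psi(t, theta)^T
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        eps, Psi = residuals_and_jacobian(theta)
        N = len(eps)
        V = 0.5 * np.mean(eps**2)                 # V_N(theta)
        grad = -Psi.T @ eps / N                   # V'_N = -(1/N) sum psi(t) eps(t)
        H = Psi.T @ Psi / N                       # Gauss-Newton approximation H_N of the Hessian
        step = np.linalg.solve(H + lam*np.eye(len(theta)), -grad)   # Levenberg-Marquardt direction
        eps_new, _ = residuals_and_jacobian(theta + step)
        if 0.5 * np.mean(eps_new**2) < V:         # accept only if the criterion decreases
            theta = theta + step
            lam = max(lam/10.0, 1e-12)            # move towards the pure Gauss-Newton step
        else:
            lam *= 10.0                           # shorter, more gradient-like step
        if np.linalg.norm(step) < tol:
            break
    return theta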

Remember that we want to
minimize V_N(θ, Z^N)    (I)
or solve
0 = f_N(θ, Z^N) = (1/N) Σ_{t=1}^N ζ(t,θ) α(ε(t,θ))    (II)

Newton method to solve (I):
θ̂^(i+1)_N = θ̂^(i)_N − μ^(i)_N [V''_N(θ̂^(i)_N, Z^N)]^{-1} V'_N(θ̂^(i)_N, Z^N).
This leads to a point where V'_N(θ̂_N, Z^N) = 0.

Solving equation (II), the correlation equation, is quite analogous to the minimization of (I).

Newton-Raphson method to solve (II):
θ̂^(i+1)_N = θ̂^(i)_N − μ^(i)_N [f'_N(θ̂^(i)_N, Z^N)]^{-1} f_N(θ̂^(i)_N, Z^N).

Substitution method to solve (II):
θ̂^(i+1)_N = θ̂^(i)_N − μ^(i)_N f_N(θ̂^(i)_N, Z^N).

Computing Gradients


The amount of work required to compute ψ(t,θ) depends strongly on the model structure, and sometimes one may have to resort to numerical differentiation.

Example 10.1. Consider the ARMAX model
A(q) y(t) = B(q) u(t) + C(q) e(t);
the predictor is
C(q) ŷ(t|θ) = B(q) u(t) + [C(q) − A(q)] y(t).

Differentiation with respect to a_k gives
C(q) ∂ŷ(t|θ)/∂a_k = −q^{-k} y(t) = −y(t−k),
and similarly
C(q) ∂ŷ(t|θ)/∂b_k = q^{-k} u(t) = u(t−k).
For c_k we have
q^{-k} ŷ(t|θ) + C(q) ∂ŷ(t|θ)/∂c_k = q^{-k} y(t),
so that
C(q) ∂ŷ(t|θ)/∂c_k = q^{-k} [ y(t) − ŷ(t|θ) ] = ε(t−k, θ).

Now, with
ψ(t,θ) = −(∂/∂θ) ε(t,θ) = (∂/∂θ) ŷ(t|θ),
the relations above can be collected as
C(q) ψ(t,θ) = C(q) [∂ŷ(t|θ)/∂a_1, ..., ∂ŷ(t|θ)/∂a_na, ∂ŷ(t|θ)/∂b_1, ..., ∂ŷ(t|θ)/∂b_nb, ∂ŷ(t|θ)/∂c_1, ..., ∂ŷ(t|θ)/∂c_nc]ᵀ
= [−y(t−1), ..., −y(t−n_a), u(t−1), ..., u(t−n_b), ε(t−1,θ), ..., ε(t−n_c,θ)]ᵀ
= φ(t,θ)

so the gradient ψ(t,θ) is obtained by filtering the regression vector φ(t,θ) through 1/C(q).
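A sketch of this computation for the ARMAX case, assuming SciPy is available; the coefficient arrays a, b, c (for A, B, C without their leading terms) and the helper names are assumptions of the sketch:

import numpy as np
from scipy.signal import lfilter

def armax_psi(y, u, a, b, c):
    # a = [a1..a_na], b = [b1..b_nb], c = [c1..c_nc] for A(q) y = B(q) u + C(q) e
    na, nb, nc = len(a), len(b), len(c)
    A = np.concatenate(([1.0], a)); B = np.concatenate(([0.0], b)); C = np.concatenate(([1.0], c))
    eps = lfilter(A, C, y) - lfilter(B, C, u)      # prediction errors: eps = (A/C) y - (B/C) u
    N = len(y)
    def delayed(x, k):                             # x delayed k samples, zero initial conditions
        z = np.zeros(N); z[k:] = x[:N-k]; return z
    cols = [-delayed(y, k) for k in range(1, na+1)] \
         + [ delayed(u, k) for k in range(1, nb+1)] \
         + [ delayed(eps, k) for k in range(1, nc+1)]
    Phi = np.column_stack(cols)                    # rows are phi(t, theta)^T
    Psi = np.column_stack([lfilter([1.0], C, Phi[:, j]) for j in range(Phi.shape[1])])  # psi = (1/C) phi
    return eps, Psi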

SISO black-box model

The general model structure and its predictor are
A(q) y(t) = [B(q)/F(q)] u(t) + [C(q)/D(q)] e(t)
ŷ(t|θ) = [D(q)B(q)/(C(q)F(q))] u(t) + [1 − D(q)A(q)/C(q)] y(t)

so we have
∂ŷ(t|θ)/∂a_k = −[D(q)/C(q)] y(t−k)
∂ŷ(t|θ)/∂b_k = [D(q)/(C(q)F(q))] u(t−k)
∂ŷ(t|θ)/∂c_k = −[D(q)B(q)/(C²(q)F(q))] u(t−k) + [D(q)A(q)/C²(q)] y(t−k) = [1/C(q)] ε(t−k, θ)
∂ŷ(t|θ)/∂d_k = [B(q)/(C(q)F(q))] u(t−k) − [A(q)/C(q)] y(t−k) = −[1/C(q)] v(t−k, θ)
∂ŷ(t|θ)/∂f_k = −[D(q)B(q)/(C(q)F²(q))] u(t−k) = −[D(q)/(C(q)F(q))] w(t−k, θ)

where w(t,θ) = [B(q)/F(q)] u(t) and v(t,θ) = A(q) y(t) − w(t,θ).

As a special case, consider the OE (output error) model, for which A(q) = C(q) = D(q) = 1:
∂ŷ(t|θ)/∂b_k = [1/F(q)] u(t−k)
∂ŷ(t|θ)/∂f_k = −[1/F(q)] w(t−k, θ)

With ψ(t,θ) = −(∂/∂θ) ε(t,θ) = (∂/∂θ) ŷ(t|θ), these relations can be collected as
F(q) ψ(t,θ) = F(q) [∂ŷ(t|θ)/∂b_1, ..., ∂ŷ(t|θ)/∂b_nb, ∂ŷ(t|θ)/∂f_1, ..., ∂ŷ(t|θ)/∂f_nf]ᵀ
= [u(t−1), ..., u(t−n_b), −w(t−1,θ), ..., −w(t−n_f,θ)]ᵀ = φ(t,θ)

so ψ(t,θ) is obtained by filtering φ(t,θ) through 1/F(q).


Two-Stage and Multistage Method


Linear regression and least squares offer:
• Efficient methods with an analytic solution.
Numerical solution by iterative search offers:
• Applicability to general model structures,
• but only guaranteed convergence to a local minimum.
Two-stage and multistage methods aim to combine the advantages of both.

Why are we interested in this topic?
• It helps in understanding the identification literature.
• It is useful for providing initial estimates for the iterative methods.

Some important two-stage or multistage methods:
1- Bootstrap methods.
2- Bilinear parameterization.
3- Separate least squares.
4- High-order AR(X) models.
5- Separating dynamics and noise models.
6- Determining ARMA models.
7- Subspace methods for estimating state space models.

Bootstrap methods

Consider the correlation formulation
0 = f_N(θ, Z^N) = (1/N) Σ_{t=1}^N ζ(t,θ) [ y(t) − φᵀ(t,θ) θ ]

This formulation contains a number of common situations:

• IV methods, with
ζ(t,θ) = K(q) [ −x(t−1,θ), ..., −x(t−n_a,θ), u(t−1), ..., u(t−n_b) ]ᵀ,  φ(t,θ) = φ(t).
• PLR methods: ζ(t,θ) = φ(t,θ), giving
θ̂_N^PLR = sol{ (1/N) Σ_{t=1}^N φ(t,θ) [ y(t) − φᵀ(t,θ) θ ] = 0 }.
• Minimizing the quadratic criterion V_N(θ, Z^N) = (1/N) Σ_{t=1}^N ½ ε²(t,θ): ζ(t,θ) = ψ(t,θ), since
V'_N(θ, Z^N) = −(1/N) Σ_{t=1}^N ψ(t,θ) ε(t,θ).

(IV: instrumental variables; PLR: pseudo-linear regression.)

Given θ̂^(i−1)_N, the next iterate θ̂^(i)_N is defined by solving the correlation equation with ζ and φ evaluated at the previous iterate:
(1/N) Σ_{t=1}^N ζ(t, θ̂^(i−1)_N) [ y(t) − φᵀ(t, θ̂^(i−1)_N) θ̂^(i)_N ] = 0
that is,
θ̂^(i)_N = [ (1/N) Σ_{t=1}^N ζ(t, θ̂^(i−1)_N) φᵀ(t, θ̂^(i−1)_N) ]^{-1} (1/N) Σ_{t=1}^N ζ(t, θ̂^(i−1)_N) y(t).

It is called a bootstrap method since it alternates between θ and (ζ(t,θ), φ(t,θ)). It does not necessarily converge to a solution; a convergence analysis is available in the literature.

Bilinear parameterization

For some models the predictor is bilinear in the parameters. For example, consider the ARARX model
A(q) y(t) = B(q) u(t) + [1/D(q)] e(t)
Now the predictor is
ŷ(t|θ) = D(q) B(q) u(t) + [1 − D(q) A(q)] y(t)
with
θ = [a1 ... a_na  b1 ... b_nb  d1 ... d_nd]ᵀ.
Let θ = [ρᵀ ηᵀ]ᵀ. "Bilinear" means that ŷ(t|θ) is linear in ρ for fixed η and linear in η for fixed ρ.

In the ARARX model this holds with ρ = [a1 ... a_na b1 ... b_nb]ᵀ and η = [d1 ... d_nd]ᵀ, so that ŷ(t|θ) = ŷ(t|ρ,η). With this structure, a natural way of minimizing the criterion is to treat it as a sequence of LS problems, alternately minimizing over ρ for fixed η and over η for fixed ρ.

Exercise 10T.3: Show that this minimization problem is a special case of (10.40).

According to Exercise 10T.3, bilinear parameterization is thus indeed a descent method; it converges to a local minimum.

Separate least squares

A more general situation than the bilinear case is when one set of parameters enters linearly and another set nonlinearly in the predictor:
ŷ(t|θ,η) = φᵀ(t,η) θ.
The identification criterion then becomes
V_N(θ,η) = Σ_{t=1}^N | y(t) − φᵀ(t,η) θ |².
For given η this criterion is an LS criterion and is minimized with respect to θ by
θ̂(η) = [ Σ_{t=1}^N φ(t,η) φᵀ(t,η) ]^{-1} Σ_{t=1}^N φ(t,η) y(t).
We can thus insert this into V_N and define the problem as a minimization over η alone.

The identification procedure then becomes:
1- For a given η, compute θ̂(η) by least squares.
2- Insert θ̂(η) into the criterion, which then becomes a function of η only.
3- Minimize this function with respect to η by an iterative search.

The method is called separate least squares, since the LS part has been separated out and the problem has been reduced to a minimization problem of lower dimension.

High-order AR(X) models

Suppose the true system is
y(t) = G0(q) u(t) + H0(q) e(t).
An ARX structure of order M is used:
A_M(q) y(t) = B_M(q) u(t) + e(t).
Hannan and Kavalieris, and Ljung and Wahlberg, show that if the order M tends to infinity (at a suitable rate as the number of data N tends to infinity), then
B̂_M(q) / Â_M(q) → G0(q)  and  1 / Â_M(q) → H0(q).
So a high-order ARX model is capable of approximating any linear system arbitrarily well.

It is of course desirable to reduce this high-order model to a more tractable, lower-order description; doing so is the second stage of the method.
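A small sketch of the first stage, fitting the high-order ARX model by least squares; the order M and the data arrays are assumptions of the example:

import numpy as np

def fit_arx(y, u, M):
    # A_M(q) y(t) = B_M(q) u(t) + e(t): regress y(t) on -y(t-1..t-M) and u(t-1..t-M)
    N = len(y)
    Phi = np.array([np.concatenate([-y[t-M:t][::-1], u[t-M:t][::-1]]) for t in range(M, N)])
    theta, *_ = np.linalg.lstsq(Phi, y[M:], rcond=None)
    return theta[:M], theta[M:]      # coefficients of A_M (without the leading 1) and of B_M

The estimated Ĝ = B̂_M/Â_M and Ĥ = 1/Â_M can then be reduced to a lower-order model in the second stage.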

Separating dynamics and noise models

The general model structure is
A(q) y(t) = [B(q)/F(q)] u(t) + [C(q)/D(q)] e(t).
First estimate Â(q), B̂(q) and F̂(q). Then form the residual
v̂(t) = Â(q) y(t) − [B̂(q)/F̂(q)] u(t)
and, in a second stage, model it as
v̂(t) = [C(q)/D(q)] e(t).

Determining ARMA models

The second-stage problem is then to fit the ARMA model
v̂(t) = [C(q)/D(q)] e(t)
to the residual sequence v̂(t).

Subspace methods for estimating state space models

The subspace methods (treated in detail later in this lecture) can also be regarded as two-stage methods, being built up from two LS steps.

Local Solutions and Initial Values


Local minima

The general numerical schemes of Section 10.2 typically have the property that, with a suitably chosen step length μ, they converge to a solution. For the correlation-equation schemes
θ̂^(i+1)_N = θ̂^(i)_N − μ^(i)_N [f'_N(θ̂^(i)_N, Z^N)]^{-1} f_N(θ̂^(i)_N, Z^N)  and  θ̂^(i+1)_N = θ̂^(i)_N − μ^(i)_N f_N(θ̂^(i)_N, Z^N)
we have
θ̂^(i)_N → θ*_N such that f_N(θ*_N, Z^N) = 0,
while, for positive definite R^(i)_N, the search scheme
θ̂^(i+1)_N = θ̂^(i)_N − μ^(i)_N [R^(i)_N]^{-1} V'_N(θ̂^(i)_N, Z^N)
gives
θ̂^(i)_N → θ*_N such that θ*_N is a local minimum of V_N(θ, Z^N).

These equations may have several solutions, and it is the global minimum that interests us. To find the global solution one must, in principle, start the search at different feasible initial values. An important possibility is to use some preliminary estimation procedure to produce a good initial value. Local minima do not necessarily create problems in practice, if the model passes the validation tests (Sections 16.5 and 16.6).

Remember from Chapter 8: the criterion V_N(θ, Z^N) and its value V_N(θ*_N, Z^N) at a converged point θ*_N (illustrated graphically on the slide).

Results from SISO black-box models

The general model structure is
A(q) y(t) = [B(q)/F(q)] u(t) + [C(q)/D(q)] e(t).
Consider the assumption that the system can be described within the model set, S ∈ M. The known results on local and global minima then refer to the general SISO model set and to the asymptotic criterion
V̄(θ) = ½ E ε²(t,θ).


Initial parameter values

Due to the possible occurrence of undesired local minima in the criterion function, it is worthwhile to put some effort into producing good initial values. Also, since Newton-type methods have a good local convergence rate, this effort pays off again.

1- For a physically parameterized model structure: use your physical insight.
2- For a linear black-box model structure
A(q) y(t) = [B(q)/F(q)] u(t) + [C(q)/D(q)] e(t):
use the two-stage and multistage procedures of the previous section to produce initial estimates.

Initial filter conditions

In some configurations we also need initial values φ(0,θ) for the filters that generate the regressors. Two options:
1 - Start the summation at t = n+1 rather than t = 1.
2 - Handle the unknown initial conditions explicitly, e.g. by replacing them with zeros or by estimating them along with θ.

Subspace Methods for Estimating State Space Models


Let us now consider how to estimate the system matrices A, B, C and D in the state-space model
x(t+1) = A x(t) + B u(t) + w(t)
y(t)   = C x(t) + D u(t) + v(t)
Let the output y(t) be a p-dimensional column vector and the input u(t) an m-dimensional column vector; the order of the system is n. We also assume that this state-space representation is a minimal realization.

We know that many different representations describe the same system:
x̃(t+1) = T A T^{-1} x̃(t) + T B u(t) + T w(t)
y(t)   = C T^{-1} x̃(t) + D u(t) + v(t)
where T is any invertible matrix and x̃(t) = T x(t).

For the state-space model above:
I)   If Â and Ĉ are known, it is easy to find B and D. But Â and Ĉ are not known.
II)  If the (extended) observability matrix O_r of the system is known, it is easy to find A and C. But O_r is not known.
III) The extended observability matrix can be consistently estimated from input-output data.
IV)  Once the observability matrix has been estimated, the state sequence can be constructed.

The corresponding steps are treated in turn:
► Estimating B and D
► Finding A and C from the observability matrix
► Estimating the extended observability matrix
► Finding the states and estimating the noise statistics

Taken together, these steps constitute the subspace procedure:
• Suppose Â and Ĉ are known → try to find B and D (► Estimating B and D).
• Suppose O_r is known → try to find A and C (► Finding A and C from the observability matrix).
• Suppose input and output data are available → try to find O_r (► Estimating the extended observability matrix).
• Finally: ► Finding the states and estimating the noise statistics.

► Estimating B and D (suppose Â and Ĉ are known)

For given and fixed Â and Ĉ, the model structure
x(t+1) = Â x(t) + B u(t) + w(t)
y(t)   = Ĉ x(t) + D u(t) + v(t)
is clearly linear in B and D. If the system operates in open loop, we can thus consistently estimate B and D according to Theorem 8.4, even if the noise sequence is non-white.

Let us write the predictor in standard linear regression form. With the columns of B denoted b1, ..., bm and those of D denoted d1, ..., dm,
x(t+1) = Â x(t) + B u(t) = Â x(t) + b1 u1(t) + ... + bm um(t)
y(t)   = Ĉ x(t) + D u(t) = Ĉ x(t) + d1 u1(t) + ... + dm um(t)
so that
ŷ(t) = Ĉ (qI − Â)^{-1} [ b1 u1(t) + ... + bm um(t) ] + d1 u1(t) + ... + dm um(t).

This is a linear regression in the unknown parameter vector formed from [b1; ...; bm] and [d1; ...; dm] (of dimension nm + pm), with a regression matrix of dimension p × (nm + pm) built from filtered versions of the inputs.

If desired, the initial state x0 = x(0) can be estimated in an analogous way, since the predictor with the initial values taken into account contains an additional term, driven by the unit pulse δ(t) at time 0, that is linear in x0.
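A sketch of this regression, assuming Â and Ĉ are given, zero initial state, open-loop data, and an unweighted least-squares fit; the superposition construction and all names below are choices of the sketch, not the book's prescription:

import numpy as np

def estimate_B_D(y, u, A_hat, C_hat):
    # y: (N, p) outputs, u: (N, m) inputs; A_hat: (n, n), C_hat: (p, n) assumed known
    N, m = u.shape
    p, n = C_hat.shape
    cols = []
    # regressor column for each entry B[i, j]: output response when that entry is 1 and all else 0
    for i in range(n):
        for j in range(m):
            x = np.zeros(n); resp = np.zeros((N, p))
            for t in range(N):
                resp[t] = C_hat @ x
                x = A_hat @ x
                x[i] += u[t, j]                  # contribution of B = E_ij driven by u_j(t)
            cols.append(resp.reshape(-1))
    # regressor column for each entry D[k, j]: u_j(t) appearing directly in output k
    for k in range(p):
        for j in range(m):
            resp = np.zeros((N, p)); resp[:, k] = u[:, j]
            cols.append(resp.reshape(-1))
    theta, *_ = np.linalg.lstsq(np.column_stack(cols), y.reshape(-1), rcond=None)
    B_hat = theta[:n*m].reshape(n, m)
    D_hat = theta[n*m:].reshape(p, m)
    return B_hat, D_hat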

► Finding A and C from the observability matrix (suppose O_r is known)

Suppose that a pr×n* matrix G is given that is related to the extended observability matrix
O_r = [C; CA; CA²; ...; CA^{r−1}].
We have to determine A and C from G. There are two situations:
• Known system order.
• Unknown system order.

Known system order. Suppose first that we know the order, so that n* = n and G = O_r. To find C is then immediate: Ĉ is the first block row (the first p rows) of O_r.

Similarly, we can find Â from the shift structure of
O_r = [C; CA; CA²; ...; CA^{r−1}].
Let O_up denote O_r with its last block row deleted and O_down denote O_r with its first block row deleted. Then
O_up Â = O_down,
which gives np(r−1) equations in the n² unknowns of Â, solved by
Â = (O_upᵀ O_up)^{-1} O_upᵀ O_down.

Role of the state-space basis. The extended observability matrix depends on the choice of basis in the state-space representation: for the transformed representation above, the observability matrix would be O_r T^{-1}.

Note: a large r leads to numerical problems.
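A two-line sketch of these formulas (an added illustration; O is assumed to be a pr×n array and p the number of outputs):

import numpy as np

def A_C_from_observability(O, p):
    C_hat = O[:p, :]                                               # first block row of O_r
    A_hat, *_ = np.linalg.lstsq(O[:-p, :], O[p:, :], rcond=None)   # O_up A = O_down in the LS sense
    return A_hat, C_hat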

Unknown system order. Suppose now that the true order n of the system is unknown and that n* (the number of columns of G) is only an upper bound for the order. Then perform a singular value decomposition (SVD) of G and inspect the singular values.

Numerical example (slides 71-73): a given matrix G is factored as G = U S Vᵀ; two of its singular values are clearly nonzero while the remaining one is essentially zero, so G has numerical rank 2 and can be written as G = U1 S1 V1ᵀ, where U1 and V1 contain the first two left and right singular vectors and S1 the two nonzero singular values.

Multiplying the factorization G = U1 S1 V1ᵀ by V1 from the right gives G V1 = U1 S1, and multiplying also by S1^{-1} from the right gives G V1 S1^{-1} = U1. In other words, U1 spans the column space of G, and for some invertible matrix R we may take
Ô_r = U1 R.

Using a noisy estimate of the extended observability matrix

Let us now assume that the given pr×n* matrix G is a noisy estimate of the true observability matrix,
G = O_r T + E_N,
where T has full rank and E_N is small and tends to zero as N → ∞. The rank of O_r is not known, while the noise matrix E_N is likely to be of full rank. It is reasonable to proceed as above and perform an SVD on G; due to the noise, S will typically have all singular values nonzero.

The first n singular values will be supported by O_r, while the remaining ones stem from E_N. If the noise is small, one should expect the latter to be significantly smaller than the former. Therefore, determine n as the number of singular values that are significantly larger than 0. Then use Ô_r to determine Â and Ĉ as before. However, in the noisy case Ô_r will not exactly obey the shift structure, so that system of equations should be solved in a least-squares sense.
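A sketch of this order-selection step, with an assumed (crude) relative threshold for deciding which singular values count as "significantly larger than 0":

import numpy as np

def observability_from_noisy_G(G, rel_threshold=1e-3):
    U, S, Vt = np.linalg.svd(G, full_matrices=False)
    n = int(np.sum(S > rel_threshold * S[0]))   # estimated order: number of "significant" singular values
    O_hat = U[:, :n]                            # here W1 = W2 = I and R = I
    return O_hat, n, S

Â and Ĉ then follow from Ô_r via the shift-structure least-squares fit sketched earlier.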

Using weighting matrices in the SVD

For more flexibility we can pre- and post-multiply G before performing the SVD:
Ḡ = W1 G W2.
In the noiseless case (E = 0) these weightings are without consequence. However, when noise is present they have an important influence on the space spanned by U1, and hence on the quality of the estimates Â and Ĉ. The estimate of the observability matrix is then taken as
Ô_r = W1^{-1} U1 R,
where R is an arbitrary invertible matrix that fixes the coordinate basis of the state representation, and Â and Ĉ are determined from Ô_r as before. The post-multiplication by W2 just corresponds to a change of basis in the state space, and the pre-multiplication by W1 is cancelled by the factor W1^{-1}.

Remark: post-multiplying W2 by an orthogonal matrix does not affect the U1 matrix in the decomposition.
Exercise: Prove this remark (10E.10).

► Estimating the extended observability matrix

Remember the state-space model x(t+1) = A x(t) + B u(t) + w(t), y(t) = C x(t) + D u(t) + v(t). Iterating the state equation gives, for k = 0, 1, ..., r−1,
y(t+k) = C A^k x(t) + [terms in u(t), ..., u(t+k)] + [terms in w(t), ..., w(t+k−1) and v(t+k)].

Now form the vectors
Y_r(t) = [y(t); y(t+1); ...; y(t+r−1)],  U_r(t) = [u(t); u(t+1); ...; u(t+r−1)].
The scalar r is the maximum prediction horizon. Stacking the relations above gives
Y_r(t) = O_r x(t) + S_r U_r(t) + V(t),
where O_r is the extended observability matrix, S_r is a block lower-triangular (Toeplitz) matrix of impulse-response coefficients, and the kth block component of V(t) is
V_k(t) = Σ_{j=0}^{k−1} C A^{k−1−j} w(t+j) + v(t+k).

Collect these relations for all available t into data matrices Y, U, X and V (one column per time instant), so that
Y = O_r X + S_r U + V.
To extract O_r we must eliminate the U-term and make the noise influence disappear asymptotically.

Removing the U-term. Form the N×N projection matrix
Π_U^⊥ = I − Uᵀ (U Uᵀ)^{-1} U.
Multiplying from the right by Π_U^⊥ leads to
Y Π_U^⊥ = O_r X Π_U^⊥ + V Π_U^⊥,
since U Π_U^⊥ = 0. Now the last term is made up of noise contributions, and the idea is to correlate it away with a suitable matrix.

Removing the noise term. Since the last term is made up of noise contributions, the idea is to correlate it away with a suitable s×N matrix Φ (s ≥ n). Define
G = (1/N) Y Π_U^⊥ Φᵀ = O_r T̃_N + Ṽ_N,
T̃_N = (1/N) X Π_U^⊥ Φᵀ,  Ṽ_N = (1/N) V Π_U^⊥ Φᵀ.
Here Φ acts as an instrument, and we must define it such that Ṽ_N → 0 as N → ∞ while T̃_N has full rank n. Then the pr×s matrix G can be seen as a noisy estimate of the extended observability matrix (in some basis). It remains to define Φ.

Finding good instruments

The only remaining question is how to choose Φ so that Ṽ_N → 0 while T̃_N has full rank. Recall the instrumental-variable idea: the law of large numbers states that the sample sums converge to their respective expected values.

Assume the input u is generated in open loop, so that it is independent of the noise contributions in V. Now let the columns of Φ be built from past inputs and outputs,
φ_s(t) = [y(t−1); ...; y(t−s1); u(t−1); ...; u(t−s2)].
Since the kth block component of V(t) only involves noise at time t and later, it is uncorrelated with φ_s(t), so
(1/N) Σ_t V(t) φ_sᵀ(t) → 0 as N → ∞.
A formal proof that T̃_N has full rank is not immediate and involves properties of the input; see Problem 10G.6 and Van Overschee and De Moor (1996).

► Finding the states and estimating the noise statistics

(Some background from Chapter 7.) Let a system be given by the impulse response representation
y(t) = Σ_{j=0}^∞ h_u(j) u(t−j) + Σ_{j=0}^∞ h_e(j) e(t−j)      (I)
Let the formal k-step-ahead predictors be defined by simply deleting the terms involving data after time t−k:
ŷ(t|t−k) = Σ_{j=k}^∞ h_u(j) u(t−j) + Σ_{j=k}^∞ h_e(j) e(t−j)
Define
Ŷ_r(t) = [ŷ(t|t−1); ŷ(t+1|t−1); ...; ŷ(t+r−1|t−1)],   Ŷ = [Ŷ_r(1), Ŷ_r(2), ..., Ŷ_r(N)].

Then the following is true as N → ∞ (see the appendix of Chapter 4):
1- The system (I) has an nth-order minimal state space description if and only if the rank of Ŷ equals n for all r ≥ n.
2- The state vector of any minimal realization can be chosen as linear combinations of the r-step predictors,
x(t) = L Ŷ_r(t).

The k-step-ahead predictor ŷ(t+k−1|t−1) involves, in principle, all past data. For practical reasons it is truncated to a finite number of past data:
ŷ(t+k−1|t−1) = α1 y(t−1) + ... + α_{s1} y(t−s1) + β1 u(t−1) + ... + β_{s2} u(t−s2).
This predictor can be determined efficiently by the linear regression
y(t+k−1) = θ_kᵀ φ_s(t) + γ_kᵀ U_l(t) + ε(t+k−1),
where φ_s(t) collects the past outputs and inputs above and the U-term accounts for the inputs acting between t and t+k−1; or, dealing with all r predictors simultaneously, by one multivariable linear regression.

According to Chapter 7, the LS solution of this multivariable regression gives the predicted outputs Ŷ_r(t) (the matrix inversion lemma can be used to organize the computations). Recalling that the SVD step gave Ô_r = U1 R and that the predicted outputs satisfy Ŷ_r(t) ≈ O_r x(t), the state sequence can be recovered as
X̂ = R^{-1} S1 V1ᵀ.

With the states given, we can estimate the process and measurement noises as the residuals of the state and output equations,
ŵ(t) = x̂(t+1) − Â x̂(t) − B̂ u(t),   v̂(t) = y(t) − Ĉ x̂(t) − D̂ u(t),
and their sample covariances give the noise statistics.

Putting it all together: the family of subspace algorithms

1. From the input-output data, form
G = (1/N) Y Π_U^⊥ Φᵀ.
The scalar r is the maximal prediction horizon, and many algorithms use r = s. Many algorithms choose φ_s(t) to consist of past inputs and outputs with s1 = s2 = s, so the scalar s is a design variable.

2. Select weighting matrices W1 and W2 and perform the SVD
W1 G W2 = U S Vᵀ ≈ U1 S1 V1ᵀ.
The choice of W1 and W2 is perhaps the most important design decision; the existing algorithms essentially differ in this choice.

3. Select a full-rank rp×n matrix R and define Ô_r = W1^{-1} U1 R. Solve for Ĉ and Â from Ô_r; the shift-structure equation for Â should be solved in a least-squares sense. Typical choices for R are R = I, R = S1 or R = S1^{1/2}.

4. Estimate B̂, D̂ and, if desired, x0 from the linear regression problem described earlier.

5. If a noise model is sought, form X̂ as above and estimate the noise contributions from the state and output equation residuals.
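To tie the five steps together, a compact end-to-end sketch under simplifying assumptions: W1 = W2 = I, R = I, the order n supplied by the user, fixed design horizons r and s, and no attention to numerical efficiency. The function names and the block-Hankel construction are choices of this sketch, not a definitive implementation of any particular algorithm:

import numpy as np

def block_hankel(x, start, rows, cols):
    # column j contains x(start+j), x(start+j+1), ..., x(start+j+rows-1), stacked
    d = x.shape[1]
    H = np.zeros((rows * d, cols))
    for j in range(cols):
        H[:, j] = x[start + j : start + j + rows].reshape(-1)
    return H

def subspace_id(y, u, n, r=10, s=10):
    # y: (N, p), u: (N, m); n: model order; r: prediction horizon, s: past horizon
    N, p = y.shape
    Nc = N - r - s + 1                                   # number of usable columns
    Y = block_hankel(y, s, r, Nc)                        # future outputs
    U = block_hankel(u, s, r, Nc)                        # future inputs
    Phi = np.vstack([block_hankel(y, 0, s, Nc),          # past outputs and inputs (instruments)
                     block_hankel(u, 0, s, Nc)])
    Pi = np.eye(Nc) - U.T @ np.linalg.pinv(U @ U.T) @ U  # projection removing the U-term
    G = Y @ Pi @ Phi.T / Nc                              # step 1
    Uvec, S, _ = np.linalg.svd(G, full_matrices=False)   # step 2 (W1 = W2 = I)
    O = Uvec[:, :n]                                      # step 3: estimated extended observability matrix
    C_hat = O[:p, :]
    A_hat, *_ = np.linalg.lstsq(O[:-p, :], O[p:, :], rcond=None)
    return A_hat, C_hat, S

B̂, D̂ (and, if desired, x0) are then obtained from the linear regression of step 4, and the noise statistics from the state and output residuals of step 5, as sketched earlier.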