SYSTEMS Identification - UM
Ali Karimpour, Assistant Professor
Ferdowsi University of Mashhad
Reference: “System Identification Theory For The User” Lennart Ljung
lecture 10
Ali Karimpour Nov 2009
Lecture 10
Computing the estimate
Topics to be covered include:
• Linear Regression and Least Squares.
• Numerical Solution by Iterative Search Method.
• Computing Gradients.
• Two-Stage and Multistage Method.
• Local Solutions and Initial Values.
• Subspace Methods for Estimating State Space Models.
Introduction
In Chapter 7, three basic parameter estimation methods were considered:
1- The Prediction-Error Approach, in which a certain function $V_N(\theta, Z^N)$ is minimized with respect to θ.
2- The Correlation Approach, in which a certain equation $f_N(\theta, Z^N) = 0$ is solved for θ.
3- The Subspace Approach to estimating state space models.
In this chapter we shall discuss how these problems are best solved numerically.
Linear Regression and Least Squares.
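For linear regression we have:
$$\hat{y}(t|\theta) = \varphi^T(t)\,\theta$$
Here $\hat{y}(t|\theta)$ is p×1, $\varphi(t)$ is d×p and θ is d×1.
The least-squares criterion leads to
$$\hat{\theta}_N^{LS} = \arg\min_{\theta} V_N(\theta, Z^N) = \left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\,y(t) = R^{-1}(N)\,f(N)$$
with
$$R(N) = \frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t) \quad (d\times d), \qquad f(N) = \frac{1}{N}\sum_{t=1}^{N}\varphi(t)\,y(t) \quad (d\times 1)$$
An alternative form is given by the normal equations:
$$R(N)\,\hat{\theta}_N^{LS} = f(N)$$
Note that the basic equation for the IV method is quite analogous, so most of what is said in this section about the LS method also applies to the IV method.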
$$R(N)\,\hat{\theta}_N^{LS} = f(N) \quad \text{(normal equations)}, \qquad R(N) = \frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t) \quad (d\times d)$$
R(N) may be ill-conditioned, especially when its dimension is high. The underlying idea in numerically robust methods is that the matrix R(N) should not be formed; instead, a d×d matrix R is constructed with the property
$$R\,R^T = R(N)$$
This class of methods is commonly known as “square-root algorithms”, but the term “quadratic methods” is more appropriate.
How to derive R?
• Householder
• Gram-Schmidt procedure
• Bjorck decomposition
• Cholesky decomposition
• QR decomposition
Solving for the LS estimates by QR factorization.
The QR-factorization of an n×d matrix A is defined as:
$$A = QR, \qquad Q^TQ = I, \qquad R \ \text{upper triangular}$$
Here Q is a unitary n×n matrix and R is n×d.
Example 1:
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} -0.3162 & -0.9487 \\ -0.9487 & 0.3162 \end{bmatrix} \begin{bmatrix} -3.162 & -4.4272 \\ 0 & -0.6325 \end{bmatrix}$$
Example 2:
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} = \begin{bmatrix} -0.1690 & 0.8971 & 0.4082 \\ -0.5071 & 0.2760 & -0.8165 \\ -0.8452 & -0.3450 & 0.4082 \end{bmatrix} \begin{bmatrix} -5.9161 & -7.4374 \\ 0 & 0.8281 \\ 0 & 0 \end{bmatrix}$$
Example 3:
$$A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 1 \\ 1 & 3 & -1 \\ 1 & 2 & 0 \end{bmatrix} = \begin{bmatrix} -0.58 & 0.77 & -0.17 & -0.21 \\ 0 & 0 & 0.78 & -0.63 \\ -0.58 & -0.62 & -0.33 & -0.42 \\ -0.58 & -0.15 & 0.50 & 0.63 \end{bmatrix} \begin{bmatrix} -1.73 & -2.89 & 1.15 \\ 0 & -2.16 & -0.15 \\ 0 & 0 & 1.28 \\ 0 & 0 & 0 \end{bmatrix}$$
Example 4:
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 1 & 3 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} -0.58 & 0.77 & 0.12 & -0.24 \\ 0 & 0 & -0.90 & -0.44 \\ -0.58 & -0.62 & 0.23 & -0.48 \\ -0.58 & -0.15 & -0.35 & 0.72 \end{bmatrix} \begin{bmatrix} -1.73 & -2.89 \\ 0 & -2.16 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$
Example 5:
$$A = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -0.58 & 0 & -0.58 & -0.58 \\ 0 & 1 & 0 & 0 \\ -0.58 & 0 & 0.79 & -0.21 \\ -0.58 & 0 & -0.21 & 0.79 \end{bmatrix} \begin{bmatrix} -1.73 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
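The QR examples can be checked numerically. The following is a minimal sketch using NumPy; the matrix `A3` is taken to be the 4×3 matrix of Example 3 (its signs are an assumption where the transcript is unclear), and the helper name `full_qr` is illustrative only. It also suggests why Examples 3 to 5 are shown together: dropping trailing columns of A leaves the leading block of R unchanged, up to row signs.

```python
import numpy as np

# Assumed matrix of Example 3; Examples 4 and 5 use its first two / first one columns.
A3 = np.array([[1.0, 0.0, -1.0],
               [0.0, 0.0,  1.0],
               [1.0, 3.0, -1.0],
               [1.0, 2.0,  0.0]])

def full_qr(a):
    """Return Q (n x n, orthogonal) and R (n x d, upper triangular) with a = Q R."""
    return np.linalg.qr(a, mode="complete")

Q3, R3 = full_qr(A3)           # all three columns (Example 3)
Q2, R2 = full_qr(A3[:, :2])    # first two columns (Example 4)
Q1, R1 = full_qr(A3[:, :1])    # first column only (Example 5)

reconstruction_ok = np.allclose(Q3 @ R3, A3)
orthogonal_ok = np.allclose(Q3.T @ Q3, np.eye(4))
# Compare magnitudes only: the sign of each row of R is a convention of the routine.
nested_ok = (np.allclose(np.abs(R3[:2, :2]), np.abs(R2[:2, :2]))
             and np.allclose(np.abs(R3[:1, :1]), np.abs(R1[:1, :1])))

print(np.round(R3, 2))
print(reconstruction_ok, orthogonal_ok, nested_ok)
```

The signs printed by a particular library may differ from those on the slides, since Q and R are only unique up to sign changes of matching columns of Q and rows of R.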
Solving for the LS estimates by QR factorization.
$$V_N(\theta, Z^N) = \sum_{t=1}^{N}\left|y(t) - \varphi^T(t)\theta\right|^2$$
Let us define
$$Y = \begin{bmatrix} y(1) \\ \vdots \\ y(N) \end{bmatrix} \ (Np\times 1), \qquad \Phi = \begin{bmatrix} \varphi^T(1) \\ \vdots \\ \varphi^T(N) \end{bmatrix} \ (Np\times d)$$
so that
$$V_N(\theta, Z^N) = |Y - \Phi\theta|^2$$
Let Q be a unitary matrix; then
$$V_N(\theta, Z^N) = |Y - \Phi\theta|^2 = \left|Q^T(Y - \Phi\theta)\right|^2$$
Now, introduce the QR-factorization
$$[\,\Phi \ \ Y\,] = QR, \qquad R = \begin{bmatrix} R_0 \\ 0 \end{bmatrix}$$
Here Q is Np×Np, R is Np×(d+1), and $R_0$ is the upper triangular (d+1)×(d+1) block
$$R_0 = \begin{bmatrix} R_1 & R_2 \\ 0 & R_3 \end{bmatrix}$$
where $R_1$ is d×d, $R_2$ is d×1 and $R_3$ is a scalar.
This means that
$$V_N(\theta, Z^N) = \left|Q^T(Y - \Phi\theta)\right|^2 = \left|\begin{bmatrix} R_2 \\ R_3 \end{bmatrix} - \begin{bmatrix} R_1\theta \\ 0 \end{bmatrix}\right|^2 = |R_2 - R_1\theta|^2 + |R_3|^2$$
which clearly is minimized for
$$R_1\hat{\theta}_N = R_2, \qquad \text{giving} \qquad V_N(\hat{\theta}_N, Z^N) = |R_3|^2$$
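Exercise: Consider the simple model for a system
$$y(t) + a_1 y(t-1) + a_2 y(t-2) = b\,u(t-1)$$
Suppose that for t = 1 to 11 the values of u and y are:
$$u = [\,2 \ \ 3 \ \ 2 \ \ 5 \ \ 3 \ \ 7 \ \ 1 \ \ 2 \ \ {-5} \ \ 1 \ \ 2\,]^T$$
$$y = [\,{-24} \ \ 14 \ \ {-4} \ \ 4 \ \ 3 \ \ 1.5 \ \ 6.25 \ \ {-2.125} \ \ 3.0625 \ \ {-6.5313} \ \ 4.2656\,]^T$$
1) Derive $\hat{\theta}_N$ from eq. (I) and find the condition number of R(N).
$$\hat{\theta}_N = R^{-1}(N)\,f(N) \qquad \text{(I)}$$
2) Derive $\hat{\theta}_N$ from eq. (II) and find the condition number of $R_1$.
$$\hat{\theta}_N = R_1^{-1}R_2 \qquad \text{(II)}$$
Answer: $\hat{\theta}_N = [\,a_1 \ \ a_2 \ \ b\,] = [\,0.5 \ \ 0 \ \ 1\,]$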
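The exercise can be worked numerically along the following lines. This is a sketch, not the lecturer's solution: it assumes the data vectors as reconstructed in the exercise (the signs are inferred from the model and the stated answer), starts the regression at t = 3 so that no unknown initial conditions are needed, and uses NumPy's `qr` with `mode="r"` to obtain the small triangular factor directly.

```python
import numpy as np

# Data for t = 1..11 (assumed signs, consistent with y(t) = -0.5 y(t-1) + u(t-1)).
u = np.array([2, 3, 2, 5, 3, 7, 1, 2, -5, 1, 2], dtype=float)
y = np.array([-24, 14, -4, 4, 3, 1.5, 6.25, -2.125, 3.0625, -6.5313, 4.2656])

# Regression y(t) = phi^T(t) theta, theta = [a1, a2, b],
# phi(t) = [-y(t-1), -y(t-2), u(t-1)], for t = 3..11.
Phi = np.column_stack([-y[1:-1], -y[:-2], u[1:-1]])
Y = y[2:]
N = len(Y)

# (I) Normal equations: theta = R(N)^{-1} f(N).
R_N = Phi.T @ Phi / N
f_N = Phi.T @ Y / N
theta_I = np.linalg.solve(R_N, f_N)

# (II) QR factorization of [Phi Y]: theta = R1^{-1} R2.
R0 = np.linalg.qr(np.column_stack([Phi, Y]), mode="r")   # (d+1) x (d+1), upper triangular
R1, R2 = R0[:3, :3], R0[:3, 3]
theta_II = np.linalg.solve(R1, R2)

cond_RN = np.linalg.cond(R_N)
cond_R1 = np.linalg.cond(R1)

print("theta (I): ", np.round(theta_I, 4))
print("theta (II):", np.round(theta_II, 4))
print("cond R(N) =", cond_RN, " cond R1 =", cond_R1)
```

Both routes should agree on the estimate; the point of the comparison is that the condition number of $R_1$ is the square root of that of R(N).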
Solving for the LS estimates by QR factorization.
$$V_N(\theta, Z^N) = |Y - \Phi\theta|^2 = \left|Q^T(Y - \Phi\theta)\right|^2, \qquad R_1\hat{\theta}_N = R_2, \qquad \text{giving} \qquad V_N(\hat{\theta}_N, Z^N) = |R_3|^2$$
There are three important advantages with this way of solving for the LS estimate:
1- The condition number of $R_1$ is the square root of that of R(N), since, up to the scalar factor 1/N (which does not affect the condition number),
$$R(N) = \Phi^T\Phi = R_1^TR_1$$
Therefore $R_1$ is much better conditioned than R(N).
2- $R_1$ is a triangular matrix, so the equation is easy to solve.
3- If the QR-factorization is performed for a regressor size d*, then the solutions for all models with fewer parameters are easily obtained from $R_0$.
Note that the big matrix Q never needs to be computed. All the information is contained in the “small” matrix $R_0$.
Initial condition: “Windowed” Data
The regression vector φ(t) is:
$$\varphi(t) = \begin{bmatrix} z(t-1) \\ \vdots \\ z(t-n) \end{bmatrix}$$
Here z(t-1) is an r-dimensional vector. For example, for the ARX model
$$z(t) = \begin{bmatrix} -y(t) \\ u(t) \end{bmatrix}$$
and for the AR model
$$z(t) = -y(t)$$
R(N) will then consist of the blocks:
$$R_{ij}(N) = \frac{1}{N}\sum_{t=1}^{N} z(t-i)\,z^T(t-j)$$
If we have knowledge only of z(t) for 1 ≤ t ≤ N, the question arises of how to deal with the unknown initial conditions:
1 - Start the summation at t=n+1 rather than t=1.
2 - Replace the unknown initial conditions by zeros.
Numerical Solution by Iterative Search Method
Numerical minimization
In general, neither can the function
$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l(\varepsilon(t,\theta), \theta, t)$$
be minimized, nor can the equation
$$0 = f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} \xi(t,\theta)\,\alpha(\varepsilon(t,\theta))$$
be solved, by analytical methods.
Methods for numerical minimization of a function V(θ) update the estimate of the minimizing point iteratively by:
$$\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} + \alpha f^{(i)}$$
where $f^{(i)}$ is a search direction based on information about V(θ), and α is a positive constant.
Depending on the information used to determine $f^{(i)}$, there are three groups:
• Methods using function values only. An estimate of the gradient is formed, and then a quasi-Newton algorithm is applied.
• Methods using values of the function V as well as of its gradient (quasi-Newton algorithms). An estimate of the Hessian is formed and used in place of V″ in the Newton direction below.
• Methods using values of the function, its gradient and its Hessian (Newton algorithms):
$$f^{(i)} = -\left[V''(\hat{\theta}^{(i)})\right]^{-1} V'(\hat{\theta}^{(i)})$$
In general, consider the function
$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l(\varepsilon(t,\theta), \theta, t)$$
The gradient is:
$$V_N'(\theta, Z^N) = -\frac{1}{N}\sum_{t=1}^{N}\left\{\psi(t,\theta)\,l'_{\varepsilon}(\varepsilon(t,\theta), \theta, t) - l'_{\theta}(\varepsilon(t,\theta), \theta, t)\right\}$$
Here ψ(t,θ) is:
$$\psi(t,\theta) = -\frac{\partial}{\partial\theta}\varepsilon(t,\theta) = \frac{\partial}{\partial\theta}\hat{y}(t|\theta)$$
Dimensions: $V_N'$ is d×1, ψ(t,θ) is d×q and $l'_{\varepsilon}$ is q×1.
Some explicit search schemes
Consider the special case
$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\tfrac{1}{2}\varepsilon^2(t,\theta)$$
The gradient is:
$$V_N'(\theta, Z^N) = -\frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\,\varepsilon(t,\theta)$$
A general family of search routines is given by
$$\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\left[R_N^{(i)}\right]^{-1}V_N'(\hat{\theta}_N^{(i)}, Z^N)$$
where
• $\hat{\theta}_N^{(i)}$ denotes the ith iterate,
• $R_N^{(i)}$ is a d×d matrix that modifies the search direction,
• $\mu_N^{(i)}$ denotes the step size, chosen so that $V_N(\hat{\theta}_N^{(i+1)}, Z^N) \le V_N(\hat{\theta}_N^{(i)}, Z^N)$.
Let $R_N^{(i)} = I$; then we have
$$\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\,V_N'(\hat{\theta}_N^{(i)}, Z^N)$$
This is the gradient or steepest-descent method.
This method is fairly inefficient close to the minimum.
Tangent-line (Newton-Raphson) iteration for solving f(x)=0.
• Make an initial guess: $x_0$.
• Draw the tangent line. Its equation is:
$$y = f(x_0) + f'(x_0)(x - x_0)$$
• Let $x_1$ be the x-intercept of the tangent line. This intercept is given by the formula:
$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$
• Now repeat with $x_1$ as the initial guess to obtain $x_2$, and so on.
Some difficulties of this iteration:
• Zero derivatives.
• Diverging.
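The tangent-line steps above can be sketched in a few lines of Python. The function name `newton_raphson`, the tolerance and the test function f(x) = x² − 2 are illustrative choices, not part of the lecture; the guard against a zero derivative corresponds to the first difficulty listed.

```python
def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Iterate x_{k+1} = x_k - f(x_k) / f'(x_k) until successive iterates agree."""
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0.0:
            # The tangent is horizontal, so it has no x-intercept.
            raise ZeroDivisionError("zero derivative encountered")
        x_new = x - f(x) / slope
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    # The iteration may also diverge or cycle for a poor initial guess.
    raise RuntimeError("no convergence within max_iter iterations")

# Illustrative use: the positive root of x^2 - 2.
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```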
Gradient or steepest-descent method for finding the minimum of f(x)
$$\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\,V_N'(\hat{\theta}_N^{(i)}, Z^N)$$
Some explicit search schemes
The gradient or steepest-descent method is fairly inefficient close to the minimum. For the special case of a quadratic criterion, the gradient and the Hessian of $V_N$ are:
$$V_N'(\theta, Z^N) = -\frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\,\varepsilon(t,\theta)$$
$$V_N''(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\,\psi^T(t,\theta) - \frac{1}{N}\sum_{t=1}^{N}\psi'(t,\theta)\,\varepsilon(t,\theta)$$
Let $R_N^{(i)} = V_N''(\hat{\theta}_N^{(i)}, Z^N)$; then we have
$$\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\left[V_N''(\hat{\theta}_N^{(i)}, Z^N)\right]^{-1}V_N'(\hat{\theta}_N^{(i)}, Z^N)$$
This is the Newton method.
But it is not an easy task to compute the Hessian, because of the term ψ′(t,θ).
Suppose that there is a value $\theta_0$ such that $\varepsilon(t,\theta_0) = e_0(t)$. Then the second sum is close to zero near $\theta_0$, so that
$$V_N''(\theta, Z^N) \approx \frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\,\psi^T(t,\theta) \triangleq H_N(\theta)$$
Newton method
With the approximation
$$V_N''(\theta, Z^N) \approx \frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\,\psi^T(t,\theta) \triangleq H_N(\theta)$$
the choice
$$R_N^{(i)} = H_N(\hat{\theta}_N^{(i)})$$
is a good estimate of the Hessian in the vicinity of the minimum.
This is known as the Gauss-Newton method.
In the statistical literature it is called the “method of scoring”.
In the control literature the terms “modified Newton-Raphson” and “quasi linearization” have also been used.
Dennis and Schnabel reserve the term “Gauss-Newton” for
$$R_N^{(i)} = H_N(\hat{\theta}_N^{(i)}), \qquad \mu_N^{(i)} = 1$$
and for
$$R_N^{(i)} = H_N(\hat{\theta}_N^{(i)}), \qquad \mu_N^{(i)} \ \text{adjustable}$$
the term “damped Gauss-Newton” has been used.
Newton method
$$R_N^{(i)} = H_N(\hat{\theta}_N^{(i)}) = \frac{1}{N}\sum_{t=1}^{N}\psi(t,\hat{\theta}_N^{(i)})\,\psi^T(t,\hat{\theta}_N^{(i)}), \qquad \hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\left[R_N^{(i)}\right]^{-1}V_N'(\hat{\theta}_N^{(i)}, Z^N)$$
Even though $R_N$ is assured to be positive semidefinite, it may be singular or close to singular (for example, if the model is over-parameterized or the data are not informative enough). Various ways to overcome this problem exist and are known as “regularization techniques”.
Goldfeld, Quandt and Trotter suggest
$$R_N^{(i)}(\lambda) = \frac{1}{1+\lambda}\left[\frac{1}{N}\sum_{t=1}^{N}\psi(t,\hat{\theta}_N^{(i)})\,\psi^T(t,\hat{\theta}_N^{(i)}) + \lambda I\right], \qquad \text{where } \lambda \text{ is a positive scalar}$$
Levenberg and Marquardt suggest
$$R_N^{(i)}(\lambda) = \frac{1}{N}\sum_{t=1}^{N}\psi(t,\hat{\theta}_N^{(i)})\,\psi^T(t,\hat{\theta}_N^{(i)}) + \lambda I, \qquad \text{where } \lambda \text{ is a positive scalar}$$
With λ = 0 we have the Gauss-Newton case; increasing λ means that the step size is decreased and the search direction is turned towards the gradient.
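A small numerical sketch may help to make the search family concrete. It assumes a toy predictor ŷ(t|θ) = θ₁·exp(−θ₂t) with noise-free data (not a model from the lecture), uses the Levenberg-Marquardt form R = H_N + λI, and halves the step size μ until the criterion does not increase. Setting `lam=0.0` gives the damped Gauss-Newton case. All names are illustrative.

```python
import numpy as np

t = 0.5 * np.arange(10)                 # sampling instants
theta_true = np.array([2.0, 0.5])

def predict(theta):
    """Toy predictor yhat(t|theta) = theta[0] * exp(-theta[1] * t)."""
    return theta[0] * np.exp(-theta[1] * t)

def psi(theta):
    """Gradient of the predictor w.r.t. theta; row k holds psi^T at time t[k]."""
    e = np.exp(-theta[1] * t)
    return np.column_stack([e, -theta[0] * t * e])

y = predict(theta_true)                 # noise-free data

def V(theta):
    """Quadratic criterion V_N = (1/N) * sum of 0.5 * eps^2."""
    return 0.5 * np.mean((y - predict(theta)) ** 2)

def search(theta0, lam, max_iter=200, tol=1e-12):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        eps = y - predict(theta)
        Psi = psi(theta)
        grad = -Psi.T @ eps / len(t)                          # V'_N
        if np.linalg.norm(grad) < tol:
            break
        R = Psi.T @ Psi / len(t) + lam * np.eye(len(theta))   # H_N + lambda * I
        direction = np.linalg.solve(R, grad)
        mu = 1.0
        # Halve the step until the criterion does not increase.
        while V(theta - mu * direction) > V(theta) and mu > 1e-10:
            mu *= 0.5
        theta = theta - mu * direction
    return theta

theta_gn = search([1.5, 0.4], lam=0.0)    # damped Gauss-Newton
theta_lm = search([1.5, 0.4], lam=0.01)   # Levenberg-Marquardt with a fixed lambda
print(theta_gn, theta_lm)
```

In practice λ is usually adapted between iterations rather than held fixed; the fixed value here only keeps the sketch short.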
Remember that we want to
$$\text{(I)} \ \ \text{minimize } V_N(\theta, Z^N) \qquad \text{or} \qquad \text{(II)} \ \ \text{solve } 0 = f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\xi(t,\theta)\,\alpha(\varepsilon(t,\theta))$$
Newton method to solve (I):
$$\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\left[V_N''(\hat{\theta}_N^{(i)}, Z^N)\right]^{-1}V_N'(\hat{\theta}_N^{(i)}, Z^N)$$
This leads to
$$V_N'(\hat{\theta}_N, Z^N) = 0$$
Correlation Equation
Solving equation (II) is quite analogous to the minimization of (I).
Newton-Raphson method to solve (II):
$$\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\left[f_N'(\hat{\theta}_N^{(i)}, Z^N)\right]^{-1}f_N(\hat{\theta}_N^{(i)}, Z^N)$$
Substitution method to solve (II):
$$\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\,f_N(\hat{\theta}_N^{(i)}, Z^N)$$
Computing Gradients
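The amount of work required to compute ψ(t,θ) is highly dependent on the model structure, and sometimes one may have to resort to numerical differentiation.
Example 10.1 Consider the ARMAX model
$$A(q)y(t) = B(q)u(t) + C(q)e(t)$$
The predictor is:
$$C(q)\hat{y}(t|\theta) = B(q)u(t) + \left[C(q) - A(q)\right]y(t)$$
Differentiation with respect to $a_k$ gives:
$$C(q)\frac{\partial}{\partial a_k}\hat{y}(t|\theta) = -q^{-k}y(t)$$
Similarly,
$$C(q)\frac{\partial}{\partial b_k}\hat{y}(t|\theta) = q^{-k}u(t)$$
and, for $c_k$,
$$q^{-k}\hat{y}(t|\theta) + C(q)\frac{\partial}{\partial c_k}\hat{y}(t|\theta) = q^{-k}y(t)$$
so that
$$C(q)\frac{\partial}{\partial c_k}\hat{y}(t|\theta) = q^{-k}\left(y(t) - \hat{y}(t|\theta)\right)$$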
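Collecting the three results,
$$C(q)\frac{\partial}{\partial a_k}\hat{y}(t|\theta) = -q^{-k}y(t), \qquad C(q)\frac{\partial}{\partial b_k}\hat{y}(t|\theta) = q^{-k}u(t), \qquad C(q)\frac{\partial}{\partial c_k}\hat{y}(t|\theta) = q^{-k}\left(y(t) - \hat{y}(t|\theta)\right)$$
Now, with
$$\psi(t,\theta) = -\frac{\partial}{\partial\theta}\varepsilon(t,\theta) = \frac{\partial}{\partial\theta}\hat{y}(t|\theta)$$
we obtain
$$C(q)\psi(t,\theta) = C(q)\begin{bmatrix} \frac{\partial}{\partial a_1}\hat{y}(t|\theta) \\ \vdots \\ \frac{\partial}{\partial a_{n_a}}\hat{y}(t|\theta) \\ \frac{\partial}{\partial b_1}\hat{y}(t|\theta) \\ \vdots \\ \frac{\partial}{\partial b_{n_b}}\hat{y}(t|\theta) \\ \frac{\partial}{\partial c_1}\hat{y}(t|\theta) \\ \vdots \\ \frac{\partial}{\partial c_{n_c}}\hat{y}(t|\theta) \end{bmatrix} = \begin{bmatrix} -y(t-1) \\ \vdots \\ -y(t-n_a) \\ u(t-1) \\ \vdots \\ u(t-n_b) \\ \varepsilon(t-1,\theta) \\ \vdots \\ \varepsilon(t-n_c,\theta) \end{bmatrix} = \varphi(t,\theta)$$
that is,
$$C(q)\psi(t,\theta) = \varphi(t,\theta)$$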
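The relation C(q)ψ(t,θ) = φ(t,θ) says that the gradient is obtained by filtering the regression vector through 1/C(q). The sketch below illustrates this for a first-order ARMAX model (n_a = n_b = n_c = 1), assuming zero initial conditions and simulated data; the function name and the parameter values are illustrative. The filtered gradient is compared with a central finite-difference approximation of ∂ŷ/∂θ.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 60
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)

# Simulated data: y(t) + a0 y(t-1) = b0 u(t-1) + e(t) + c0 e(t-1)
a0, b0, c0 = -0.7, 1.0, 0.5
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a0 * y[k - 1] + b0 * u[k - 1] + e[k] + c0 * e[k - 1]

def predictor_and_gradient(theta):
    """Return yhat(t|theta) and psi(t,theta) for theta = [a, b, c], zero initial conditions."""
    a, b, c = theta
    yhat = np.zeros(N)
    psi = np.zeros((N, 3))
    for k in range(1, N):
        eps_prev = y[k - 1] - yhat[k - 1]
        phi = np.array([-y[k - 1], u[k - 1], eps_prev])   # phi(t, theta)
        # Predictor: C(q) yhat = B(q) u + (C(q) - A(q)) y
        yhat[k] = -c * yhat[k - 1] + b * u[k - 1] + (c - a) * y[k - 1]
        # Gradient filter: C(q) psi = phi, i.e. psi(t) = -c psi(t-1) + phi(t)
        psi[k] = -c * psi[k - 1] + phi
    return yhat, psi

theta = np.array([-0.5, 0.8, 0.3])
_, psi_filtered = predictor_and_gradient(theta)

# Central finite differences of the predictor, one parameter at a time.
h = 1e-6
psi_fd = np.zeros_like(psi_filtered)
for j in range(3):
    d = np.zeros(3)
    d[j] = h
    psi_fd[:, j] = (predictor_and_gradient(theta + d)[0]
                    - predictor_and_gradient(theta - d)[0]) / (2 * h)

max_err = np.max(np.abs(psi_filtered - psi_fd))
print("largest difference between filtered and numerical gradient:", max_err)
```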
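SISO black box model
The general model structure is
$$A(q)y(t) = \frac{B(q)}{F(q)}u(t) + \frac{C(q)}{D(q)}e(t)$$
and its predictor is:
$$\hat{y}(t|\theta) = \frac{D(q)B(q)}{C(q)F(q)}u(t) + \left[1 - \frac{D(q)A(q)}{C(q)}\right]y(t)$$
so we have
$$\frac{\partial}{\partial a_k}\hat{y}(t|\theta) = -\frac{D(q)}{C(q)}y(t-k), \qquad \frac{\partial}{\partial b_k}\hat{y}(t|\theta) = \frac{D(q)}{C(q)F(q)}u(t-k)$$
For $c_k$:
$$q^{-k}\hat{y}(t|\theta) + C(q)\frac{\partial}{\partial c_k}\hat{y}(t|\theta) = q^{-k}y(t) \ \Rightarrow \ \frac{\partial}{\partial c_k}\hat{y}(t|\theta) = -\frac{D(q)B(q)}{C(q)C(q)F(q)}u(t-k) + \frac{D(q)A(q)}{C(q)C(q)}y(t-k) = \frac{1}{C(q)}\varepsilon(t-k,\theta)$$
For $d_k$:
$$\frac{\partial}{\partial d_k}\hat{y}(t|\theta) = \frac{B(q)}{C(q)F(q)}u(t-k) - \frac{A(q)}{C(q)}y(t-k) \ \Rightarrow \ \frac{\partial}{\partial d_k}\hat{y}(t|\theta) = -\frac{1}{C(q)}v(t-k,\theta)$$
For $f_k$:
$$q^{-k}\hat{y}(t|\theta) + F(q)\frac{\partial}{\partial f_k}\hat{y}(t|\theta) = \left[1 - \frac{D(q)A(q)}{C(q)}\right]q^{-k}y(t) \ \Rightarrow \ \frac{\partial}{\partial f_k}\hat{y}(t|\theta) = -\frac{D(q)B(q)}{C(q)F(q)F(q)}u(t-k) = -\frac{D(q)}{C(q)F(q)}w(t-k,\theta)$$
Here $w(t,\theta) = \frac{B(q)}{F(q)}u(t)$ and $v(t,\theta) = A(q)y(t) - w(t,\theta)$.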
As a special case, consider the OE model:
$$A(q) = C(q) = D(q) = 1$$
The gradient expressions then reduce to
$$\frac{\partial}{\partial b_k}\hat{y}(t|\theta) = \frac{1}{F(q)}u(t-k), \qquad \frac{\partial}{\partial f_k}\hat{y}(t|\theta) = -\frac{1}{F(q)}w(t-k,\theta)$$
where $w(t,\theta) = \frac{B(q)}{F(q)}u(t)$. Now, with
$$\psi(t,\theta) = -\frac{\partial}{\partial\theta}\varepsilon(t,\theta) = \frac{\partial}{\partial\theta}\hat{y}(t|\theta)$$
we obtain
$$F(q)\psi(t,\theta) = F(q)\begin{bmatrix} \frac{\partial}{\partial b_1}\hat{y}(t|\theta) \\ \vdots \\ \frac{\partial}{\partial b_{n_b}}\hat{y}(t|\theta) \\ \frac{\partial}{\partial f_1}\hat{y}(t|\theta) \\ \vdots \\ \frac{\partial}{\partial f_{n_f}}\hat{y}(t|\theta) \end{bmatrix} = \begin{bmatrix} u(t-1) \\ \vdots \\ u(t-n_b) \\ -w(t-1,\theta) \\ \vdots \\ -w(t-n_f,\theta) \end{bmatrix} = \varphi(t,\theta)$$
that is,
$$F(q)\psi(t,\theta) = \varphi(t,\theta)$$
Two-Stage and Multistage Method
Linear Regression and Least Squares
• Efficient methods with analytic solution.
Numerical Solution by Iterative Search Method
• Guaranteed convergence to a local minimum.
• Efficiency.
• Applicability to general model structures.
Two-stage and multistage methods combine the two.
Why are we interested in this topic?
• It helps to understand the identification literature.
• It is useful for providing initial estimates to use in iterative methods.
Some important two-stage or multistage methods:
1- Bootstrap Methods.
2- Bilinear Parameterization.
3- Separate Least Squares.
4- High Order AR(X) Models.
5- Separating Dynamics And Noise Models.
6- Determining ARMA Models.
7- Subspace Methods For Estimating State Space Models.
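Bootstrap Methods
Consider the correlation formulation
$$0 = f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\xi(t,\theta)\left(y(t) - \varphi^T(t,\theta)\theta\right)$$
This formulation contains a number of common situations:
• IV (instrumental variable) methods, with $\varphi(t,\theta) = \varphi(t)$ and
$$\xi(t,\theta) = K(q,\theta)\left[\,-x(t-1,\theta) \ \dots \ -x(t-n_a,\theta) \ \ u(t-1) \ \dots \ u(t-n_b)\,\right]^T$$
• PLR (pseudolinear regression) methods, with $\xi(t,\theta) = \varphi(t,\theta)$:
$$\hat{\theta}_N^{PLR} = \mathrm{sol}\left\{\frac{1}{N}\sum_{t=1}^{N}\varphi(t,\theta)\left[y(t) - \varphi^T(t,\theta)\theta\right] = 0\right\}$$
• Minimizing the quadratic criterion, with $\xi(t,\theta) = \psi(t,\theta)$:
$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\tfrac{1}{2}\varepsilon^2(t,\theta), \qquad V_N'(\theta, Z^N) = -\frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\,\varepsilon(t,\theta)$$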
Bootstrap Methods
Given the previous iterate $\hat{\theta}_N^{(i-1)}$, solve
$$\frac{1}{N}\sum_{t=1}^{N}\xi(t,\hat{\theta}_N^{(i-1)})\left(y(t) - \varphi^T(t,\hat{\theta}_N^{(i-1)})\theta\right) = 0$$
for θ, which gives
$$\hat{\theta}_N^{(i)} = \left[\frac{1}{N}\sum_{t=1}^{N}\xi(t,\hat{\theta}_N^{(i-1)})\,\varphi^T(t,\hat{\theta}_N^{(i-1)})\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\xi(t,\hat{\theta}_N^{(i-1)})\,y(t)$$
It is called a bootstrap method since it alternates between computing θ and forming new vectors ξ(t,θ) and φ(t,θ):
$$\theta \ \leftrightarrow \ \xi(t,\theta), \ \varphi^T(t,\theta)$$
It does not necessarily converge to a solution. A convergence analysis is given by:
Bilinear Parameterization.
For some models the predictor is bilinear in the parameters. For example, consider the ARARX model
$$A(q)y(t) = B(q)u(t) + \frac{1}{D(q)}e(t)$$
Now the predictor is
$$\hat{y}(t|\theta) = B(q)D(q)u(t) + \left(1 - A(q)D(q)\right)y(t)$$
$$\theta = \left[\,a_1 \ a_2 \ \dots \ a_{n_a} \ \ b_1 \ b_2 \ \dots \ b_{n_b} \ \ d_1 \ d_2 \ \dots \ d_{n_d}\,\right]^T$$
Let
$$\theta = \left[\,\rho \ \ \eta\,\right]^T$$
with ρ collecting the a and b coefficients and η the d coefficients, so that $\hat{y}(t|\theta) = \hat{y}(t|\rho,\eta)$.
Bilinear means that $\hat{y}(t|\theta)$ is linear in ρ for fixed η and linear in η for fixed ρ.
With this situation, a natural way of minimizing $V_N$ would be to treat it as a sequence of LS problems, alternating between minimization over ρ with η fixed and minimization over η with ρ fixed.
Exercise 10T.3 Show that this minimization problem is a special case of (10.40).
According to Exercise 10T.3, the bilinear parameterization is thus indeed a descent method. It converges to a local minimum.
Separate Least Squares.
A more general situation than the bilinear case is when one set of parameters enters linearly and another set nonlinearly in the predictor:
$$\hat{y}(t|\theta,\eta) = \theta^T\varphi(t,\eta)$$
The identification criterion then becomes
$$V_N(\theta,\eta,Z^N) = \sum_{t=1}^{N}\left|y(t) - \theta^T\varphi(t,\eta)\right|^2 = |Y - \Phi(\eta)\theta|^2$$
For given η this criterion is an LS criterion and is minimized w.r.t. θ by
$$\hat{\theta}(\eta) = \left[\Phi^T(\eta)\Phi(\eta)\right]^{-1}\Phi^T(\eta)\,Y$$
We can thus insert it into $V_N$ and define the problem as
$$\min_{\eta}\left|Y - \Phi(\eta)\left[\Phi^T(\eta)\Phi(\eta)\right]^{-1}\Phi^T(\eta)\,Y\right|^2$$
The method is called separate least squares since the LS part has been separated out, and the problem is reduced to a minimization problem of lower dimension.
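The separation can be illustrated with a toy predictor that is linear in θ and nonlinear in a scalar η. The sketch below assumes the model y(t) = θ₁·exp(−ηt) + θ₂ with noise-free data and replaces the outer minimization by a simple grid search over η; both choices are illustrative and not taken from the lecture.

```python
import numpy as np

t = np.linspace(0.0, 5.0, 41)
theta_true, eta_true = np.array([3.0, 1.0]), 0.7
y = theta_true[0] * np.exp(-eta_true * t) + theta_true[1]   # noise-free data

def regressors(eta):
    """Phi(eta): one row phi^T(t, eta) = [exp(-eta * t), 1] per sample."""
    return np.column_stack([np.exp(-eta * t), np.ones_like(t)])

def ls_step(eta):
    """For a fixed eta, solve the inner LS problem and return (theta_hat, loss)."""
    Phi = regressors(eta)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta, np.sum((y - Phi @ theta) ** 2)

# Outer problem: a one-dimensional search over eta only.
eta_grid = np.linspace(0.1, 2.0, 191)            # grid step 0.01
losses = np.array([ls_step(eta)[1] for eta in eta_grid])
eta_hat = eta_grid[np.argmin(losses)]
theta_hat, _ = ls_step(eta_hat)

print("eta_hat =", eta_hat, " theta_hat =", theta_hat)
```

The three-parameter search has been reduced to a search over one parameter, with the two linear parameters recovered by least squares at each trial value.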
High Order AR(X) Models.
Suppose the true system is:
$$y(t) = G_0(q)u(t) + H_0(q)e_0(t)$$
An ARX structure of order M is used:
$$A_M(q)y(t) = B_M(q)u(t) + e(t)$$
Hannan and Kavalieris, and Ljung and Wahlberg, show that, as N and M grow suitably, the estimated transfer functions $\hat{B}_M(q)/\hat{A}_M(q)$ and $1/\hat{A}_M(q)$ converge to $G_0(q)$ and $H_0(q)$.
So a high-order ARX model is capable of approximating any linear system arbitrarily well.
It is of course desirable to reduce this high-order model to more tractable versions.
Separating Dynamics And Noise Models.
The general model structure is:
$$A(q)y(t) = \frac{B(q)}{F(q)}u(t) + \frac{C(q)}{D(q)}e(t)$$
With estimates $\hat{A}(q)$, $\hat{B}(q)$ and $\hat{F}(q)$ of the dynamics, form
$$\hat{v}(t) = \hat{A}(q)y(t) - \frac{\hat{B}(q)}{\hat{F}(q)}u(t)$$
which is then described by the noise model
$$\hat{v}(t) = \frac{C(q)}{D(q)}e(t)$$
Determining ARMA Models.
$$\hat{v}(t) = \frac{C(q)}{D(q)}e(t)$$
Subspace Methods For Estimating State Space Models.
The subspace methods can also be regarded as two-stage methods, being built up from two LS steps.
Local Solutions and Initial Values
Local Minima
The general numerical schemes in Section 10.2 typically have the property that, with a suitably chosen step length μ, they converge to a solution. That is, for the schemes
$$\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\left[f_N'(\hat{\theta}_N^{(i)}, Z^N)\right]^{-1}f_N(\hat{\theta}_N^{(i)}, Z^N)$$
$$\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\,f_N(\hat{\theta}_N^{(i)}, Z^N)$$
we have
$$\hat{\theta}_N^{(i)} \to \theta_N^* \quad \text{s.t.} \quad f_N(\theta_N^*, Z^N) = 0$$
while for positive definite R the scheme
$$\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\left[R_N^{(i)}\right]^{-1}V_N'(\hat{\theta}_N^{(i)}, Z^N)$$
gives
$$\hat{\theta}_N^{(i)} \to \theta_N^* \quad \text{s.t.} \quad V_N(\theta_N^*, Z^N) \ \text{is a local minimum}$$
It is the global minimum that interests us, but the problem may have several solutions.
To find the global solution one must start at different feasible initial values.
An important possibility is to use some preliminary estimation procedure to produce a good initial value.
Local minima do not necessarily create problems in practice, if a model passes the validation tests (Sec. 16.5 and 16.6).
Remember from Chapter 8:
$$V_N(\theta_N^*, Z^N)$$
Results from SISO Black-box Models
The general model structure is:
$$A(q)y(t) = \frac{B(q)}{F(q)}u(t) + \frac{C(q)}{D(q)}e(t)$$
Consider the assumption that the system can be described within the model set: $S \in \mathcal{M}$.
The results are listed below for the general SISO model set and refer to
$$\bar{V}(\theta) = E\,\tfrac{1}{2}\varepsilon^2(t,\theta)$$
Initial parameter values
Due to the possible occurrence of undesired local minima in the criterion function, it is worthwhile to put some effort into producing good initial values. Also, since Newton-type methods have a good local convergence rate, it is again worthwhile to put some effort into producing good initial values.
1- For a physically parameterized model structure:
• Use your physical insight.
2- For a linear black-box model structure:
$$A(q)y(t) = \frac{B(q)}{F(q)}u(t) + \frac{C(q)}{D(q)}e(t)$$
Initial filter condition
In some configurations we need initial values φ(0,θ).
1 - Start the summation at t=n+1 rather than t=1.
2 - Consider initial condition by:
Subspace Methods for Estimating State Space Models
Let us now consider how to estimate the system matrices A, B, C and D in the state-space model
$$x(t+1) = Ax(t) + Bu(t) + w(t)$$
$$y(t) = Cx(t) + Du(t) + v(t)$$
Let the output y(t) be a p-dimensional column vector and the input u(t) an m-dimensional column vector. The order of the system is n.
We also assume that this state-space representation is a minimal realization.
We know that many different representations can describe the same system. They are:
$$\tilde{x}(t+1) = T^{-1}AT\,\tilde{x}(t) + T^{-1}Bu(t) + \tilde{w}(t)$$
$$y(t) = CT\,\tilde{x}(t) + Du(t) + v(t)$$
where T is any invertible matrix. We also have
$$\tilde{x}(t) = T^{-1}x(t)$$
Let the state-space model be:
$$x(t+1) = Ax(t) + Bu(t) + w(t)$$
$$y(t) = Cx(t) + Du(t) + v(t)$$
I) If $\hat{A}$ and $\hat{C}$ are known, it is easy to find B and D. But $\hat{A}$ and $\hat{C}$ are not known.
II) If the (extended) observability matrix $O_r$ for the system is known, it is easy to find A and C. But $O_r$ is not known.
III) The extended observability matrix can be consistently estimated.
IV) Once the observability matrix has been estimated, the state can be constructed.
Subspace procedure:
► Estimating B and D: suppose $\hat{A}$ and $\hat{C}$ are known → try to find B and D.
► Finding A and C from the Observability matrix: suppose $O_r$ is known → try to find A and C.
► Estimating the Extended Observability matrix: suppose inputs and outputs are available → try to find $O_r$.
► Finding the States and Estimating the noise Statistics.
► Estimating B and D (suppose $\hat{A}$ and $\hat{C}$ are known → try to find B and D)
For given and fixed $\hat{A}$ and $\hat{C}$, the model structure is:
$$x(t+1) = \hat{A}x(t) + Bu(t) + w(t)$$
$$y(t) = \hat{C}x(t) + Du(t) + v(t)$$
It is clearly linear in B and D.
If the system operates in open loop, we can thus consistently estimate B and D according to Theorem 8.4, even if the noise sequence is non-white.
► Estimating B and D (suppose $\hat{A}$ and $\hat{C}$ are known → try to find B and D)
Let us write the predictor in the standard linear regression form. With $B = [\,b_1 \ \dots \ b_m\,]$, $D = [\,d_1 \ \dots \ d_m\,]$ and $u(t) = [\,u_1(t) \ \dots \ u_m(t)\,]^T$:
$$x(t+1) = \hat{A}x(t) + Bu(t) = \hat{A}x(t) + [\,b_1 \ \dots \ b_m\,]\begin{bmatrix} u_1 \\ \vdots \\ u_m \end{bmatrix} = \hat{A}x(t) + b_1u_1 + \dots + b_mu_m$$
$$y(t) = \hat{C}x(t) + Du(t) = \hat{C}x(t) + [\,d_1 \ \dots \ d_m\,]\begin{bmatrix} u_1 \\ \vdots \\ u_m \end{bmatrix} = \hat{C}x(t) + d_1u_1 + \dots + d_mu_m$$
so that
$$\hat{y}(t) = \hat{C}\left(qI - \hat{A}\right)^{-1}\left(b_1u_1 + \dots + b_mu_m\right) + \left(d_1u_1 + \dots + d_mu_m\right)$$
This is linear in the parameter vector
$$\theta = \begin{bmatrix} b_1 \\ \vdots \\ b_m \\ d_1 \\ \vdots \\ d_m \end{bmatrix}$$
which has dimension (mn + mp)×1, with a regression matrix $\varphi^T(t)$ of dimension p×(mn + mp).
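Because the predictor is linear in B and D, the estimate reduces to an ordinary least-squares problem. The sketch below assumes a single-input single-output system of order n = 2 with known `A_hat` and `C_hat`, zero initial state and noise-free data; all numerical values and names are illustrative. Each regressor is the simulated response obtained by setting one unknown to 1 and the others to 0.

```python
import numpy as np

rng = np.random.default_rng(1)
A_hat = np.array([[0.8, 0.1],
                  [0.0, 0.5]])
C_hat = np.array([[1.0, 0.0]])
B_true = np.array([1.0, 0.5])     # unknown in practice; used only to generate data
D_true = 0.2
N = 200
u = rng.standard_normal(N)

def simulate(B, D):
    """Output of x(t+1) = A_hat x(t) + B u(t), y(t) = C_hat x(t) + D u(t), with x(0) = 0."""
    x = np.zeros(2)
    out = np.zeros(N)
    for k in range(N):
        out[k] = C_hat[0] @ x + D * u[k]
        x = A_hat @ x + B * u[k]
    return out

y = simulate(B_true, D_true)

# Linear regression y = Phi theta with theta = [b_1, b_2, d].
Phi = np.column_stack([simulate(np.array([1.0, 0.0]), 0.0),
                       simulate(np.array([0.0, 1.0]), 0.0),
                       simulate(np.zeros(2), 1.0)])
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
B_est, D_est = theta_hat[:2], theta_hat[2]

print("B estimate:", B_est, " D estimate:", D_est)
```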
► Estimating B and D (suppose $\hat{A}$ and $\hat{C}$ are known → try to find B and D)
If desired, the initial state $x_0 = x(0)$ can also be estimated in an analogous way, since the predictor with the initial values taken into account is linear also in $x_0$. It involves δ(t), the unit pulse at time 0.
► Finding A and C from the Observability matrix (suppose $O_r$ is known → try to find A and C)
Suppose that a pr × n* dimensional matrix G is given, which is related to the extended observability matrix
$$O_r = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{r-1} \end{bmatrix}$$
We have to determine A and C from G. There are two situations:
• Known system order.
• Unknown system order.
Known system order. Suppose first that we know that $G = O_r$, so that n* = n. To find C is then immediate: $\hat{C}$ is given by the first p rows of $O_r$.
Similarly, we can find $\hat{A}$ from the shift structure of
$$O_r = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{r-1} \end{bmatrix}$$
Let $O_{up}$ denote $O_r$ without its last block row and $O_{down}$ denote $O_r$ without its first block row. Then
$$O_{up}\hat{A} = O_{down}$$
which is p(r−1)n equations in the n² unknowns of $\hat{A}$. The least-squares solution is
$$\hat{A} = \left(O_{up}^TO_{up}\right)^{-1}O_{up}^TO_{down}$$
Role of the State Space Basis
The extended observability matrix depends on the choice of basis in the state-space representation. It is easy to verify that, for the representation with $\tilde{x}(t) = T^{-1}x(t)$, the observability matrix would be
$$\tilde{O}_r = O_rT$$
Note: Large r leads to numerical problems.
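The recovery of C and A from the extended observability matrix can be sketched as follows. The system matrices, the horizon r = 4 and the basis change T are illustrative assumptions; `numpy.linalg.lstsq` is used for the least-squares solution instead of forming the normal equations explicitly. The second half indicates the role of the basis: starting from O_r T, one recovers a similar matrix with the same eigenvalues.

```python
import numpy as np

A = np.array([[0.8, 0.1],
              [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
p, r = 1, 4                          # number of outputs, number of block rows

# Extended observability matrix O_r = [C; CA; ...; CA^(r-1)].
O_r = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(r)])

# C is the first block row; A follows from the shift structure O_up A = O_down.
C_hat = O_r[:p, :]
O_up, O_down = O_r[:-p, :], O_r[p:, :]
A_hat, *_ = np.linalg.lstsq(O_up, O_down, rcond=None)

# Same computation in another state-space basis: G = O_r T.
T = np.array([[1.0, 1.0],
              [0.0, 2.0]])
G = O_r @ T
A_tilde, *_ = np.linalg.lstsq(G[:-p, :], G[p:, :], rcond=None)

eig_A = np.sort(np.linalg.eigvals(A).real)
eig_A_tilde = np.sort(np.linalg.eigvals(A_tilde).real)
print(A_hat, C_hat, eig_A_tilde)
```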
Unknown system order.
Suppose now that the true order of the system is unknown, and that n* (the number of columns of G) is just an upper bound for the order.
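Example: singular value decomposition of G.
$$G = \begin{bmatrix} 1 & 2 & 4 \\ 4 & 2 & 10 \\ 6 & 7 & 19 \\ 1 & 1 & 3 \end{bmatrix} = \begin{bmatrix} -0.19 & -0.40 & 0.90 & -0.03 \\ -0.44 & 0.85 & 0.28 & -0.07 \\ -0.87 & -0.35 & -0.34 & -0.11 \\ -0.14 & 0.01 & 0.00 & 0.99 \end{bmatrix} \begin{bmatrix} 24.39 & 0 & 0 \\ 0 & 1.81 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} -0.30 & 0.49 & 0.82 \\ -0.31 & -0.86 & 0.41 \\ -0.91 & 0.13 & -0.41 \end{bmatrix}^T$$
Keeping only the non-zero singular values gives $G = U_1S_1V_1^T$:
$$G = \begin{bmatrix} 1 & 2 & 4 \\ 4 & 2 & 10 \\ 6 & 7 & 19 \\ 1 & 1 & 3 \end{bmatrix} = \begin{bmatrix} -0.19 & -0.40 \\ -0.44 & 0.85 \\ -0.87 & -0.35 \\ -0.14 & 0.01 \end{bmatrix} \begin{bmatrix} 24.39 & 0 \\ 0 & 1.81 \end{bmatrix} \begin{bmatrix} -0.30 & 0.49 \\ -0.31 & -0.86 \\ -0.91 & 0.13 \end{bmatrix}^T$$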
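The decomposition in this example can be reproduced with NumPy. The sketch assumes the 4×3 matrix G as reconstructed above and picks the order n as the number of singular values above a relative threshold; the threshold `1e-8` is an arbitrary illustrative choice, and the signs of the singular vectors returned by a library may differ from those on the slide.

```python
import numpy as np

# Assumed example matrix: its third column equals 2 * (first column) + (second column).
G = np.array([[1.0, 2.0,  4.0],
              [4.0, 2.0, 10.0],
              [6.0, 7.0, 19.0],
              [1.0, 1.0,  3.0]])

U, s, Vt = np.linalg.svd(G)              # G = U diag(s) Vt, with s sorted in decreasing order

# Order estimate: number of singular values that are significantly larger than zero.
n = int(np.sum(s > 1e-8 * s[0]))

U1 = U[:, :n]                            # pr x n
S1 = np.diag(s[:n])                      # n x n
V1 = Vt[:n, :].T                         # n* x n

print("singular values:", np.round(s, 2))
print("estimated order n =", n)
print("max reconstruction error:", np.max(np.abs(U1 @ S1 @ V1.T - G)))
```

With noisy data the trailing singular values are small rather than exactly zero, so the threshold then has to be chosen by inspecting the gap between the leading and the remaining singular values.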
Now multiply this by $V_1$ from the right.
Now multiply this by $S_1^{-1}$ from the right.
Or for some invertible matrix R:
74
lecture 10
Ali Karimpour Nov 2009
Subspace Methods for Estimating State Space Models
Using a Noisy Estimate of the Extended Observability Matrix
Let us now assume that the given $pr \times n^*$ matrix G is a noisy estimate of the true observability matrix, with an error term $E_N$ that is small and tends to zero as $N \to \infty$. The rank of $O_r$ is not known, while the noise matrix $E_N$ is likely to be of full rank.
It is reasonable to proceed as above and perform an SVD on G:
$$G = U S V^T$$
Due to the noise, S will typically have all singular values non-zero.
Subspace Methods for Estimating State Space Models
The first n singular values will be supported by $O_r$, while the remaining ones will stem from $E_N$.
If the noise is small, one should expect the latter to be significantly smaller than the former.
Therefore determine n as the number of singular values that are significantly larger than 0.
Then use $\hat{O}_r = U_1 R$ to determine $\hat{A}$ and $\hat{C}$, as before. However, in the noisy case $\hat{O}_r$ will not be exactly subject to the shift structure, so this system of equations should be solved in a least-squares sense.
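A small numerical sketch of this procedure follows. Everything specific in it is an assumption made for illustration: the second-order system, the $2 \times 4$ matrix `T_tilde` (so that $n^* = 4$), the noise level, and the 5 % threshold used to decide which singular values are "significantly larger than 0".

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true system (assumed values), p = 1 output.
A = np.array([[0.9, 0.3], [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
p, r = 1, 8

# Noise-free extended observability matrix O_r (pr x n).
O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(r)])

# G has n* = 4 columns: O_r times a full-rank 2 x 4 matrix plus small noise E_N.
T_tilde = np.array([[1.0, 0.0, 1.0, 0.0],
                    [0.0, 1.0, 0.0, 1.0]])
G = O @ T_tilde + 1e-3 * rng.standard_normal((p * r, 4))

U, s, _ = np.linalg.svd(G, full_matrices=False)

# Order estimate: count singular values above an assumed relative threshold.
n_hat = int(np.sum(s > 0.05 * s[0]))

# O_hat = U1 R with R = I; solve the shift equations in a least-squares sense.
O_hat = U[:, :n_hat]
A_hat, *_ = np.linalg.lstsq(O_hat[:-p], O_hat[p:], rcond=None)
eig_hat = np.sort(np.linalg.eigvals(A_hat).real)

print("singular values:", s)
print("estimated order:", n_hat, "eigenvalues:", eig_hat)
```

In practice the threshold is a judgment call; a clear gap in the singular values is the usual indicator.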
Subspace Methods for Estimating State Space Models
Using Weighting Matrices in the SVD
For more flexibility we could pre- and post-multiply G before performing the SVD:
$$\hat{G} = W_1 G W_2 = U S V^T \approx U_1 S_1 V_1^T$$
and then use the equation below to determine $\hat{A}$ and $\hat{C}$:
$$\hat{O}_r = W_1^{-1} U_1 R$$
Here R is an arbitrary matrix that will determine the coordinate basis for the state representation.
The post-multiplication by $W_2$ just corresponds to a change of basis in the state space, and the pre-multiplication by $W_1$ is eliminated.
In the noiseless case, E = 0, these weightings are without consequence. However, when noise is present, they have an important influence on the space spanned by $U_1$ and hence on the quality of the estimates $\hat{A}$ and $\hat{C}$.
Remark. Post-multiplying $W_2$ by an orthogonal matrix does not affect the $U_1$ matrix in the decomposition.
Exercise: Prove the remark above (10.E10).
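The remark can be checked numerically before proving it. The sketch below uses random matrices as stand-ins for $G$, $W_1$ and $W_2$ (all values are assumptions). It compares the singular values and the projector onto the leading left singular vectors, since individual singular vectors are only defined up to sign.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in matrices (assumed values, for illustration only).
G = rng.standard_normal((6, 4))
W1 = np.diag([1.0, 2.0, 0.5, 1.0, 3.0, 1.5])
W2 = rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # orthogonal matrix

n = 2
Ua, sa, _ = np.linalg.svd(W1 @ G @ W2)        # weighting W2
Ub, sb, _ = np.linalg.svd(W1 @ G @ W2 @ Q)    # weighting W2 Q

# Projectors onto the span of the first n left singular vectors.
Pa = Ua[:, :n] @ Ua[:, :n].T
Pb = Ub[:, :n] @ Ub[:, :n].T

print(np.allclose(sa, sb), np.allclose(Pa, Pb))
```

The check hints at the proof: the left singular vectors depend only on $\hat{G}\hat{G}^T$, and $Q Q^T = I$ leaves that product unchanged.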
Subspace Methods for Estimating State Space Models
► Estimating the Extended Observability Matrix.
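Remember the state-space model:
$$x(t+1) = Ax(t) + Bu(t) + w(t), \qquad y(t) = Cx(t) + Du(t) + v(t)$$
Now, iterating the state equation k steps ahead:
$$y(t+k) = CA^{k}x(t) + CA^{k-1}Bu(t) + \dots + CBu(t+k-1) + Du(t+k) + CA^{k-1}w(t) + \dots + Cw(t+k-1) + v(t+k)$$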
Subspace Methods for Estimating State Space Models
► Estimating the Extended Observability Matrix.
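Now, form the vectors
$$Y_r(t) = \begin{bmatrix} y(t) \\ y(t+1) \\ \vdots \\ y(t+r-1) \end{bmatrix}, \qquad U_r(t) = \begin{bmatrix} u(t) \\ u(t+1) \\ \vdots \\ u(t+r-1) \end{bmatrix}$$
so that $Y_r(t) = O_r x(t) + S_r U_r(t) + V(t)$, where $S_r$ is a lower block-triangular matrix built from $D$ and the terms $CA^{j}B$, and $V(t)$ collects the noise contributions.
The scalar r is the maximum prediction horizon.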
Subspace Methods for Estimating State Space Models
► Estimating the Extended Observability Matrix.
And the k-th block component of $V(t)$ is
$$V_k(t) = CA^{k-2}w(t) + CA^{k-3}w(t+1) + \dots + Cw(t+k-2) + v(t+k-1)$$
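Collecting the vectors for $t = 1, \dots, N$ as columns of $Y$, $X$, $U$ and $V$ gives
$$Y = O_r X + S_r U + V$$
We must eliminate the U term and make the noise influence disappear asymptotically.
Removing the U-term. Form the $N \times N$ matrix
$$\Pi^{\perp}_{U^T} = I - U^T \left(U U^T\right)^{-1} U$$
Multiplying from the right by $\Pi^{\perp}_{U^T}$ (note that $U \Pi^{\perp}_{U^T} = 0$) leads to:
$$Y \Pi^{\perp}_{U^T} = O_r X \Pi^{\perp}_{U^T} + V \Pi^{\perp}_{U^T}$$
Now, since the last term is made up of noise contributions, the idea is to correlate it away with a suitable matrix.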
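The effect of the projection can be illustrated on noise-free data, where $V = 0$ and the projected matrix has exactly the column space of $O_r$. The sketch below assumes a hypothetical second-order SISO system and a white-noise input; all numerical values are illustrative, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical SISO system (assumed values), simulated without noise.
A = np.array([[0.9, 0.3], [0.0, 0.5]])
B = np.array([1.0, 0.5])
C = np.array([1.0, 0.0])
D = 0.2
T, r, n = 300, 5, 2

u = rng.standard_normal(T)
y = np.zeros(T)
x = np.zeros(2)
for t in range(T):
    y[t] = C @ x + D * u[t]
    x = A @ x + B * u[t]

# Column t of Y and U holds [y(t), ..., y(t+r-1)] and [u(t), ..., u(t+r-1)].
N = T - r + 1
Y = np.array([y[k:k + N] for k in range(r)])
U = np.array([u[k:k + N] for k in range(r)])

# Projection onto the orthogonal complement of the row space of U.
Pi = np.eye(N) - U.T @ np.linalg.solve(U @ U.T, U)
YPi = Y @ Pi                       # equals O_r X Pi in the noise-free case

U_svd, sv, _ = np.linalg.svd(YPi, full_matrices=False)
O_hat = U_svd[:, :n]
A_hat, *_ = np.linalg.lstsq(O_hat[:-1], O_hat[1:], rcond=None)
eig_hat = np.sort(np.linalg.eigvals(A_hat).real)

print("singular values:", sv)
print("eigenvalues of A_hat:", eig_hat)
```

Only n singular values are non-zero here, and the eigenvalues of `A_hat` match those of `A`. With noise present, the term $V\Pi^{\perp}_{U^T}$ no longer vanishes, which is what the instruments are for.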
Subspace Methods for Estimating State Space Models
► Estimating the Extended Observability Matrix.
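Removing the Noise Term. Since the last term is made up of noise contributions, the idea is to correlate it away with a suitable matrix. Define an $s \times N$ matrix $\Phi$ ($s \ge n$) and form
$$G = \frac{1}{N} Y \Pi^{\perp}_{U^T} \Phi^T = O_r \frac{1}{N} X \Pi^{\perp}_{U^T} \Phi^T + \frac{1}{N} V \Pi^{\perp}_{U^T} \Phi^T = O_r \tilde{T}_N + \tilde{V}_N$$
Here $\Phi$ acts as an instrument, and we must define it such that
$$\lim_{N\to\infty} \tilde{V}_N = 0, \qquad \lim_{N\to\infty} \tilde{T}_N = \tilde{T} \ \text{ (of full rank } n)$$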
Then
$$G = O_r \tilde{T} + E_N, \qquad E_N \to 0 \ \text{ as } N \to \infty$$
The $pr \times s$ matrix G can thus be seen as a noisy estimate of the extended observability matrix.
But we still need to define $\Phi$.
Subspace Methods for Estimating State Space Models
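Finding Good Instruments. The only remaining question is how to satisfy the following conditions:
$$\lim_{N\to\infty} \frac{1}{N} V \Pi^{\perp}_{U^T} \Phi^T = 0, \qquad \lim_{N\to\infty} \frac{1}{N} X \Pi^{\perp}_{U^T} \Phi^T = \tilde{T} \ \text{ (of full rank } n)$$
Remember the instrumental-variable idea: the instruments should be correlated with the states but uncorrelated with the noise.
Remember: $\Pi^{\perp}_{U^T} = I - U^T (U U^T)^{-1} U$, so with $\Phi = [\varphi_s(1) \ \dots \ \varphi_s(N)]$
$$\frac{1}{N} V \Pi^{\perp}_{U^T} \Phi^T = \frac{1}{N}\sum_{t=1}^{N} V(t)\varphi_s^T(t) - \frac{1}{N}\sum_{t=1}^{N} V(t)U_r^T(t) \left[\frac{1}{N}\sum_{t=1}^{N} U_r(t)U_r^T(t)\right]^{-1} \frac{1}{N}\sum_{t=1}^{N} U_r(t)\varphi_s^T(t)$$
The law of large numbers states that the sample sums converge to their respective expected values, so
$$\lim_{N\to\infty} \frac{1}{N} V \Pi^{\perp}_{U^T} \Phi^T = E\,V(t)\varphi_s^T(t) - E\,V(t)U_r^T(t)\, R_u^{-1}\, E\,U_r(t)\varphi_s^T(t), \qquad R_u = E\,U_r(t)U_r^T(t)$$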
Subspace Methods for Estimating State Space Models
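Finding Good Instruments. The only remaining question is how to satisfy the required conditions.
Assume the input u is generated in open loop, so that it is independent of the noise V:
$$E\,V(t)U_r^T(t) = 0$$
Now let
$$\varphi_s(t) = \begin{bmatrix} y(t-1) & \dots & y(t-s_1) & u(t-1) & \dots & u(t-s_2) \end{bmatrix}^T$$
The k-th block component of V(t) is:
$$V_k(t) = CA^{k-2}w(t) + CA^{k-3}w(t+1) + \dots + Cw(t+k-2) + v(t+k-1)$$
It contains only noise terms from time t onward, while $\varphi_s(t)$ contains only data from before time t. Hence
$$E\,V(t)\varphi_s^T(t) = 0$$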
Subspace Methods for Estimating State Space Models
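Finding Good Instruments. The only remaining question is how to satisfy the required conditions.
Similarly we have:
$$\lim_{N\to\infty} \frac{1}{N} X \Pi^{\perp}_{U^T} \Phi^T = E\,x(t)\varphi_s^T(t) - E\,x(t)U_r^T(t)\, R_u^{-1}\, E\,U_r(t)\varphi_s^T(t) = \tilde{T}$$
A formal proof that $\tilde{T}$ has full rank is not immediate and will involve properties of the input.
See Problem 10G.6 and Van Overschee and De Moor (1996).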
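The instrument idea can be tried on simulated noisy data. This is a sketch under stated assumptions: a hypothetical second-order SISO system, a white-noise open-loop input, white measurement noise, and $s_1 = s_2 = s$. The matrix $G$ is computed from the sample sums above, so the $N \times N$ projection matrix is never formed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical system and noise level (assumed values).
A = np.array([[0.9, 0.3], [0.0, 0.5]])
B = np.array([1.0, 0.5])
C = np.array([1.0, 0.0])
T, r, s, n = 20000, 5, 4, 2

u = rng.standard_normal(T)            # open-loop input
v = 0.02 * rng.standard_normal(T)     # measurement noise
y = np.zeros(T)
x = np.zeros(2)
for t in range(T):
    y[t] = C @ x + v[t]
    x = A @ x + B * u[t]

# Columns correspond to t = s, ..., T - r.
ts = np.arange(s, T - r + 1)
N = ts.size
Y = np.array([y[ts + k] for k in range(r)])            # future outputs, r x N
U = np.array([u[ts + k] for k in range(r)])            # future inputs,  r x N
Phi = np.array([y[ts - k] for k in range(1, s + 1)] +
               [u[ts - k] for k in range(1, s + 1)])   # past data, 2s x N

# G = (1/N) Y Pi Phi^T, written out with sample sums.
YPhi, YU = Y @ Phi.T / N, Y @ U.T / N
UU, UPhi = U @ U.T / N, U @ Phi.T / N
G = YPhi - YU @ np.linalg.solve(UU, UPhi)

U_svd, sv, _ = np.linalg.svd(G, full_matrices=False)
O_hat = U_svd[:, :n]
A_hat, *_ = np.linalg.lstsq(O_hat[:-1], O_hat[1:], rcond=None)
eig_hat = np.sort(np.linalg.eigvals(A_hat).real)

print("singular values of G:", sv)
print("eigenvalues of A_hat:", eig_hat)
```

No weighting matrices are used here ($W_1 = I$, $W_2 = I$), which is a simplification rather than one of the named algorithms.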
Finding the States and Estimating the Noise statistics
Subspace Methods for Estimating State Space Models
Some part of chapter 7
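Let a system be given by the impulse response representation
$$y(t) = \sum_{j=0}^{\infty} \left[ h_u(j)u(t-j) + h_e(j)e(t-j) \right]$$
Let the formal k-step-ahead predictors be defined by just deleting the k most recent terms ($j = 0, \dots, k-1$), so:
$$\hat{y}(t|t-k) = \sum_{j=k}^{\infty} \left[ h_u(j)u(t-j) + h_e(j)e(t-j) \right]$$
Define
$$\hat{Y}_r(t) = \begin{bmatrix} \hat{y}(t|t-1) \\ \vdots \\ \hat{y}(t+r-1|t-1) \end{bmatrix}, \qquad \hat{Y} = \begin{bmatrix} \hat{Y}_r(1) & \dots & \hat{Y}_r(N) \end{bmatrix}$$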
Finding the States and Estimating the Noise statistics
Subspace Methods for Estimating State Space Models
Some part of chapter 7
$$y(t) = \sum_{j=0}^{\infty} \left[ h_u(j)u(t-j) + h_e(j)e(t-j) \right] \qquad (I)$$
With the k-step-ahead predictors $\hat{y}(t|t-k)$, the vectors $\hat{Y}_r(t)$ and the matrix $\hat{Y}$ defined as on the previous slide, the following is true as $N \to \infty$ (see Chapter 4, Appendix A):
1- The system (I) has an nth-order minimal state-space description if and only if the rank of $\hat{Y}$ is equal to n for all $r \ge n$.
2- The state vector of any minimal realization in innovations form can be chosen as linear combinations of $\hat{Y}_r$:
$$x(t) = L\hat{Y}_r(t)$$
Finding the States and Estimating the Noise statistics
Subspace Methods for Estimating State Space Models
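Let a system be given by the impulse response representation, with the k-step-ahead predictor
$$\hat{y}(t|t-k) = \sum_{j=k}^{\infty} \left[ h_u(j)u(t-j) + h_e(j)e(t-j) \right]$$
so that
$$\hat{y}(t+k-1|t-1) = \sum_{j=k}^{\infty} \left[ h_u(j)u(t+k-1-j) + h_e(j)e(t+k-1-j) \right]$$
which depends only on data up to time $t-1$.
For practical reasons we use a finite number of past outputs and inputs:
$$\hat{y}(t+k-1|t-1) = \alpha_1 y(t-1) + \dots + \alpha_{s_1} y(t-s_1) + \beta_1 u(t-1) + \dots + \beta_{s_2} u(t-s_2)$$
This predictor can be determined effectively from
$$y(t+k-1) = \theta_k^T \varphi_s(t) + \gamma_k^T U_\ell(t) + \varepsilon(t+k-1)$$
or, dealing with all r predictors simultaneously,
$$Y_r(t) = \Theta \varphi_s(t) + \Gamma U_\ell(t) + E(t)$$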
Finding the States and Estimating the Noise statistics
Subspace Methods for Estimating State Space Models
According to chapter 7
By LS we have
By the matrix inversion lemma
Finding the States and Estimating the Noise statistics
Subspace Methods for Estimating State Space Models
According to chapter 7
Predicted output is
$$\hat{O}_r = U_1 R$$
$$\hat{y}(t+k-1|t-1) = \alpha_1 y(t-1) + \dots + \alpha_{s_1} y(t-s_1) + \beta_1 u(t-1) + \dots + \beta_{s_2} u(t-s_2)$$
So we have
$$\hat{X} = R^{-1} S_1 V_1^T$$
Finding the States and Estimating the Noise statistics
Subspace Methods for Estimating State Space Models
So let:
$$\hat{X} = R^{-1} S_1 V_1^T$$
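With the states given, we can estimate the process and measurement noises as the residuals of the state-space equations:
$$\hat{w}(t) = \hat{x}(t+1) - \hat{A}\hat{x}(t) - \hat{B}u(t), \qquad \hat{v}(t) = y(t) - \hat{C}\hat{x}(t) - \hat{D}u(t)$$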
Subspace Methods for Estimating State Space Models
Putting It All Together
The family of subspace algorithms
1. From the input-output data, form
$$G = \frac{1}{N} Y \Pi^{\perp}_{U^T} \Phi^T$$
Subspace Methods for Estimating State Space Models
Remember:
The scalar r is the maximal prediction horizon; many algorithms use r = s.
Many algorithms choose $\varphi_s(t)$ to consist of past inputs and outputs with $s_1 = s_2 = s$, so the scalar s is a design variable.
Putting It All Together
The family of subspace algorithms
2. Select weighting matrices $W_1$ and $W_2$ and perform the SVD
$$\hat{G} = W_1 G W_2 = U S V^T \approx U_1 S_1 V_1^T$$
Subspace Methods for Estimating State Space Models
The weighting matrices $W_1$ and $W_2$. This is perhaps the most important choice. Existing algorithms employ the following choices:
Putting It All Together
The family of subspace algorithms
Subspace Methods for Estimating State Space Models
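3. Select a full-rank matrix R and define the $pr \times n$ matrix
$$\hat{O}_r = W_1^{-1} U_1 R$$
Solve
$$\hat{C} = \hat{O}_r(1{:}p,\ 1{:}n), \qquad \hat{O}_r(p+1{:}pr,\ 1{:}n) = \hat{O}_r(1{:}p(r-1),\ 1{:}n)\,\hat{A}$$
for $\hat{C}$ and $\hat{A}$. The latter equation should be solved in a least-squares sense.
Typical choices for R are $R = I$, $R = S_1$, or $R = S_1^{1/2}$.
4. Estimate $\hat{B}$, $\hat{D}$ and $\hat{x}_0$ from the linear regression problem:
$$\min_{B, D, x_0} \frac{1}{N} \sum_{t=1}^{N} \left\| y(t) - \hat{C}(qI - \hat{A})^{-1}Bu(t) - Du(t) - \hat{C}(qI - \hat{A})^{-1}x_0\,\delta(t) \right\|^2$$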
Putting It All Together
The family of subspace algorithms
Subspace Methods for Estimating State Space Models
5. If a noise model is sought, form $\hat{X}$ as in the section on finding the states, and estimate the noise contributions as described there.
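Step 4 is a linear regression because, with $\hat{A}$ and $\hat{C}$ fixed, the output is linear in $B$, $D$ and $x_0$: $y(t) = CA^{t}x_0 + \sum_{k<t} CA^{t-1-k}Bu(k) + Du(t)$. The sketch below illustrates this for the SISO case. The helper `estimate_b_d_x0` is a hypothetical name, the system values are assumptions, and the true `A` and `C` stand in for the estimates from step 3.

```python
import numpy as np

def estimate_b_d_x0(A, C, u, y):
    """Least-squares fit of B, D and x0 for a SISO model with A and C known."""
    n, T = A.shape[0], len(u)
    z = np.zeros(n)        # z(t) = sum_{k<t} u(k) C A^(t-1-k), regressor for B
    cAt = C.copy()         # C A^t, regressor for x0
    rows = []
    for t in range(T):
        rows.append(np.concatenate([cAt, z, [u[t]]]))
        z = z @ A + u[t] * C       # recursion z(t+1) = z(t) A + u(t) C
        cAt = cAt @ A
    Phi = np.array(rows)           # T x (2n + 1) regression matrix
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta[n:2 * n], theta[2 * n], theta[:n]   # B, D, x0

# Hypothetical noise-free test data (assumed values).
rng = np.random.default_rng(4)
A = np.array([[0.9, 0.3], [0.0, 0.5]])
B = np.array([1.0, 0.5])
C = np.array([1.0, 0.0])
D = 0.3
x0 = np.array([1.0, -1.0])

T = 50
u = rng.standard_normal(T)
y = np.zeros(T)
x = x0.copy()
for t in range(T):
    y[t] = C @ x + D * u[t]
    x = A @ x + B * u[t]

B_hat, D_hat, x0_hat = estimate_b_d_x0(A, C, u, y)
print(B_hat, D_hat, x0_hat)
```

With noise-free data and exact `A`, `C`, the regression recovers `B`, `D` and `x0` exactly; with estimated $\hat{A}$, $\hat{C}$ the result is expressed in the state basis fixed by the choice of R.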