Identification of complex systems · 2019. 9. 30.
Numerically optimal identification of complex systems
Towards instrumental variables and δ-parameterizations
Tom Oomen
joint work with Robbert van Herpen, Robbert Voorhoeve, and Jurgen van Zundert
www.toomen.eu
Department of Mechanical Engineering
Eindhoven University of Technology
September 24, 2019
2/25 Identification of complex systems
System identification
data, parameterization, criterion, algorithm → model
Many examples of increasing complexity
- networked systems (this ERNSI)
- mechanical systems
- SYSID15 invited session (https://www.kth.se/social/group/system-identificatio/page/17th-ifac-symposium-on-system-identifica/)
- many industries (particularly around Eindhoven)
3/25 Identification of complex systems
Controller
Traditional motion control
- de Callafon & Van den Hof (2001): 3 motion DOFs ⇒ 3 input, 3 output
- van de Wal et al. (2002): 6 motion DOFs ⇒ 6 input, 6 output
Increasing complexity: active control of flexible modes
- van Herpen et al. (2014): > 6 input, 6 output
- currently (2019): 14 input, 14 output
Algorithms for complex systems: high order, low damping, large dynamic range, many inputs, many outputs?
4/25 Identification of complex systems
Typical identification algorithms (SISO)
- frequency domain data: $G_o(\xi_k)$, $\xi_k \in \xi$ (s-domain or Z-domain)
- parameterization: $G(\xi) = \frac{n(\xi,\theta)}{d(\xi,\theta)}$ (nonlinear in $\theta$)
- criterion: $V = \sum_{k=1}^{m} \left| w_k \left( G_o(\xi_k) - G(\xi_k) \right) \right|^2$ ($w_k$: maximum likelihood, control-relevant, ...)
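Recent developments in identification algorithms
- $V = \sum_{k=1}^{m} \left| w_k \begin{bmatrix} G_o(\xi_k) & 1 \end{bmatrix} \begin{bmatrix} d(\xi_k,\theta) \\ n(\xi_k,\theta) \end{bmatrix} \right|^2$ (the factor $\frac{1}{d(\xi_k,\theta)}$ is struck out) $\Rightarrow A\theta = b$ (Levy 1959)
- $V = \sum_{k=1}^{m} \left| \frac{1}{d^{\langle i-1\rangle}(\xi_k,\theta)} w_k \begin{bmatrix} G_o(\xi_k) & 1 \end{bmatrix} \begin{bmatrix} d^{\langle i\rangle}(\xi_k,\theta) \\ n^{\langle i\rangle}(\xi_k,\theta) \end{bmatrix} \right|^2 \Rightarrow A^{\langle i-1\rangle}\theta^{\langle i\rangle} = b^{\langle i-1\rangle}$ (Sanathanan & Koerner 1963)
  ‘SK’: stationary point ≠ (local) optimum (Whitfield 1987)
- ‘Instrumental variable (IV)’: $C^{\langle i-1\rangle H} A^{\langle i-1\rangle}\theta^{\langle i\rangle} = C^{\langle i-1\rangle H} b^{\langle i-1\rangle}$ (Blom & Van den Hof 2010)
  stationary point = (local) optimum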
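The Levy and SK steps listed above can be sketched numerically. This is a minimal illustration and not the implementation used in the talk: the second-order SISO parameterization, the function name `sk_fit`, the test system, and the sign convention (residual written as $G_o d - n$ with a monic denominator) are all assumptions made for this sketch.

```python
import numpy as np

def sk_fit(xi, Go, w, n_iter=10):
    """SK iterations for the hypothetical model G = (b1*xi + b0) / (xi**2 + a1*xi + a0).

    Returns theta = [a0, a1, b0, b1]. The first pass uses no denominator
    weighting, which corresponds to Levy's linear least-squares problem.
    """
    d_prev = np.ones_like(xi)
    for _ in range(n_iter):
        s = w / d_prev                                   # weighting with previous denominator
        # Residual s*(Go*d - n) is linear in theta: A @ theta - b
        A = s[:, None] * np.column_stack([Go, Go * xi, -np.ones_like(xi), -xi])
        b = -s * Go * xi**2
        # theta is real, so stack real and imaginary parts before solving
        A_ri = np.vstack([A.real, A.imag])
        b_ri = np.concatenate([b.real, b.imag])
        theta, *_ = np.linalg.lstsq(A_ri, b_ri, rcond=None)
        a0, a1, b0, b1 = theta
        d_prev = xi**2 + a1 * xi + a0                    # update the weighting
    return theta

# Hypothetical noise-free data on the imaginary axis (s-domain)
omega = np.logspace(-1, 1, 50)
xi = 1j * omega
Go = (2.0 * xi + 3.0) / (xi**2 + 0.2 * xi + 1.0)
w = np.ones_like(omega)
theta = sk_fit(xi, Go, w)
print(theta)   # ordered as [a0, a1, b0, b1]
```

With noise-free data the linear problem has a zero-residual solution, so the sketch only shows the mechanics of the iteration; the difference between SK stationary points and optima of $V$ appears once the data are noisy.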
5/25 Identification of complex systems
My IV implementation on SYSID2015 benchmark (Voorhoeve et al. 2015)
[Figure: cost V versus iteration (0 to 200), logarithmic V-axis from 10^4 to 10^9]
What’s wrong?
- algorithm itself?
- implementation?
6/25 Identification of complex systems
Is there a numerical issue?
[Figure: condition number κ versus iteration (0 to 200), logarithmic κ-axis from 10^20 to 10^200]
Same algorithm, different implementation
[Figure: cost V versus iteration (0 to 200), logarithmic V-axis from 10^4 to 10^9]
Indeed, many ad hoc fixes
- QR factorization (Bayard 1994)
- special bases (Bayard 1994)
- frequency/amplitude scaling (Pintelon & Kollár 2005)
- amplitude scaling (Hakvoort & Van den Hof 1994)
- scaled monomials (Voorhoeve et al. 2015)
- orthonormal bases (Ninness et al. 2000) (Ninness & Hjalmarsson 2001)
- discard part of SVD (Wills & Ninness 2008)
- FLBF (Welsh & Rojas 2007) (Gilson et al. 2017)
- rational bases (Gustavsen & Semlyen 1999)
- ...
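7/25 Identification of complex systems
Deeper into ‘SK’ case: $A^{\langle i-1\rangle}\theta^{\langle i\rangle} = b^{\langle i-1\rangle}$
- $V = \sum_{k=1}^{m} \left| \frac{1}{d^{\langle i-1\rangle}(\xi_k,\theta)} w_k \begin{bmatrix} G_o(\xi_k) & 1 \end{bmatrix} \begin{bmatrix} d^{\langle i\rangle}(\xi_k,\theta) \\ n^{\langle i\rangle}(\xi_k,\theta) \end{bmatrix} \right|^2$
- general polynomial basis: $\begin{bmatrix} d(\xi,\theta) \\ n(\xi,\theta) \end{bmatrix} = \sum_{j=0}^{l} \phi_j(\xi)\theta_j$, with
  $\phi_j(\xi) \in \mathbb{R}^{2\times 2}[\xi]$: degree $j$ block-polynomial
  $\theta_j \in \mathbb{R}^{2\times 1}$: coefficient vector ($\theta_l$ constrained)
- traditional: pick some basis (monomial), then put in matrix $\underbrace{W_1\Phi}_{A}\,\theta = \underbrace{W_1\Phi_l\theta_l}_{b}$, with
  $\Phi = \begin{bmatrix} \phi_0(\xi_1) & \phi_1(\xi_1) & \dots & \phi_{l-1}(\xi_1) \\ \phi_0(\xi_2) & \phi_1(\xi_2) & \dots & \phi_{l-1}(\xi_2) \\ \vdots & \vdots & & \vdots \\ \phi_0(\xi_m) & \phi_1(\xi_m) & \dots & \phi_{l-1}(\xi_m) \end{bmatrix}$
Optimal conditioning of $A\theta = b$
- what if we pick $\langle\phi_i(\xi),\phi_j(\xi)\rangle = \sum_{k=1}^{m} \phi_j(\xi_k)^H w_{1k}^H w_{1k}\phi_i(\xi_k) = \delta_{ij}$
- then normal equation $A^HA\theta = A^Hb$: $\underbrace{\Phi^HW_1^HW_1\Phi}_{=I}\,\theta = \Phi^HW_1^HW_1\Phi_l\theta_l$, hence $\kappa(A^HA) = 1$!!
- many results on computing $\phi(\xi)$ (Rutishauser 1963) (Reichel et al. 1991) (Bultheel & Van Barel 1995): more today!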
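The effect of choosing a basis that is orthonormal with respect to the data-dependent inner product can be illustrated with a small numerical sketch. It uses scalar polynomials instead of the 2×2 block-polynomials of the slide, and a thin QR factorization as a stand-in for the dedicated algorithms cited above; the frequency grid and weights are hypothetical.

```python
import numpy as np

m, l = 200, 12
xi = 1j * np.logspace(-1, 1, m)          # hypothetical s-domain frequency grid
w1 = 1.0 / (1.0 + np.abs(xi))            # hypothetical weights w_{1k}

# A = W1 * Phi for the monomial basis 1, xi, ..., xi^(l-1)
A_mono = w1[:, None] * np.vander(xi, l, increasing=True)

# Thin QR: A_mono = Q @ R with Q^H Q = I and R upper triangular.
# Column j of Q equals W1 times a degree-j polynomial evaluated at the xi_k,
# so Q is A for a basis that is orthonormal w.r.t. the weighted inner product.
Q, R = np.linalg.qr(A_mono)

kappa_mono = np.linalg.cond(A_mono)
kappa_orth = np.linalg.cond(Q)
print(f"monomial basis:    kappa(A) = {kappa_mono:.2e}")
print(f"orthonormal basis: kappa(A) = {kappa_orth:.2e}")
```

Because $R^{-1}$ is upper triangular, the change of basis keeps the degree structure of the polynomials, which is why the orthonormalized columns can still be read as a polynomial basis.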
8/25 Identification of complex systems
Condition number κ: worst-case amplification of errors
Perturb $db$ in system of equations: $A(\theta + d\theta) = (b + db)$
True system: $G_o$
Model: $G = \frac{n_o}{d(\xi,\theta)}$
[Figure: true system $P_o(\xi)$ versus $\xi$, log-log axes]
Monomial basis, $\kappa(A) \sim 4\cdot 10^{11}$
[Figure: estimated $P(\xi,\theta)$ versus $\xi$, log-log axes, with zoomed inset]
Optimal basis, $\kappa(A) \sim 1$
[Figure: estimated $P(\xi,\theta)$ versus $\xi$, log-log axes, with zoomed inset]
Optimal conditioning reduces sensitivity to round-off errors!
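The statement that κ is the worst-case amplification of errors can be checked on a small square system. The sketch below is an illustration under assumptions: the helper name `worst_case_amplification`, the Vandermonde test matrix, and the perturbation size are all hypothetical choices. It picks $b$ along the largest and $db$ along the smallest singular direction, which is the case where the relative error in $\theta$ is amplified by exactly κ(A).

```python
import numpy as np

def worst_case_amplification(A):
    """Ratio (|dtheta|/|theta|) / (|db|/|b|) for the worst-case pair (b, db)."""
    U, s, Vh = np.linalg.svd(A)
    b = U[:, 0]                     # right-hand side along the largest singular direction
    db = 1e-8 * U[:, -1]            # perturbation along the smallest singular direction
    theta = np.linalg.solve(A, b)
    dtheta = np.linalg.solve(A, b + db) - theta
    rel_in = np.linalg.norm(db) / np.linalg.norm(b)
    rel_out = np.linalg.norm(dtheta) / np.linalg.norm(theta)
    return rel_out / rel_in

A_bad = np.vander(np.linspace(0.1, 1.0, 6), increasing=True)   # monomial-type matrix
A_good, _ = np.linalg.qr(A_bad)                                # orthonormal columns

amp_bad = worst_case_amplification(A_bad)
amp_good = worst_case_amplification(A_good)
print(f"ill-conditioned:  amplification {amp_bad:.2e}, kappa {np.linalg.cond(A_bad):.2e}")
print(f"well-conditioned: amplification {amp_good:.2e}, kappa {np.linalg.cond(A_good):.2e}")
```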
9/25 Identification of complex systems
Improvement in identification algorithms
- traditional ‘SK’ approach: $A^{\langle i-1\rangle}\theta^{\langle i\rangle} = b^{\langle i-1\rangle}$, stationary point ≠ (local) optimum
- improved ‘IV’: $C^{\langle i-1\rangle H}A^{\langle i-1\rangle}\theta^{\langle i\rangle} = C^{\langle i-1\rangle H}b^{\langle i-1\rangle}$, stationary point = (local) optimum
How does $C^{\langle i-1\rangle H}A^{\langle i-1\rangle}\theta^{\langle i\rangle} = C^{\langle i-1\rangle H}b^{\langle i-1\rangle}$ fit with κ = 1?
- on level of normal equations ‘SK’ case: $\theta = (A^HA)^{-1}A^Hb$ ($\kappa(A)^2$!)
- optimal conditioning of A matrix alone not sufficient
Today
1. improved ‘IV’ algorithm explained and demonstrated
2. improved ‘IV’ algorithm and κ = 1
3. is κ = 1 all there is to numerics? (no: δ-operators!)
Focus on main ideas; for detailed math and algorithms see papers
IV for improved frequency domain identification
IV and κ = 1
Beyond κ = 1
Summary
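10/25 IV for improved frequency domain identification
Back to classical SK-iterations
- iterate over $i$: $V = \sum_{k=1}^{m} \left| \frac{1}{d(\xi_k,\theta^{\langle i-1\rangle})} w_k \begin{bmatrix} G_o(\xi_k) & 1 \end{bmatrix} \begin{bmatrix} d(\xi_k,\theta^{\langle i\rangle}) \\ n(\xi_k,\theta^{\langle i\rangle}) \end{bmatrix} \right|^2$
- key issue: stationary point ≠ (local) optimum ($\frac{\partial V}{\partial\theta^T} \neq 0$) (Whitfield 1987)
- in practice: initial guess for Gauss-Newton (next slide)
Central idea in Blom & Van den Hof (2010)
- first compute $\frac{\partial V}{\partial\theta^T}$ and set this to zero:
  $\sum_{k=1}^{m} \zeta^H(\xi_k,\theta)\big|_{\theta=\theta^{\langle i-1\rangle}} \frac{1}{d(\xi_k,\theta^{\langle i-1\rangle})} w_k \begin{bmatrix} G_o(\xi_k) & 1 \end{bmatrix} \begin{bmatrix} d(\xi_k,\theta^{\langle i\rangle}) \\ n(\xi_k,\theta^{\langle i\rangle}) \end{bmatrix} = 0$
  with $\zeta(\xi_k,\theta) = -w_k \frac{\partial G(\xi_k,\theta)}{\partial\theta^T}$
- and iterate!: stationary point = (local) optimum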
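One step of the IV iteration described above can be sketched as follows. The sketch rests on several assumptions that are not taken from the slides: a hypothetical second-order SISO model with real parameters $\theta = [a_0, a_1, b_0, b_1]$, the residual written as $G_o d - n$, and the helper names `regressor`, `model`, `iv_step` and `gradient`. For this model $\zeta = -w\,\partial G/\partial\theta^T$ equals the SK regressor with the measured $G_o$ replaced by the previous model, which is what makes it an instrument.

```python
import numpy as np

def regressor(xi, Gcol, s):
    """Rows s_k * [G, G*xi, -1, -xi] for d = xi^2 + a1*xi + a0, n = b1*xi + b0."""
    return s[:, None] * np.column_stack([Gcol, Gcol * xi, -np.ones_like(xi), -xi])

def model(xi, theta):
    a0, a1, b0, b1 = theta
    d = xi**2 + a1 * xi + a0
    return (b1 * xi + b0) / d, d

def iv_step(xi, Go, w, theta_prev):
    """One IV iteration: solve Re(C^H A) theta = Re(C^H b) for real theta."""
    G_prev, d_prev = model(xi, theta_prev)
    s = w / d_prev
    A = regressor(xi, Go, s)          # built from the measured data
    C = regressor(xi, G_prev, s)      # instruments: zeta evaluated at theta_prev
    b = -s * Go * xi**2
    return np.linalg.solve((C.conj().T @ A).real, (C.conj().T @ b).real)

def gradient(xi, Go, w, theta):
    """dV/dtheta for V = sum_k |w_k (Go_k - G_k)|^2 with real theta."""
    G, d = model(xi, theta)
    eps = w * (Go - G)
    zeta = regressor(xi, G, w / d)    # zeta = -w * dG/dtheta^T
    return 2.0 * (zeta.conj().T @ eps).real

# Consistency check on hypothetical noise-free data: the true parameter
# is a fixed point of the IV step and has zero gradient.
omega = np.logspace(-1, 1, 80)
xi = 1j * omega
theta_true = np.array([1.0, 0.2, 3.0, 2.0])
Go, _ = model(xi, theta_true)
w = np.ones_like(omega)
theta_next = iv_step(xi, Go, w, theta_true)
grad_norm = np.linalg.norm(gradient(xi, Go, w, theta_true))
```

At any fixed point $\theta^{\langle i\rangle} = \theta^{\langle i-1\rangle}$ the solved equation reduces to $\mathrm{Re}\,\zeta^H \varepsilon = 0$ with $\varepsilon = w(G_o - G)$, which is half of what `gradient` returns; this is the sense in which a stationary point of the iteration is a stationary point of $V$. Convergence of the iteration itself is not guaranteed by this sketch.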
11/25 IV for improved frequency domain identification
1D example (van Zundert et al. 2016)
- one free parameter
- 2 minima $\theta^{*,1}$, $\theta^{*,2}$
- Gauss-Newton, initial $\theta_0$
  - $\theta_0 < 6\cdot 10^{-4} \Rightarrow \theta^{*,1}$
  - $\theta_0 > 6\cdot 10^{-4} \Rightarrow \theta^{*,2}$
- 10 traditional SK iterations
  - for any $\theta_0$ ⇒ ◦
  - close to minimum $\theta^{*,1}$ ...
- 10 improved IV iterations
  - for any $\theta_0$ ⇒ [plot marker]
  - stationary point = $\theta^{*,2}$!
Improvements in algorithms!
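12/25 IV for improved frequency domain identification
Improvements in algorithms
[Figure legend: SK, IV]
Conditioning
- traditional SK
  - $A^{\langle i-1\rangle}\theta^{\langle i\rangle} = b^{\langle i-1\rangle}$
  - QR factorization: $\kappa(A)$
- improved IV
  - $C^{\langle i-1\rangle H}A^{\langle i-1\rangle}\theta^{\langle i\rangle} = C^{\langle i-1\rangle H}b^{\langle i-1\rangle}$ with typically $\kappa(C^HA) \approx \kappa(A)^2$
[Figure: condition number κ versus iteration (0 to 200), logarithmic κ-axis from 10^20 to 10^200]
Improvements in algorithms ⇒ explosion of numerical conditioning?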
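The squaring of the condition number can be made concrete in the special case $C = A$, where $C^HA$ is the normal-equation matrix $A^HA$. The sketch below builds a hypothetical real matrix with prescribed singular values, so that $\kappa(A) = 10^3$ is known by construction; for a general instrument matrix $C \neq A$ the relation holds only approximately, as the slide states.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 5

# Hypothetical A with prescribed singular values 1, ..., 1e-3
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -3, n)
A = U @ np.diag(s) @ V.T

kappa_A = np.linalg.cond(A)               # what a QR-based least-squares solve sees
kappa_normal = np.linalg.cond(A.T @ A)    # what a solve of the square system sees
print(f"kappa(A)     = {kappa_A:.3e}")
print(f"kappa(A^T A) = {kappa_normal:.3e}")
```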
13/25 IV and κ = 1
[Diagram labels: inner product, indefinite inner product, bilinear form]
- IV algorithm $(C^HA)\theta = C^Hb$ is essentially
  $\underbrace{\Phi^HW_2^HW_1\Phi}_{\text{we want this} = I!}\,\theta = \Phi^HW_2^HW_1\Phi_l\theta_l$
- recall from Slide 7: if $W_2 = W_1$ (‘SK’ case)
  $\langle\phi_i(\xi),\phi_j(\xi)\rangle = \sum_{k=1}^{m} \phi_j(\xi_k)^H w_{1k}^H w_{1k}\phi_i(\xi_k) \Rightarrow \Phi^HW_1^HW_1\Phi = I$
- what if for IV case we pick orthonormal polynomials w.r.t.
  $\langle\phi_i(\xi),\psi_j(\xi)\rangle = \sum_{k=1}^{m} \psi_j(\xi_k)^H w_{2k}^H w_{1k}\phi_i(\xi_k)$ so that $\Phi^HW_2^HW_1\Phi = I$?
✗ this is not an inner product! (just a bilinear form)
14/25 IV and κ = 1
Bilinear form: $\langle\phi_i(\xi),\psi_j(\xi)\rangle = \sum_{k=1}^{m} \psi_j(\xi_k)^H w_{2k}^H w_{1k}\phi_i(\xi_k)$
Key idea: introduce additional freedom!
- pick two distinct bases $\psi$, $\phi$: $\Psi^HW_2^HW_1\Phi\theta = \Psi^HW_2^HW_1\Phi_l\theta_l$
- such that these are bi-orthonormal: $\langle\phi_i(\xi),\psi_j(\xi)\rangle = \delta_{ij}I$
- interpretation
  - oblique projection
  - $\Psi$ transforms ‘left’ basis ⇒ transforms instrumental variables
Key result I (van Herpen et al. 2016)
bi-orthonormal ⇒ $\Psi^HW_2^HW_1\Phi = I$ ⇒ $\kappa(\Psi^HW_2^HW_1\Phi) = 1$: optimal conditioning!
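A pair of bi-orthonormal bases can be constructed numerically in a simple, if not efficient, way. The sketch below is a two-sided Gram-Schmidt process written as an LU factorization without pivoting of the bilinear-form Gram matrix; it is a swapped-in textbook technique, not the algorithm of van Herpen et al. (2016). Scalar polynomials, the points on the unit circle, the complex weights, and the helper name `lu_nopivot` are assumptions of this sketch.

```python
import numpy as np

def lu_nopivot(M):
    """Doolittle LU without pivoting: M = L @ U, L unit lower triangular."""
    n = M.shape[0]
    L = np.eye(n, dtype=complex)
    U = M.astype(complex)
    for j in range(n - 1):
        L[j + 1:, j] = U[j + 1:, j] / U[j, j]
        U[j + 1:, :] -= np.outer(L[j + 1:, j], U[j, :])
    return L, np.triu(U)

m, l = 60, 5
rng = np.random.default_rng(1)
z = np.exp(1j * np.linspace(0.1, 3.0, m))             # hypothetical points on the unit circle
w1 = 1.0 + 0.5 * rng.random(m)                        # hypothetical weights W1
phase = 0.5 * (2.0 * rng.random(m) - 1.0)
w2 = w1 * np.exp(1j * phase)                          # W2 differs from W1 (IV case)

Phi0 = np.vander(z, l, increasing=True)               # monomial basis
A = w1[:, None] * Phi0                                # W1 * Phi0
C = w2[:, None] * Phi0                                # W2 * Phi0
M = C.conj().T @ A                                    # Gram matrix of the bilinear form

L, U = lu_nopivot(M)
Phi = A @ np.linalg.inv(U)                            # W1 * (new right basis)
Psi = C @ np.linalg.inv(L).conj().T                   # W2 * (new left basis)

G = Psi.conj().T @ Phi                                # should be the identity
print(f"kappa, monomial bases:       {np.linalg.cond(M):.2e}")
print(f"kappa, bi-orthonormal bases: {np.linalg.cond(G):.2e}")
```

Both transformations, $U^{-1}$ and $L^{-H}$, are upper triangular, so the new columns still correspond to polynomials of increasing degree. Forming $M$ explicitly inherits its conditioning, which is one reason dedicated recurrence-based algorithms are preferable in practice.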
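15/25 IV and κ = 1
It becomes even more interesting: no need to form $\underbrace{\Psi^HW_2^HW_1\Phi}_{=I}\,\theta = \Psi^HW_2^HW_1\Phi_l\theta_l$
- note that
  $\Psi = \begin{bmatrix} \psi_0(\xi_1) & \psi_1(\xi_1) & \dots & \psi_{l-1}(\xi_1) \\ \psi_0(\xi_2) & \psi_1(\xi_2) & \dots & \psi_{l-1}(\xi_2) \\ \vdots & \vdots & & \vdots \\ \psi_0(\xi_m) & \psi_1(\xi_m) & \dots & \psi_{l-1}(\xi_m) \end{bmatrix}$, $\Phi_l = \begin{bmatrix} \phi_l(\xi_1) \\ \phi_l(\xi_2) \\ \vdots \\ \phi_l(\xi_m) \end{bmatrix}$, $\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_{l-1} \end{bmatrix}$
Key result II (for each iteration $\langle i\rangle$) (van Herpen et al. 2016)
- bi-orthonormal ⇒ $\Psi^HW_2^HW_1\Phi_l = 0$ ⇒ $\theta = 0$
- II.a. Optimal approximant: $\begin{bmatrix} d(\xi,\theta) \\ n(\xi,\theta) \end{bmatrix} = \phi_l(\xi)\theta_l$
- II.b. All lower order approximants are also obtained (polynomial basis)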
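16/25 IV and κ = 1
Status so far
- (identification) problem solved if we can compute bi-orthonormal bases $\phi_i(\xi)$, $\psi_j(\xi)$
Computing the polynomial bases
- for $\phi_i(\xi), \psi_j(\xi) \in \mathbb{R}[\xi]$ these follow from two 3-term recurrence relations
  $\phi_i(\xi) = \frac{1}{\gamma_i}\left((\xi - \alpha_i)\phi_{i-1}(\xi) - \beta_{i-1}\phi_{i-2}(\xi)\right)$
  $\psi_i(\xi) = \frac{1}{\beta_i}\left((\xi - \alpha_j)\psi_{i-1}(\xi) - \gamma_{i-1}\psi_{i-2}(\xi)\right)$
- recursion coefficients follow from a new ‘chasing-down-the-diagonal’ algorithm
  $\begin{bmatrix} & w_{21} & w_{22} & \dots & w_{2m} \\ w_{11} & \xi_1 & & & \\ w_{12} & & \xi_2 & & \\ \vdots & & & \ddots & \\ w_{1m} & & & & \xi_m \end{bmatrix} \xrightarrow{\text{‘similarity’}} \begin{bmatrix} & \beta_0 & & & \\ \gamma_0 & \alpha_1 & \beta_1 & & \\ & \gamma_1 & \alpha_2 & \ddots & \\ & & \ddots & \ddots & \beta_{m-1} \\ & & & \gamma_{m-1} & \alpha_m \end{bmatrix}$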
17/25 IV and κ = 1
Illustrative example
[Figure: Bode plot, magnitude |.| [dB] (−40 to 30) and phase ∠(.) [°] (−180 to 180) versus f [Hz] (10^0 to 10^2)]
Alg. | Basis                                | $V(\theta^\star)$ | $\left\| \frac{\partial V(\theta)}{\partial\theta}\big|_{\theta=\theta^\star} \right\|_2$ | κ
SK   | Monomial                             | 30.47937 | $1.97\cdot 10^{-2}$ | $8.09\cdot 10^{2}$
SK   | Orthonormal w.r.t. inner product     | 30.47937 | $1.97\cdot 10^{-2}$ | 1.00
IV   | Monomial                             | 30.47901 | $< 10^{-13}$        | $6.56\cdot 10^{5}$
IV   | Bi-orthonormal w.r.t. bi-linear form | 30.47901 | 0                   | 1
18/25 Beyond κ = 1
Summary so far
- improvements in iterative identification algorithms: IV-based
  $A\theta = b \Rightarrow (C^HA)\theta = C^Hb$
- new bi-orthonormal approach leads to $C^HA = I \Rightarrow \kappa(C^HA) = 1$
  - special case $A^HA = I$: orthonormal (Rutishauser 1963) (Reichel et al. 1991) (Bultheel & Van Barel 1995)
- essentially solves the problem in polynomial domain through bases $\Phi$, $\Psi$
  - algorithms for measurements on unit disc (Z-domain), real-line, imaginary axis (s-domain)
- Done?
Need to compute (bi-)orthonormal bases $\Phi$(, $\Psi$) reliably
- back to classical ‘SK’ case: focus on a single orthonormal basis $\Phi$
19/25 Beyond κ = 1
κ = 1 for fast-sampled discrete time systems? (SYSID2015 benchmark)
[Figure: κ(A) − 1 versus sampling frequency [Hz], log-log axes; sampling frequency from 10^2 to 10^6, κ(A) − 1 from 10^-4 to 10^10]
Loss of orthonormality for increasing sampling frequency
- conditioning of $A\theta = b$ deteriorates ...
- let’s look at this specific case a bit deeper
20/25 Beyond κ = 1
[Figure: complex plane (Re, Im) with the point 1 marked, for $f_s \to \infty$]
Fast-sampled discrete-time systems
- sampling frequency $f_s \to \infty$
- poles tend to $z = 1$
- unity is disastrous: $(1 + \varepsilon_1) - (1 + \varepsilon_2) = 0$
δ-operator (Middleton & Goodwin 1986)
- $\delta = f_s(z - 1)$
Final topics of this talk
- using δ-operator in identification (in particular with κ = 1?)
- efficient computation of δ-domain orthonormal polynomials
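The cancellation problem near $z = 1$ and the benefit of $\delta = f_s(z - 1)$ can be illustrated with a finite word-length experiment in the spirit of Middleton & Goodwin (1986). The sketch stores the discrete-time image of a continuous-time pole in single precision and maps it back. The pole value, the use of `float32` as the short word length, and the helper name `recovered_pole_errors` are assumptions chosen to make the effect visible.

```python
import numpy as np

def recovered_pole_errors(p, fs):
    """Relative error of pole p after float32 storage in z-form and in delta-form."""
    z = np.exp(p / fs)                    # shift-operator pole, tends to 1 as fs grows
    delta = fs * np.expm1(p / fs)         # delta = fs * (z - 1), tends to p as fs grows
    z32 = float(np.float32(z))            # quantize to single precision
    d32 = float(np.float32(delta))
    p_from_z = fs * np.log(z32)
    p_from_delta = fs * np.log1p(d32 / fs)
    return abs(p_from_z - p) / abs(p), abs(p_from_delta - p) / abs(p)

p = -1.0                                  # hypothetical continuous-time pole [rad/s]
for fs in (1e2, 1e4, 1e6):
    err_z, err_delta = recovered_pole_errors(p, fs)
    print(f"fs = {fs:.0e} Hz: z-form error {err_z:.1e}, delta-form error {err_delta:.1e}")
```

In z-form the information about the pole sits in the small difference $z - 1$, so a fixed relative rounding error on $z$ becomes a large relative error on the pole as $f_s$ grows; in δ-form the stored quantity stays close to the pole itself.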
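21/25 Beyond κ = 1
Back to traditional ‘SK’ case: $A\theta = b$, with $A = W\Phi$
- Pick $\Phi$ as orthonormal polynomials w.r.t. data-dependent discrete inner product
  $\langle\phi_i(\xi),\phi_j(\xi)\rangle = \sum_{k=1}^{m} \phi_j(\xi_k)^H w_{1k}^H w_{1k}\phi_i(\xi_k) \Rightarrow \kappa(A) = 1$
Computation of $\phi_i$ for arbitrary points $\xi_k$
- follow from an $i$-term recurrence relation $\xi\phi_i = \sum_{j=0}^{k} \phi_j(\xi)h_{j,i}$
- with the $h_{j,i}$’s from
  $\begin{bmatrix} w_1 & \xi_1 & & & \\ w_2 & & \xi_2 & & \\ \vdots & & & \ddots & \\ w_m & & & & \xi_m \end{bmatrix} \xrightarrow{\text{similarity}} \begin{bmatrix} h_{0,0} & h_{0,1} & h_{0,2} & \dots & h_{0,m} \\ & h_{1,1} & h_{1,2} & \dots & h_{1,m} \\ & & h_{2,2} & \dots & h_{2,m} \\ & & & \ddots & \vdots \\ & & & & h_{m,m} \end{bmatrix}$
Essentially an inverse eigenvalue problem, computational explosion $O(m^3)$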
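A long recurrence of this kind can be generated with an Arnoldi-type Gram-Schmidt process on $\mathrm{diag}(\xi_k)$ with the weight vector as starting vector. The sketch below is a generic textbook construction, not the algorithm of the talk; the function name `arnoldi_basis`, the scalar weights, the test grid, and the index convention ($\xi q_i = \sum_{j \le i+1} q_j H_{j,i}$, which may be shifted relative to the slide) are assumptions.

```python
import numpy as np

def arnoldi_basis(xi, w, n):
    """Orthonormal polynomial values via Gram-Schmidt (Arnoldi) on diag(xi).

    Returns Q (m x (n+1)) with Q[:, j] = w * phi_j(xi), and a Hessenberg
    matrix H ((n+1) x n) such that xi * Q[:, i] = sum_{j <= i+1} Q[:, j] * H[j, i].
    """
    m = len(xi)
    Q = np.zeros((m, n + 1), dtype=complex)
    H = np.zeros((n + 1, n), dtype=complex)
    Q[:, 0] = w / np.linalg.norm(w)
    for i in range(n):
        v = xi * Q[:, i]                          # multiply by the independent variable
        for _ in range(2):                        # Gram-Schmidt plus one re-orthogonalization
            h = Q[:, : i + 1].conj().T @ v
            v = v - Q[:, : i + 1] @ h
            H[: i + 1, i] += h
        H[i + 1, i] = np.linalg.norm(v)
        Q[:, i + 1] = v / H[i + 1, i]
    return Q, H

m, n = 100, 8
xi = 1j * np.logspace(-1, 1, m)                   # hypothetical points on the imaginary axis
w = 1.0 / (1.0 + np.abs(xi))                      # hypothetical weights
Q, H = arnoldi_basis(xi, w, n)

print("orthonormality error:", np.linalg.norm(Q.conj().T @ Q - np.eye(n + 1)))
print("recurrence error:    ", np.linalg.norm(xi[:, None] * Q[:, :n] - Q @ H))
print("largest entry above the superdiagonal:", np.abs(np.triu(H[:n, :n], 2)).max())
```

Each new polynomial is orthogonalized against all previous ones, which is where the growing cost comes from. For points on the imaginary axis the computed Hessenberg matrix turns out to be numerically tridiagonal, so most of that work is redundant; exploiting such structure is what reduces the recurrence to a short one.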
22/25 Beyond κ = 1
[Diagram: the matrix of weights $w_1, \dots, w_m$ and points $\xi_1, \dots, \xi_m$ is brought by a similarity transformation into a structured matrix; × marks nonzero entries]
It’s all about recognizing structure: $O(m^3) \Rightarrow O(mn)$
- continuous time: tridiagonal (as on Slide 16): obvious 3-term recurrence
- discrete time: Schur coefficients (a bit more complicated 3-term recurrence)
- delta domain: ?
23/25 Beyond κ = 1
New result (Voorhoeve & Oomen 2019)
For any generalized circle in the complex plane, the Hessenberg matrix H is a (H – 1)-quasiseparable matrix
Implications
- $O(mn)$ complexity
- unified approach, special cases
  - tridiagonal (s-domain)
  - unitary Hessenberg/Schur (Z-domain)
  - δ-domain!
- so what is this (H – 1)-quasiseparable matrix?
[Matrix diagram: Hessenberg pattern with × on the diagonal and first subdiagonal, zeros below the first subdiagonal, and a second marker for the entries above the diagonal]
24/25 Beyond κ = 1
Example fast-sampled discrete time systems (SYSID2015 benchmark)
[Figure: κ(A) − 1 versus sampling frequency [Hz], log-log axes; sampling frequency from 10^2 to 10^6, κ(A) − 1 from 10^-4 to 10^10]
- traditional z-domain Schur-based parameterization: loss of orthonormality
- z-domain via unified (H – 1)-quasiseparable: also improved
- δ-domain perfect conditioning invariant under increasing sampling frequency
25/25 Summary
Numerical aspects in identification and control
- lots of issues when implementing algorithms ... seldom mentioned in application papers
- increasingly important for increasing complexity
Results on the intersection of identification, numerical linear algebra, and orthonormal polynomials
- new instrumental variable (IV) algorithms (Blom & Van den Hof 2010)
- interesting results in frequency domain identification and beyond (van Zundert et al. 2016)
- new algorithm at expense of conditioning?
- κ = 1 for IV-type problems (van Herpen et al. 2016)
- there’s more to our algorithms than κ = 1
- δ-operator from control theory promising (Voorhoeve & Oomen 2019)
References
Bayard, D. S. (1994), ‘High-order multivariable transfer function curve fitting: Algorithms, sparse matrix methods and experimental results’, Automatica 30(9), 1439–1444.
Blom, R. S. & Van den Hof, P. M. J. (2010), Multivariable frequency domain identification using IV-based linear regression, in ‘Proceedings of the 49th Conference on Decision and Control’, pp. 1148–1153.
Bultheel, A. & Van Barel, M. (1995), ‘Vector orthogonal polynomials and least squares approximation’, SIAM Journal on Matrix Analysis and Applications 16(3), 863–885.
de Callafon, R. A. & Van den Hof, P. M. J. (2001), ‘Multivariable feedback relevant system identification of a wafer stepper system’, IEEE Transactions on Control Systems Technology 9(2), 381–390.
Gilson, M., Welsh, J. S. & Garnier, H. (2017), ‘Frequency localizing basis function-based IV method for wideband system identification’, IEEE Transactions on Control Systems Technology 26(1), 329–335.
Gustavsen, B. & Semlyen, A. (1999), ‘Rational approximation of frequency domain responses by vector fitting’, IEEE Transactions on Power Delivery 14(3), 1052–1061.
Hakvoort, R. G. & Van den Hof, P. M. J. (1994), ‘Frequency domain curve fitting with maximum amplitude criterion and guaranteed stability’, International Journal of Control 60(5), 809–825.
van Herpen, R., Bosgra, O. & Oomen, T. (2016), ‘Bi-orthonormal polynomial basis function framework with applications in system identification’, IEEE Transactions on Automatic Control 61(11), 3285–3300.
van Herpen, R., Oomen, T., Kikken, E., van de Wal, M., Aangenent, W. & Steinbuch, M. (2014), Exploiting additional actuators and sensors for nano-positioning robust motion control, in ‘Proceedings of the 2014 American Control Conference’, Portland, Oregon, United States, pp. 984–990.
Levy, E. C. (1959), ‘Complex-curve fitting’, IRE Transactions on Automatic Control 4(1), 37–43.
Middleton, R. H. & Goodwin, G. C. (1986), ‘Improved finite word length characteristics in digital control using delta operators’, IEEE Transactions on Automatic Control 31(11), 1015–1021.
Ninness, B., Gibson, S. & Weller, S. (2000), Practical aspects of using orthonormal system parameterisations in estimation problems, in ‘2000 IFAC Symposium on System Identification’, Santa Barbara, California, United States, pp. 463–468.
Ninness, B. & Hjalmarsson, H. (2001), ‘Model structure and numerical properties of normal equations’, IEEE Transactions on Circuits and Systems 48(4), 425–437.
Pintelon, R. & Kollár, I. (2005), ‘On the frequency scaling in continuous-time modeling’, IEEE Transactions on Instrumentation and Measurement 54(1), 318–321.
Reichel, L., Ammar, G. & Gragg, W. (1991), ‘Discrete least squares approximation by trigonometric polynomials’, Mathematics of Computation 57(195), 273–289.
Rutishauser, H. (1963), On Jacobi rotation patterns, in ‘Proceedings of the AMS Symposium in Applied Mathematics’, Vol. 15, pp. 219–239.
Sanathanan, C. K. & Koerner, J. (1963), ‘Transfer function synthesis as a ratio of two complex polynomials’, IEEE Transactions on Automatic Control 8(1), 56–58.
Voorhoeve, R. & Oomen, T. (2019), ‘Data-dependent orthogonal polynomials on generalized circles: A unified approach applied to δ-domain identification’, In preparation.
Voorhoeve, R., van Rietschoten, A., Geerardyn, E. & Oomen, T. (2015), Identification of high-tech motion systems: An active vibration isolation benchmark, in ‘17th IFAC Symposium on System Identification’, Beijing, China, pp. 1250–1255.
van de Wal, M., van Baars, G., Sperling, F. & Bosgra, O. (2002), ‘Multivariable H∞/µ feedback control design for high-precision wafer stage motion’, Control Engineering Practice 10(7), 739–755.
Welsh, J. S. & Rojas, C. R. (2007), Frequency localising basis functions for wide-band system identification: A condition number bound for output error systems, in ‘Proceedings of the 2007 European Control Conference’, pp. 4618–4624.
Whitfield, A. H. (1987), ‘Asymptotic behaviour of transfer function synthesis methods’, International Journal of Control 45(3), 1083–1092.
Wills, A. & Ninness, B. (2008), ‘On gradient-based search for multivariable system estimates’, IEEE Transactions on Automatic Control 53, 1.
van Zundert, J., Bolder, J. & Oomen, T. (2016), ‘Optimality and flexibility in iterative learning control for varying tasks’, Automatica 67, 295–302.