Fast Harmonic Polynomial Transforms

74
Fast Harmonic Polynomial Transforms Richard Mika¨ el Slevinsky Department of Mathematics, University of Manitoba [email protected] https://home.cc.umanitoba.ca/slevinrm/ Computational Methods in Analytics, Inference, and Computation in Cosmology Insitut Henri Poincar´ e November 29, 2018

Transcript of Fast Harmonic Polynomial Transforms

Page 1: Fast Harmonic Polynomial Transforms

Fast Harmonic Polynomial Transforms

Richard Mikael Slevinsky

Department of Mathematics,University of Manitoba

[email protected]://home.cc.umanitoba.ca/∼slevinrm/

Computational Methodsin Analytics, Inference, and Computation in

CosmologyInsitut Henri Poincare

November 29, 2018

Page 2: Fast Harmonic Polynomial Transforms

Spherical Harmonics

Let µ be a positive Borel measure on D ⊂ Rn. The inner product:

〈f , g〉 =

∫D

f (x)g(x)dµ(x),

induces the norm ‖f ‖2 =√〈f , f 〉 and the Hilbert space L2(D, dµ(x)).

Let S2 ⊂ R3 denote the unit 2-sphere and let dΩ = sin θ dθ dϕ.Then any function f ∈ L2(S2, dΩ) may be expanded in spherical harmonics:

f (θ, ϕ) =+∞∑`=0

+∑m=−`

f m` Ym` (θ, ϕ) =

+∞∑m=−∞

+∞∑`=|m|

f m` Ym` (θ, ϕ),

where the expansion coefficients are:

f m` =〈Ym` , f 〉

〈Ym` ,Y

m` 〉

2 of 33

Page 3: Fast Harmonic Polynomial Transforms

Spherical Harmonics

For ` ∈ N0 and |m| ≤ `, orthonormal spherical harmonics are defined by:

Ym` (x) = Ym

` (θ, ϕ) =eimϕ√

2π(−1)|m|

√(`+ 1

2 )(`− |m|)!

(`+ |m|)!P|m|` (cos θ)︸ ︷︷ ︸

P|m|` (cos θ)

.

Associated Legendre functions are defined by ultraspherical polynomials:

Pm` (cos θ) = (−2)m( 1

2 )m sinm θC(m+ 1

2 )

`−m (cos θ).

The notation Pm` is used to denote orthonormality, and:

(x)n =Γ(x + n)

Γ(x)

is the Pochhammer symbol for the rising factorial.2 of 33

Page 4: Fast Harmonic Polynomial Transforms

Spherical Harmonics

Consider the Laplace–Beltrami operator on S2:

∆θ,ϕ =1

sin θ

∂θ

(sin θ

∂θ

)+

1

sin2 θ

∂2

∂ϕ2.

For ` ∈ N0 and |m| ≤ `, the surface spherical harmonics are theeigenfunctions of ∆θ,ϕ:

∆θ,ϕYm` (θ, ϕ) = −`(`+ 1)Ym

` (θ, ϕ).

2 of 33

Page 5: Fast Harmonic Polynomial Transforms

Spherical Harmonics

Spherical harmonics satisfy many three-term recurrence relations including:

cos θYm` =

√(`−m + 1)(`+ m + 1)

(2`+ 1)(2`+ 3)Ym`+1 +

√(`−m)(`+ m)

(2`− 1)(2`+ 1)Ym`−1,

sin θeiϕYm` =

√(`+ m + 1)(`+ m + 2)

(2`+ 1)(2`+ 3)Ym+1`+1 −

√(`−m − 1)(`−m)

(2`− 1)(2`+ 1)Ym+1`−1 .

They also have an addition theorem:

P`(x · y) =4π

2`+ 1

+∑m=−`

Ym` (x)Ym

` (y).

Perhaps they have too many properties.

2 of 33

Page 6: Fast Harmonic Polynomial Transforms

Synthesis and Analysis

A band-limited function of degree-n on the sphere has no nonzero sphericalharmonic expansion coefficient of degree-n or greater:

fn−1(θ, ϕ) =n−1∑`=0

+∑m=−`

f m` Ym` (θ, ϕ).

For a spherical harmonic Ym` , ` is the degree and m is the order.

The transforms of synthesis and analysis convert between representations ofa band-limited function in momentum and physical spaces:

Synthesis Sample a band-limited function at a set of points on S2.

Analysis Convert samples on S2 to spherical harmonic expansioncoefficients.

Normally, an appropriate set of points is chosen to be able to perfectlyreconstruct a band-limited function of degree-n.

3 of 33

Page 7: Fast Harmonic Polynomial Transforms

Synthesis and Analysis

Point sets include:

Equiangular θk and ϕj are equispaced and their Cartesian product istaken. These include the celebrated [Driscoll and Healy Jr.,1994] (2n)(2n − 1) and [McEwen and Wiaux, 2011](n − 1)(2n − 1) + 1 sampling theorems.

Gaussian cos θk are Gauss–Legendre points and ϕj are equispaced andtheir Cartesian product is taken.

HEALPix procedure to generate points that correspond to a hierarchicalequal area isolatitude pixelization of S2 [Gorski et al., 2005].

Random with common distributions (e.g. uniformly distributed on S2).

Data-driven may be treated similar to random.

The naıve cost of synthesis and analysis is O(n4) but it can be triviallyreorganized to O(n3) for isolatitude point sets.

3 of 33

Page 8: Fast Harmonic Polynomial Transforms

The Connection Problem

Complementary to synthesis and analysis is the connection problem: theexact expansion of spherical harmonics in another basis. The sphere isdoubly periodic and supports a bivariate Fourier series.

The (N)FFT solves synthesis and analysis with bivariate Fourier series.

Ym` (θ, ϕ) =

eimϕ√2π

Pm` (cos θ) ⇒ in longitude, we are done.

The problem is to convert Pm` (cos θ) to Fourier series.

Since Pm` (cos θ) ∝ sinm θC

(m+ 12 )

`−m (cos θ), the intuition is that

even-ordered Pm` are trigonometric polynomials in cos θ; and,

odd-ordered Pm` are trigonometric polynomials in sin θ.

Conversions are not one-to-one. The connection problem is an analogueof the McEwen–Wiaux sampling theorem.

4 of 33

Page 9: Fast Harmonic Polynomial Transforms

Connection vs. Synth. & Anal.

102 103 104

n + 1

10 15

10 14

10 13

10 12

10 11

10 10

Rela

tive

Erro

r

2 SHTns SHTns

2 FT FT

5 of 33

Page 10: Fast Harmonic Polynomial Transforms

Fast Transforms

Fast transforms should:

have an O(n2 logO(1) n) run-time with a similar pre-computation forsynthesis and analysis or the connection problem;

be (backward) stable; and,

be fast in practice.

Algorithms may be classified:

as either exact in exact arithmetic and unstable in floating-pointarithmetic, and numerically stable algorithms in fixed precision that areapproximate to some arbitrarily small tolerance ε;

as either acceleration of synthesis and analysis (>10), or acceleration ofthe connection problem (3); or,

by analytical apparatus: split-Legendre functions, WKB approximation,the fast multipole method (FMM), and the butterfly algorithm.

6 of 33

Page 11: Fast Harmonic Polynomial Transforms

The Transformers

Driscoll Healy Mohlenkamp Suda Takami Potts

Steidl Tasche Kunis Rokhlin Tygert Slevinsky

7 of 33

Page 12: Fast Harmonic Polynomial Transforms

Split-Legendre Functions

The Driscoll–Healy O(n2 log2 n) transform is created by:

proving the first asymptotically optimal sampling theorem using(2n)(2n − 1) equiangular samples. This allows inner products to berepresented as discrete sums; if:

Zk,` = 〈f ,TkP`〉,

then they convert Zk,0 = 〈f ,Tk〉, obtained by the discrete cosinetransform (DCT), to Z0,` = 〈f ,P`〉;using a technology now known as split-Legendre functions, a factorizationof orthogonal polynomial sums evaluated at equiangular points;

performing a sequence of masked and subsampled discrete convolutions;and,

diagonalizing convolutions by the DCT, and fusing different levels in thescheme analytically.

8 of 33

Page 13: Fast Harmonic Polynomial Transforms

Split-Legendre Functions

Driscoll and Healy also provide a rigorous stability analysis, but the boundsare:

polynomial in the degree,

but exponential in the order, or O(`m),

which explains why the original scheme, even though of foundationalimportance, is effectively useless.Algorithms that are exact in exact arithmetic tend to perform poorly in finiteprecision arithmetic. After the Driscoll–Healy paper, there is a divergence inthe literature, where [Potts et al., 1998, Kunis and Potts, 2003, Suda andTakami, 2002, Healy Jr. et al., 2003] proposed ad hoc remedies to stabilizethe original scheme, and others developed approximate algorithms thatare stable in finite precision arithmetic. The subsequent algorithms areconsidered the modern fast transforms.

8 of 33

Page 14: Fast Harmonic Polynomial Transforms

WKB Asymptotics

As Sturm–Liouville eigenfunctions, it is well-known that associated Legendrefunctions of high degree and order have a large oscillatory interior as asubset of θ ∈ [0, π].

9 of 33

Page 15: Fast Harmonic Polynomial Transforms

WKB Asymptotics

In [Mohlenkamp, 1999], a quasi-classical WKB approximation yields:

√sin θPm

` (cos θ) ≈ exp

i∫ θ

√(`+ 1

2 )2 −m2 − 1

4

sin tdt

.Rigorous improvements can be added to the dominant approximation above,leading to two algorithms for synthesis and analysis with O(n

52 log n) and

O(n2 log n) run-times, respectively. Unfortunately, the numerical evidencedoes not substantiate the latter compression algorithm.

9 of 33

Page 16: Fast Harmonic Polynomial Transforms

The Fast Multipole Method

10 of 33

Page 17: Fast Harmonic Polynomial Transforms

The Fast Multipole Method

10 of 33

Page 18: Fast Harmonic Polynomial Transforms

The Fast Multipole Method

10 of 33

Page 19: Fast Harmonic Polynomial Transforms

The Fast Multipole Method

10 of 33

Page 20: Fast Harmonic Polynomial Transforms

The Fast Multipole Method

10 of 33

Page 21: Fast Harmonic Polynomial Transforms

The Fast Multipole Method

10 of 33

Page 22: Fast Harmonic Polynomial Transforms

The Fast Multipole Method

10 of 33

Page 23: Fast Harmonic Polynomial Transforms

The Fast Multipole Method

10 of 33

Page 24: Fast Harmonic Polynomial Transforms

The Fast Multipole Method

The numerical method of [Greengard and Rokhlin, 1987] originates from themultipole expansion of the Coulombic potential:

1

|r − r0|=

1√r2 − 2rr0 cos θ + r2

0

=1

r

∞∑n=0

( r0r

)nPn(cos θ).

Expansion for sufficiently small r0/r 1,

⇔ O(log(ε−1)) terms in the multipole expansion forapproximation to precision ε,

⇔ Subblocks well-separated from the main diagonal.

The FMM enables fast approximate matrix-vector products with Cauchymatrices, generated by sampling 1/(x − y) at all pairwise products of x ∈ Rm

and y ∈ Rn, and other similar kernels, e.g. [Alpert and Rokhlin, 1991].What does well-separation resemble?

10 of 33

Page 25: Fast Harmonic Polynomial Transforms

The Fast Multipole Method

A Cauchy matrix with low-rank subblocks well-separated from the maindiagonal:

10 of 33

Page 26: Fast Harmonic Polynomial Transforms

Eigenfunction Transforms

Associated Legendre functions satisfy:

− d

dx

[(1− x2)

d

dxPm` (x)

]+

m2

1− x2Pm` (x) = `(`+ 1)Pm

` (x).

The key observation of [Rokhlin and Tygert, 2006] is that the differentialequations are structurally similar for |m| > 0.For m odd, they expand Pm

` (x) in the basis of P1` (x), and the differential

part of the operator reduces to a diagonal scaling, and multiplication by(1− x2) is a symmetric pentadiagonal matrix with zeros on the sub- andsuper-diagonal, resulting in a symmetric semiseparable inverse with achessboard pattern of zeros. Formally:

(D + (m2 − 1)M−1)u = λu.

11 of 33

Page 27: Fast Harmonic Polynomial Transforms

Eigenfunction Transforms

The Ritz–Galerkin discretization:

(D + (m2 − 1)M−1)u = λu.

Facts:

The entries of D are `(`+ 1) for ` ≥ |m|;The entries of

[M−1]`,n =

(n + 3

2 )

√(`+1)(`+ 3

2 )(`+2)

(n+1)(n+ 32 )(n+2)

, for ` ≤ n, `+ n even,

(`+ 32 )

√(n+1)(n+ 3

2 )(n+2)

(`+1)(`+ 32 )(`+2)

, for ` > n, `+ n even,

0, otherwise.

The Pm` are the smoothest eigenfunctions and have the smallest

eigenvalues.

11 of 33

Page 28: Fast Harmonic Polynomial Transforms

Diagonal-plus-Semiseparable D & C

For full analysis, see [Chandrasekaran and Gu, 2004]. Let d , u, v ∈ Rn and:

A = D + S = diag(d) + triu(uv>) + tril(vu>).

If:

d =

(d1

d2

), u =

(u1

u2

), and v =

(v1

v2

),

then:

A =

(A1

A2

)+ ww>,

where w =

(u1

v2

), and:

A1 = diag(d1)− diag(u1)2 + triu(u1(v1 − u1)>) + tril((v1 − u1)u>1 ),

A2 = diag(d2)− diag(v2)2 + triu(u2(v2 − u2)>) + tril((v2 − u2)u>2 ).

12 of 33

Page 29: Fast Harmonic Polynomial Transforms

Diagonal-plus-Semiseparable D & C

Let A1 = Q1Λ1Q>1 and A2 = Q2Λ2Q

>2 . Then:

A =

(A1

A2

)+ ww>,

A =

(Q1Λ1Q

>1

Q2Λ2Q>2

)+ ww>,

=

(Q1

Q2

)[(Λ1

Λ2

)+

(Q1

Q2

)>ww>

(Q1

Q2

)](Q1

Q2

)>,

=

(Q1

Q2

)[∆ + zz>

](Q1

Q2

)>,

where ∆ = diag(Λ1,Λ2) and z =

(Q1

Q2

)>w . The conquer step relates

the two subproblems to the larger one via a symmetricdiagonal-plus-rank-one eigendecomposition that can be accelerated by theFMM [Gu and Eisenstat, 1995].

12 of 33

Page 30: Fast Harmonic Polynomial Transforms

Diagonal-plus-Semiseparable D & C

Lemma (Gu and Eisenstat [1995])

Assume that δ1 < δ2 < · · · < δn and that zj > 0. Then the eigenvaluesλini=1 of ∆ + zz> interlace the diagonal entries:

δ1 < λ1 < δ2 < λ2 < · · · < δn < λn,

and are the roots of the secular equation:

f (λ) = 1 +n∑

j=1

z2j

δj − λ= 0.

For each eigenvalue λi , the corresponding eigenvector is:

qi =

(z1

δ1 − λi, · · · , zn

δn − λi

)>/√√√√ n∑j=1

z2j

(δj − λi )2.

12 of 33

Page 31: Fast Harmonic Polynomial Transforms

Eigenfunction Transforms, II

Associated Legendre functions have the symmetric Jacobi matrix:

J =

0 βm

1

βm1 0 βm

2

. . .. . .

. . .

βmn−2 0 βm

n−1

βmn−1 0

,

where:

βm` =

√`(`+ 2m)

(2`+ 2m − 1)(2`+ 2m + 1).

If J = QΛQ>, the eigenvalues of J are the roots of the Pmn+m(x) and the

orthonormal eigenvectors are proportional to the associated Legendrefunctions evaluate at these roots.Thus, the eigenvectors Q implement synthesis and their transposeimplement analysis at the roots of Pm

n+m(x).13 of 33

Page 32: Fast Harmonic Polynomial Transforms

Eigenfunction Transforms, II

But how is synthesis and analysis at a distinct point sets for every orderrelated to a global spherical synthesis and analysis?

According to [Tygert, 2008], the key to using the Jacobi matrix ispost-processing by the Christoffel–Darboux formula, or equivalently thesecond barycentric formula.

The discrete eigenfunctions of the Jacobi matrix are Pm`+m(xk), where xk

are the corresponding Gauss–Jacobi quadrature nodes.

By the barycentric formula and the connection between Gaussian andbarycentric weights:

pn(x) =n∑

k=0

λk fkx − xk

/n∑

k=0

λkx − xk

,

full spherical harmonic synthesis and analysis may be performed at acommon Cartesian product point set, accelerated by the FMM.

13 of 33

Page 33: Fast Harmonic Polynomial Transforms

Tridiagonal D & C

Let the symmetric tridiagonal matrix T be partitioned as:

T =

T1 aa> c b>

b T2

,

Then if T1 = Q1Λ1Q>1 and T2 = Q2Λ2Q

>2 , we have the similarity

transformation to a symmetric arrowhead matrix:Q1

1Q2

>T1 aa> c b>

b T2

Q1

1Q2

=

Λ1 Q>1 aΛ2 Q>2 b

a>Q1 b>Q2 c

.

The symmetric arrowhead spectral decomposition can also be accelerated bythe FMM because the eigenvectors are a (normalized) Cauchy matrix of thearrowhead data. This is analyzed by [Gu and Eisenstat, 1994].

14 of 33

Page 34: Fast Harmonic Polynomial Transforms

Tridiagonal D & C

Lemma (Gu and Eisenstat [1994])

Assume that a1 < a2 < · · · < an−1 and that bj > 0. Then the eigenvalues

λini=1 of

(diag(a) bb> c

), interlace the diagonal entries:

λ1 < a1 < λ2 < a2 < · · · < an−1 < λn,

and are the roots of the secular equation:

f (λ) = λ− c +n−1∑i=1

b2i

ai − λ= 0.

For each eigenvalue λi , the corresponding eigenvector is:

qi =

(b1

λi − a1, · · · , bn−1

λi − an−1, 1

)>/√√√√1 +n−1∑j=1

b2j

(λi − aj)2.

14 of 33

Page 35: Fast Harmonic Polynomial Transforms

Summary

Eigenfunction transforms:

X Accelerate synthesis and analysis or the connection problemto O(n2 log n);

X Require O(n2 log n) pre-computation, but absurdly large inpractice;

∼ Are stable, but the error is proportional to the 2-norms of theSturm–Liouville operators, which we know scale as O(n2);

? Are fast in practice; and,

× Have low memory footprint.

15 of 33

Page 36: Fast Harmonic Polynomial Transforms

The Butterfly Algorithm

Unsatisfied with FMM-accelerated eigentransforms, [Tygert, 2010] developsnew accelerated synthesis and analysis based on the butterfly algorithm,originating in [Michielssen and Boag, 1996] and studied as an analyticalapparatus [O’Neil et al., 2010].

Purpose: Abstract the algebra of the FFT.

Technique: Divide-and-conquer ⇔ merge-and-split.

Technology: The interpolative decomposition [Liberty et al., 2007].

Proof: Fourier integral operators have rank-proportional-to-area.

The ranks of operator compositions are bounded by the smallest rank in thecomposition, extending applicability beyond Fourier integral operators.

16 of 33

Page 37: Fast Harmonic Polynomial Transforms

The Interpolative Decomposition

LemmaLet A ∈ Rm×n. For any k , there exist ACS ∈ Rm×k whose columns are aunique subset of the columns of A and AI ∈ Rk×n such that:

1 some subset of the columns of AI makes up the k × k identity matrix;

2 ‖vec(AI)‖∞ ≤ 1;

3 the spectral norm of AI satisfies ‖AI‖2 ≤√k(n − k) + 1;

4 the least singular value of AI is at least 1;

5 ACSAI = A whenever k = m or k = n; and,

6 when k < minm, n, the spectral norm of A− ACSAI satisfies:

‖A− ACSAI‖2 ≤√k(n − k) + 1σk+1.

where σk+1 is the k + 1st singular value of A.

We say that A ≈ ACSAI and any structure in A is also in ACS.17 of 33

Page 38: Fast Harmonic Polynomial Transforms

The Butterfly Algorithm

Step 1: Partition A ∈ Rn×n into thin strips. Compute IDs of each subblock.

18 of 33

Page 39: Fast Harmonic Polynomial Transforms

The Butterfly Algorithm

Step 2 (a): Merge the strips and split them approximately in half.

18 of 33

Page 40: Fast Harmonic Polynomial Transforms

The Butterfly Algorithm

Step 2 (b): Compute IDs of each subblock.

18 of 33

Page 41: Fast Harmonic Polynomial Transforms

The Butterfly Algorithm

Step 3 (a): Again, merge the strips and split them approximately in half.

18 of 33

Page 42: Fast Harmonic Polynomial Transforms

The Butterfly Algorithm

Step 3 (b): Compute IDs of each subblock.

18 of 33

Page 43: Fast Harmonic Polynomial Transforms

The Butterfly Algorithm

Step 4 (a): Final step, merge the strips and split them approximately in half.

18 of 33

Page 44: Fast Harmonic Polynomial Transforms

The Butterfly Algorithm

Step 4 (b): Final step, compute IDs of each subblock.

18 of 33

Page 45: Fast Harmonic Polynomial Transforms

The Butterfly Algorithm

Let A ∈ Rn×n have rank-proportional-to-area.

Every time we merge-and-split, the complexity of a matrix-vector productis approximately halved.

We have created a permuted and sparse block-diagonal factorization.

Costs O(kavgn2) to compute the factorization and O(kavgn log n) for a

matrix-vector product, where kavg is the average rank of all IDs.

To synthesize and analyze associated Legendre functions of all ordersrequires O(kavgn

3) flops to pre-compute and O(kavgn2 log n) to apply.

18 of 33

Page 46: Fast Harmonic Polynomial Transforms

Summary

The butterfly algorithm:

X Accelerates synthesis and analysis to O(n2 log n);

× Requires O(n3) pre-computation;

X Is stable, with errors in proportion to the tolerances in theinterpolative decompositions;

? Is fast in practice; and,

× Has low memory footprint.

Tygert’s last algorithm is widely used by practitionersincluding [Seljebotn, 2012] and [Wedi et al., 2013].

But why does it work?

And how can we decrease the memory requirements? For n = 8, 192Wavemoth requires 212GiB to store the butterfly factorizations. Forn ≈ 130, 000, the pre-computations are estimated to occupy 45TiB.

19 of 33

Page 47: Fast Harmonic Polynomial Transforms

Spherical Harmonics to Fourier

. .. ...

P0`

P1`

P2`

P3`

P`−1`

P``

......

P0`

P1`

P0`

P1`

P0`

P1`

......

T`

sin θU`

T`

sin θU`

T`

sin θU`

=⇒ =⇒

1 Convert high-order layers to layers of order 0 and 1 in O(kavgn2 log n)

flops and O(kavgn2 log n) storage; and,

2 Convert low-order layers to Fourier series in O(n2 log n) flops andO(n log n) storage a la Fast Multipole Method.

20 of 33

Page 48: Fast Harmonic Polynomial Transforms

The SH Connection Problem

Definition

Let φn(x)n≥0 be a family of orthogonal functions with respect to

L2(D, dµ(x)); and,

let ψn(x)n≥0 be another family of orthogonal functions with respect toL2(D, dµ(x)).

The connection coefficients:

c`,n =〈ψ`, φn〉dµ〈ψ`, ψ`〉dµ

,

allow for the expansion:

φn(x) =∞∑`=0

c`,nψ`(x).

21 of 33

Page 49: Fast Harmonic Polynomial Transforms

The SH Connection Problem

TheoremLet φn(x)n≥0 and ψn(x)n≥0 be two families of orthonormal functionswith respect to L2(D, dµ(x)). Then the connection coefficients satisfy:

∞∑`=0

c`,mc`,n = δm,n.

Any matrix A ∈ Rm×n, m ≥ n, with orthonormal columns iswell-conditioned and Moore–Penrose pseudo-invertible A+ = A>.

For every m, the Pm` (x) are a family of orthonormal functions for the

same Hilbert space L2([−1, 1], dx).

21 of 33

Page 50: Fast Harmonic Polynomial Transforms

The SH Connection Problem

DefinitionLet Gn denote the Givens rotation:

Gn =

1 · · · 0 0 0 · · · 0...

. . ....

......

...0 · · · cn 0 sn · · · 00 · · · 0 1 0 · · · 00 · · · −sn 0 cn · · · 0...

......

.... . .

...0 · · · 0 0 0 · · · 1

,

where the sines and the cosines are in the intersections of the nth andn + 2nd rows and columns, embedded in the identity of a conformable size.

21 of 33

Page 51: Fast Harmonic Polynomial Transforms

The SH Connection Problem

Theorem (Slevinsky [2017a])The connection coefficients between Pm+2

n+m+2(cos θ) and Pm`+m(cos θ) are:

cm`,n =

(2` + 2m + 1)(2m + 2)

√(` + 2m)!

(` + m + 12 )`!

(n + m + 52 )n!

(n + 2m + 4)!, for ` ≤ n, ` + n even,

√(n + 1)(n + 2)

(n + 2m + 3)(n + 2m + 4), for ` = n + 2,

0, otherwise.

Furthermore, the matrix of connection coefficients

C (m) = G(m)0 G

(m)1 · · ·G (m)

n−1G(m)n I(n+3)×(n+1), where the sines and cosines are:

smn =

√(n + 1)(n + 2)

(n + 2m + 3)(n + 2m + 4), and cmn =

√(2m + 2)(2n + 2m + 5)

(n + 2m + 3)(n + 2m + 4).

21 of 33

Page 52: Fast Harmonic Polynomial Transforms

The SH Connection Problem

“Proof.” W.l.o.g., consider m = 0 and n = 5.

C (0) =

0.91287 0.0 0.31623 0.0 0.17593 0.00.0 0.83666 0.0 0.39641 0.0 0.24398

−0.40825 0.0 0.70711 0.0 0.3934 0.00.0 −0.54772 0.0 0.60553 0.0 0.372680.0 0.0 −0.63246 0.0 0.5278 0.00.0 0.0 0.0 −0.69007 0.0 0.467180.0 0.0 0.0 0.0 −0.73193 0.00.0 0.0 0.0 0.0 0.0 −0.76376

21 of 33

Page 53: Fast Harmonic Polynomial Transforms

The SH Connection Problem

“Proof.” Apply the Givens rotation G(0)>0 :

G(0)>0 C (0) =

1.0 0.0 0.0 0.0 0.0 0.00.0 0.83666 0.0 0.39641 0.0 0.243980.0 0.0 0.7746 0.0 0.43095 0.00.0 −0.54772 0.0 0.60553 0.0 0.372680.0 0.0 −0.63246 0.0 0.5278 0.00.0 0.0 0.0 −0.69007 0.0 0.467180.0 0.0 0.0 0.0 −0.73193 0.00.0 0.0 0.0 0.0 0.0 −0.76376

21 of 33

Page 54: Fast Harmonic Polynomial Transforms

The SH Connection Problem

“Proof.” And again:

G(0)>1 G

(0)>0 C (0) =

1.0 0.0 0.0 0.0 0.0 0.00.0 1.0 0.0 0.0 0.0 0.00.0 0.0 0.7746 0.0 0.43095 0.00.0 0.0 0.0 0.72375 0.0 0.445440.0 0.0 −0.63246 0.0 0.5278 0.00.0 0.0 0.0 −0.69007 0.0 0.467180.0 0.0 0.0 0.0 −0.73193 0.00.0 0.0 0.0 0.0 0.0 −0.76376

21 of 33

Page 55: Fast Harmonic Polynomial Transforms

The SH Connection Problem

“Proof.” And again:

G(0)>2 G

(0)>1 G

(0)>0 C (0) =

1.0 0.0 0.0 0.0 0.0 0.00.0 1.0 0.0 0.0 0.0 0.00.0 0.0 1.0 0.0 0.0 0.00.0 0.0 0.0 0.72375 0.0 0.445440.0 0.0 0.0 0.0 0.68139 0.00.0 0.0 0.0 −0.69007 0.0 0.467180.0 0.0 0.0 0.0 −0.73193 0.00.0 0.0 0.0 0.0 0.0 −0.76376

21 of 33

Page 56: Fast Harmonic Polynomial Transforms

The SH Connection Problem

“Proof.” And again:

G(0)>3 G

(0)>2 G

(0)>1 G

(0)>0 C (0) =

1.0 0.0 0.0 0.0 0.0 0.00.0 1.0 0.0 0.0 0.0 0.00.0 0.0 1.0 0.0 0.0 0.00.0 0.0 0.0 1.0 0.0 0.00.0 0.0 0.0 0.0 0.68139 0.00.0 0.0 0.0 0.0 0.0 0.64550.0 0.0 0.0 0.0 −0.73193 0.00.0 0.0 0.0 0.0 0.0 −0.76376

21 of 33

Page 57: Fast Harmonic Polynomial Transforms

The SH Connection Problem

“Proof.” And again:

G(0)>4 G

(0)>3 G

(0)>2 G

(0)>1 G

(0)>0 C (0) =

1.0 0.0 0.0 0.0 0.0 0.00.0 1.0 0.0 0.0 0.0 0.00.0 0.0 1.0 0.0 0.0 0.00.0 0.0 0.0 1.0 0.0 0.00.0 0.0 0.0 0.0 1.0 0.00.0 0.0 0.0 0.0 0.0 0.64550.0 0.0 0.0 0.0 0.0 0.00.0 0.0 0.0 0.0 0.0 −0.76376

21 of 33

Page 58: Fast Harmonic Polynomial Transforms

The SH Connection Problem

“Proof.” And finally:

G(0)>5 G

(0)>4 G

(0)>3 G

(0)>2 G

(0)>1 G

(0)>0 C (0) =

1.0 0.0 0.0 0.0 0.0 0.00.0 1.0 0.0 0.0 0.0 0.00.0 0.0 1.0 0.0 0.0 0.00.0 0.0 0.0 1.0 0.0 0.00.0 0.0 0.0 0.0 1.0 0.00.0 0.0 0.0 0.0 0.0 1.00.0 0.0 0.0 0.0 0.0 0.00.0 0.0 0.0 0.0 0.0 0.0

21 of 33

Page 59: Fast Harmonic Polynomial Transforms

The SH Connection Problem

“Proof.” Schematically:

C (m)

=

. . .

G(m)0 · · ·G (m)

n

I

Conversion between neighbouring layers is O(n) flops and storage.

The Givens rotations are computed to high relative accuracy due toanalytical expressions of sines and cosines ⇒ backward stable.

21 of 33

Page 60: Fast Harmonic Polynomial Transforms

Skeletonizing the Pre-Computation

Skeletonizing the pre-computation makes it practical for a laptop.Neighbouring layers are converted via Given rotations.

`

m

n

︸︷︷︸O(ka

vg)

22 of 33

Page 61: Fast Harmonic Polynomial Transforms

Proof That Butter Flies

For m ∈ N, the connection coefficients between P2m`+2m and P0

n are given bythe inner product:

c2m`,n =

∫ 1

−1

P2m`+2m(x)P0

n (x)dx .

Using the Fourier transform of P0n :

c2m`,n =

(−i)n√n + 1

2

π

∫Rjn(k)dk

∫ 1

−1

eikx P2m`+2m(x)dx .

The matrix of connection coefficients is an operator composition with thevariables (n× k)× (k × x)× (x × `). The Fourier integral operator is special.

23 of 33

Page 62: Fast Harmonic Polynomial Transforms

Proof That Butter Flies

Theorem (Slevinsky [2017a])

Let:

k1(n, ε) := 2

2

√π

2n + 1(n + 1)Γ(n + 3

2 )

) 1n+1

,

and let:

k2(`,m, n, ε) :=1

8

(2

ε

√2n + 1

π

√(2`+ 4m + 1)Γ(`+ 4m + 1)

Γ(`+ 1)

1

mΓ(m + 12 )

) 1m

.

Then only integration over k1(n, ε) ≤ |k | ≤ k2(`,m, n, ε) contributes to:

c2m`,n =

(−i)n√n + 1

2

π

∫Rjn(k)dk

∫ 1

−1

eikx P2m`+2m(x)dx .

to precision ε > 0.23 of 33

Page 63: Fast Harmonic Polynomial Transforms

Summary

The butterfly algorithm applied to the connection problem:

X Requires an O(n2 log2 n) run-time;

× Requires O(n3 log n) pre-computation but only 10× moreexpensive than run-time in practice;

X Is (backward) stable, with error scaling as O(√nε) for the

slow transform and O(nε) for the fast transform;

? Is fast in practice; and,

X Has low memory footprint.

24 of 33

Page 64: Fast Harmonic Polynomial Transforms

Eigenfunction Transforms, III

Associated Legendre functions also satisfy:

−(1− x2)d

dx

[(1− x2)

d

dxPm` (x)

]+ m2Pm

` (x) = `(`+ 1)(1− x2)Pm` (x).

Expanding Pµ` (x) in the basis of Pm` (x) (for µ−m even), we find formally:

(MD + (µ2 −m2)I)u = λMu.

For large µ and m, the entries of M−1 are still semiseparable but prone tosevere (factorial) scaling. This rules out the diagonal-plus-semiseparableeigensolvers, but how can we reintroduce symmetry?The key observation of [Slevinsky, 2017b] is that multiplication by 1− x2 isa symmetric positive-definite operator and thus has a Cholesky factorizationM = R>R. Letting u = R>v and multiplying from the left by R−>, wearrive at:

(RDR> + (µ2 −m2)I)v = λRR>v .25 of 33

Page 65: Fast Harmonic Polynomial Transforms

Eigenfunction Transforms, III

Facts about (RDR> + (µ2 −m2)I)v = λRR>v :

The entries of D are `(`+ 1) for ` ≥ |m|;The Cholesky factor R is:

R =

cm1 0 dm

1

. . .. . .

. . .

cmn−2 0 dmn−2

cmn−1 0cmn

,

where:

cm` =

√(`+ 2m)(`+ 2m + 1)

(2`+ 2m − 1)(2`+ 2m + 1), and dm

` = −

√`(`+ 1)

(2`+ 2m + 1)(2`+ 2m + 3).

Symmetric-definite tridiagonal D & C is proposed by [Borges and Gragg,1993].

25 of 33

Page 66: Fast Harmonic Polynomial Transforms

Eigenfunction Transforms, III

This symmetric-definite banded generalized eigenvalue problem:

allows for use of arrowhead divide-and-conquer algorithms to allowO(n log n) application of the connection problem; and,

allows for the pre-computation to be recursively subdivided and reducedfrom a total cost of O(n2 log n) down to O(n

32 log n), which is

superoptimal, based on O(log√n) levels in the following schematic.

0 nn2

n4

3n4

n8

3n8

5n8

7n8

n16

3n16

5n16

7n16

9n16

11n16

13n16

15n16

· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·

25 of 33

Page 67: Fast Harmonic Polynomial Transforms

The Story So Far

A great thrust has been made to create asymptotically fast sphericalharmonic transforms that are practical. Are they faster than the fastestslow methods of [Schaeffer, 2013, Reinecke and Seljebotn, 2013, Ishioka,2018]? Not yet.

All technologies are all similar in spirit: they divide and conquer in thepresence of oscillations on harmonic polynomials that are separableSturm–Liouville eigenfunctions of ∆ with well-separated spectra.

Is there a backward stable method? Yes [Slevinsky, 2017a].

Is there a pre-computation-free method? Yes [Slevinsky, 2017b].

Is there free and open source software? Yes,including FastTransforms.jl in Julia (experimental quality) andFastTransforms in C (production quality).

Many of the technological apparatuses have analogues for other 2Dharmonic polynomials: on the disk [Zernike, 1934], triangle [Proriol,1957], rectangle, deltoid, wedge, etc. . . , as well as spin-weightedspherical harmonics, allowing for a unified framework.

26 of 33

Page 68: Fast Harmonic Polynomial Transforms

References (1)

J. R. Driscoll and D. M. Healy Jr. Computing Fourier transforms and convolutionson the 2-sphere.

Adv. Appl. Math., 15:202–250, 1994.

J. D. McEwen and Y. Wiaux.

A novel sampling theorem on the sphere.

IEEE Trans. Sig. Proc., 59:5876–5887, 2011.

K. M. Gorski, E. Hivon, A. J. Banday, B. D. Wandelt, F. K. Hansen, M. Reinecke,and M. Bartelmann.

HEALPix: A framework for high-resolution discretization and fast analysis of datadistributed on the sphere.

Ap. J., 622:759–771, 2005.

D. Potts, G. Steidl, and M. Tasche.

Fast and stable algorithms for discrete spherical Fourier transforms.

Linear Algebra Appl., 275–276:433–450, 1998.

27 of 33

Page 69: Fast Harmonic Polynomial Transforms

References (2)

S. Kunis and D. Potts.

Fast spherical Fourier algorithms.

J. Comp. Appl. Math., 161:75–98, 2003.

R. Suda and M. Takami.

A fast spherical harmonics transform algorithm.

Math. Comp., 71:703–715, 2002.

D. M. Healy Jr., D. N. Rockmore, P. J. Kostelec, and S. Moore.

FFTs for the 2-sphere–improvements and variations.

J. Fourier Anal. Appl., 9:341–385, 2003.

M. J. Mohlenkamp.

A fast transform for spherical harmonics.

J. Fourier Anal. Appl., 5:159–184, 1999.

L. Greengard and V. Rokhlin.

A fast algorithm for particle simulations.

J. Comp. Phys., 73:325–348, 1987.

28 of 33

Page 70: Fast Harmonic Polynomial Transforms

References (3)

B. K. Alpert and V. Rokhlin.

A fast algorithm for the evaluation of Legendre expansions.

SIAM J. Sci. Stat. Comput., 12:158–179, 1991.

V. Rokhlin and M. Tygert.

Fast algorithms for spherical harmonic expansions.

SIAM J. Sci. Comput., 27:1903–1928, 2006.

S. Chandrasekaran and M. Gu.

A divide-and-conquer algorithm for the eigendecomposition of symmetricblock-diagonal plus semiseparable matrices.

Numer. Math., 96:723–731, 2004.

M. Gu and S. C. Eisenstat.

A divide-and-conquer algorithm for the symmetric tridiagonal eigenproblem.

SIAM J. Matrix Anal. Appl., 16:172–191, 1995.

M. Tygert.

Fast algorithms for spherical harmonic expansions, II.

J. Comp. Phys., 227:4260–4279, 2008.

29 of 33

Page 71: Fast Harmonic Polynomial Transforms

References (4)

M. Gu and S. C. Eisenstat.

A stable and efficient algorithm for the rank-one modification of the symmetriceigenproblem.

SIAM J. Matrix Anal. Appl., 15:1266–1276, 1994.

M. Tygert.

Fast algorithms for spherical harmonic expansions, III.

J. Comp. Phys., 229:6181–6192, 2010.

E. Michielssen and A. Boag.

A multilevel matrix decomposition algorithm for analyzing scattering from largestructures.

IEEE Trans. Antennas Propagat., 44:1086–1093, 1996.

M. O’Neil, F. Woolfe, and V. Rokhlin.

An algorithm for the rapid evaluation of special function transforms.

Appl. Comput. Harmon. Anal., 28:203–226, 2010.

30 of 33

Page 72: Fast Harmonic Polynomial Transforms

References (5)

E. Liberty, F. Woolfe, P.-G. Martinsson, V. Rokhlin, and M. Tygert.

Randomized algorithms for the low-rank approximation of matrices.

Proc. Nat. Acad. Sci., 104:20167–20172, 2007.

D. S. Seljebotn.

Wavemoth–fast spherical harmonic transforms by butterfly matrix compression.

Astro. J. Suppl. Series, 199:12, 2012.

N. P. Wedi, M. Hamrud, and G. Mozdzynski.

A fast spherical harmonics transform for global NWP and climate models.

Monthly Weather Review, 141:3450–3461, 2013.

R. M. Slevinsky.

Fast and backward stable transforms between spherical harmonic expansions andbivariate Fourier series.

Appl. Comput. Harmon. Anal., 2017a.

31 of 33

Page 73: Fast Harmonic Polynomial Transforms

References (6)

R. M. Slevinsky.

Conquering the pre-computation in two-dimensional harmonic polynomialtransforms.

arXiv:1711.07866, 2017b.

C. F. Borges and W. B. Gragg.

A parallel divide and conquer algorithm for the generalized real symmetric definitetridiagonal eigenproblem.

In L. Reichel, A. Ruttan, and R. S. Varga, editors, Numerical Linear Algebra andScientific Computing, pages 11–29. de Gruyter, 1993.

N. Schaeffer.

Efficient spherical harmonic transforms aimed at pseudospectral numericalsimulations.

Geochem. Geophys. Geosyst., 14:751–758, 2013.

M. Reinecke and D. S. Seljebotn.

Libsharp – spherical harmonic transforms revisited.

arXiv:1303.4945, 2013.

32 of 33

Page 74: Fast Harmonic Polynomial Transforms

References (7)

K. Ishioka.

A new recurrence formula for efficient computation of spherical harmonic transform.

J. Met. Soc. Japan, 96:241–249, 2018.

F. Zernike.

Beugungstheorie des Schneidenverfahrens und seiner verbesserten Form, derPhasenkontrastmethode.

Physica, 1:689–704, 1934.

J. Proriol.

Sur une famille de polynomes a deux variables orthogonaux dans un triangle.

C. R. Acad. Sci. Paris, 245:2459–2461, 1957.

33 of 33