
    Stable components and layers

    J.F. Jardine∗

    Department of Mathematics, University of Western Ontario

    [email protected]

    October 21, 2019

    Abstract

    Component graphs Γ0(F) are defined for arrays of sets F, and in particular for arrays of path components for Vietoris-Rips complexes and Lesnick complexes. The path components of Γ0(F) are the stable components of the array F. The stable components for the system of Lesnick complexes {Ls,k(X)} for a finite data set X decompose into layers, which are themselves path components of a graph. Combinatorial scoring functions are defined for layers and stable components.

    Keywords: clusters, graphs, stable components, layers

    Subject Classifications: 55U10, 68R10, 62H30

    Introduction

    Astronomers say that a cluster is a “group of stars or galaxies forming a relatively close association”.

    ∗Supported by NSERC.

    Clusters are distinguished by relative density: they are concentrated collections of objects, surrounded by voids.

    In the late 1700s, the brother and sister team William and Caroline Herschel found and classified “stellar over-densities” by counting stars in grids of regions of space. The method used today follows the same principle, although it is done with sophisticated imaging equipment coupled with computer analysis that filters out light artifacts. Relative to big picture items such as the cosmic microwave background, clusters are anomalies: they are small, dense collections of stellar objects.

    The same sort of big picture/little picture dichotomy is present for other large data sets. In financial data, a dense, relatively small collection of rapid low-value transactions could point to an instance of money laundering, while a large-scale analysis can detect sectoral or global market fluctuations.

    Colloquially, clusters are collections of data points in relatively close proximity, within some space that is defined by a finite list of parameters. We shall assume, more precisely, that a data set (or a data cloud) X is a finite collection of points inside the real vector space Rn. The motivating idea behind clustering is to find regions in the ambient space that contain dense populations of elements of the data set X. This is usually done in data analysis by applying partitioning methods to the data set X, or by studying hierarchies of such partitions.

    The definitional structure and methods of this paper represent a departure from traditional clustering, although partitions and hierarchies (really, trees) of partitions are both used. The central objects of study are partition elements (or path components, or clusters) that persist through changes of distance parameters, or density parameters, or both; these objects are called stable components. The idea is to assign a more precise meaning to the colloquial notion of a cluster, such as a cluster of stars, which is an isolated grouping of objects in a data set.

    To this end, we recall some basic constructions of topological data analysis in the first section of this paper. Specifically, the data set X determines an ascending sequence of simplicial complexes Vs(X), the Vietoris-Rips complexes, in which the simplices consist of sets of data points having mutual distances bounded by a non-negative real number s. These complexes, in turn, are filtered by Lesnick subcomplexes Ls,k(X), where the filtration is determined by a density parameter k.

    Each of these complexes has a functorial set of path components π0Vs(X), respectively π0Ls,k(X), which partition subsets of the data set X. The resulting diagrams of partitions have associated hierarchies Γ(V∗(X)) and Γ(L∗,∗(X)), containing component graphs Γ0(π0V∗(X)) and Γ0(L∗,∗(X)), respectively. The component graphs consist of path components that do not change through some variation in the defining parameters s and/or k. The component graphs themselves have path components, and these are the stable components for the data set X, in various incarnations.

    Stable components further decompose into layers, for filtrations derived from the Lesnick filtration Ls,k(X), in which the underlying sets of vertices may not be constant. The layers of the filtration L∗,∗(X) are defined by computing path components of the layer subgraph Γ′0(Ls,k(X)) of the component graph. The edges of the layer subgraph are defined by path components that have the same size (or cardinality) through changes of defining parameters.

    Layers are defined and discussed in the second section of this paper. Every stable component is a disjoint union of its constituent layers, while the layers form graphs that are relatively easy to visualize geometrically as unions of squares.

    Naive calculations of stable components and layers produce large collections of small objects. In particular, the individual elements of X are stable components for the Vietoris-Rips filtration. It is typical to interpret small stable components or layers as noise, and remove them from the output of a particular algorithm. This can be done with a “scoring” technique, for which noise objects can be interpreted as stable components having low scores. That said, one could be most interested in stable components having relatively low scores, such as in algorithms that detect money laundering or smaller scale star clusters.

    Scoring appears as an analytic device in statistical approaches to clustering; see [5], for example. An alternative combinatorial method of scoring is presented in the third section of this paper. Basically, the score σ(P) of a stable component P or the score σ(L) of a layer L is the sum of the cardinalities of the path components that appear in its list of vertices. This number is most effectively calculated by making yet another graph out of the vertices of the Lesnick complexes Ls,k(X) and computing the cardinality of the set of vertices of a pullback of a stable component or layer within this new graph. This method of scoring is additive, so that the score of a stable component is the sum of the scores of its constituent layers.

    A simple theoretical problem is presented in the final section of this paper. We start with a data set X ⊂ Rn and add a point y which is very close to one of the elements of X to form a new data set Y = X ⊔ {y}, and then we want to compare the stable components for X and Y.

    Suitably interpreted, the inclusion i : X ⊂ Y induces maps of Vietoris-Rips complexes Vs(X) → Vs(Y) which are natural with respect to the distance parameter s, with a corresponding map of hierarchies

    $$\Gamma(X) := \Gamma(V_*(X)) \xrightarrow{i_*} \Gamma(V_*(Y)) =: \Gamma(Y).$$

    The language of layers is used to compare stable components with respect to this map. In this context, a layer of X can be viewed as an edge

    (s, [x]s)→ (t, [x]t)

    of Γ(X), such that [x]s = [x]t as subsets of X and a maximality condition on the length t − s is satisfied. Here, [x]s is the path component of x in Vs(X).

    The map Γ(X) → Γ(Y) breaks up layers of X having both y ∉ [x]s and y ∈ [x]t in Vs(Y) and Vt(Y), respectively. Otherwise, layers of X are mapped to “partial” layers of Y. These partial layers expand to layers of Y, having a size that can be approximated. The situation is summarized in Proposition 15 below.

    More precise answers to the question of how layers and their scores vary through the hierarchy comparison i∗ : Γ(X) → Γ(Y) would be available in specific examples.

    The hierarchy Γ(Y) is a refinement of Γ(X), and the map i∗ is a deformation retract in a very strong sense by the homotopy interleaving methods of [1]. Statements of this form give coarse information about layers, while the precise layer structure of a data set depends strongly on the distances between its points. Adding a single point to a data set X can introduce many new distances between points of the resulting data set Y.

    This paper was partially conceived and written during a series of visits to the Tutte Institute, and I would like to thank the Institute for its hospitality and support. This research was also partially supported by the Natural Sciences and Engineering Research Council of Canada.

    I would like to thank the referee for raising the question that is the subject of the final section of this paper. This question is quite natural, and the collection of ideas behind the solution that is presented here has proved to be quite provocative.

    Contents

    1 Clusters, graphs and stable components

    2 Layers

    3 Scoring

    4 Adding a point

    1 Clusters, graphs and stable components

    Clustering is a long-standing enterprise of data analysis, which encompasses many definitions and methods. Basically, the standard techniques amount to either partitioning a data set, or constructing hierarchies of partitions.

    For example, K-means clustering starts with a set of points in a data set. This set of points partitions the data set into regions of nearest neighbours, or Voronoi cells. The algorithm proceeds by finding centres of the cells, and then finding nearest neighbour sets for this set of centres to produce a new set of regions. The procedure stops when the set of centres stabilizes. The Voronoi cells determined by the resulting set of stable points partition the data set, and the partitions are expected to contain dense regions of data points near their centres.
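    The K-means procedure described above (often called Lloyd's algorithm) can be sketched as follows. This is a minimal illustration rather than anything used in this paper: it assumes Euclidean distance, takes the initial centres as input, and the function names `assign` and `kmeans` are hypothetical.

```python
import math

def assign(points, centres):
    """Voronoi cells: cells[j] holds the points whose nearest centre is centres[j]."""
    cells = [[] for _ in centres]
    for p in points:
        nearest = min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
        cells[nearest].append(p)
    return cells

def kmeans(points, centres, max_iter=100):
    """Replace each centre by the mean of its cell until the centres stop
    changing (or max_iter rounds have run); return centres and their cells."""
    centres = [tuple(c) for c in centres]
    for _ in range(max_iter):
        cells = assign(points, centres)
        # New centre = coordinate-wise mean of the cell; an empty cell keeps its centre.
        new = [tuple(sum(coord) / len(cell) for coord in zip(*cell)) if cell else c
               for c, cell in zip(centres, cells)]
        if new == centres:
            break
        centres = new
    return centres, assign(points, centres)
```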

    Hierarchical clustering algorithms, such as single linkage clustering, require fewer assumptions. The inductive step in single linkage clustering assumes the existence of a partition

    $$X = P_1 \sqcup \cdots \sqcup P_k \subset \mathbb{R}^n$$

    of the data set X. Find subsets Pi and Pj which are closest together in the ambient space Rn (for single linkage, the distance between two subsets is the smallest distance between a point of one and a point of the other), and then form a coarser partition by taking the union Q = Pi ∪ Pj while keeping the other partition subsets fixed. There is a corresponding function which relates the first partition to the new one. The algorithm typically starts with the discrete partition X = ⊔x∈X {x}, and the last possible step in the resulting hierarchy of partitions would be the singleton partition consisting of X alone. The algorithm is typically stopped when it reaches a partition of X with sufficiently many points in some partition members, where the phrase “sufficiently many” is open to interpretation. The diagram of partitions and relations between them forms a dendrogram, which is a type of tree.
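    The single linkage step can be sketched in the same spirit. In this minimal sketch (the name `single_linkage` is hypothetical), blocks are sets of point indices, the distance between two blocks is the smallest distance between their members, and the loop runs all the way from the discrete partition to the partition consisting of X alone instead of applying a stopping rule.

```python
import math
from itertools import combinations

def single_linkage(points):
    """Return the partitions produced by single linkage clustering, from the
    discrete partition down to the partition with a single block.

    Blocks are frozensets of point indices; the distance between two blocks
    is the smallest distance between a member of one and a member of the other.
    """
    blocks = [frozenset([i]) for i in range(len(points))]
    history = [list(blocks)]

    def gap(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(blocks) > 1:
        # Merge the two closest blocks; all other blocks stay fixed.
        a, b = min(combinations(blocks, 2), key=lambda pair: gap(*pair))
        blocks = [c for c in blocks if c not in (a, b)] + [a | b]
        history.append(list(blocks))
    return history
```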

    The methods of topological data analysis produce multiple variations of these general themes.

    Suppose that X ⊂ Rn is a finite data set, and choose a non-negative real number s.

    The Vietoris-Rips complex Vs(X) is a simplicial complex with simplices consisting of sets {x0, . . . , xk} of points of X such that d(xi, xj) ≤ s for all i, j. The vertices of Vs(X) are the elements of X, and the set of path components π0Vs(X) of Vs(X) defines a partition of the data set X.

    Explicitly, elements x, y ∈ X are in the same path component of Vs(X) if there is a series of “short hops” (length ≤ s) from x to y through elements of X. We define an equivalence relation on X in this way, and the equivalence classes are the path components of Vs(X). In particular, the set of path components of Vs(X) depends only on the pairs of points at distance at most s, and it can be calculated by a standard connected-components algorithm on the resulting graph.
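    Since path components only depend on the 1-skeleton, π0Vs(X) can be computed with a union-find structure on the pairs of points at distance at most s. The following is a sketch under the assumptions that X is given as a list of coordinate tuples with the Euclidean metric and that data points are referred to by their list indices; `vr_components` is a hypothetical name.

```python
import math

def vr_components(points, s):
    """Path components of the Vietoris-Rips complex V_s(X), as sets of indices.

    Two points lie in the same component iff they are joined by a chain of
    hops of length <= s, so only the 1-skeleton of V_s(X) is needed.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= s:
                parent[find(i)] = find(j)  # merge the two components

    comps = {}
    for i in range(len(points)):
        comps.setdefault(find(i), set()).add(i)
    return sorted(comps.values(), key=min)
```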

    If s ≤ t then there is an induced inclusion of simplicial complexes

    Vs(X) ⊂ Vt(X),

    and a corresponding function π0Vs(X) → π0Vt(X) that relates the partitions given by the respective sets of path components. Because X is finite, there are only finitely many numbers

    0 = s0 < s1 < · · · < sp

    that I call phase change numbers, which can occur as distances between elements of X. In the corresponding string of inclusions

    X = Vs0(X) ⊂ Vs1(X) ⊂ · · · ⊂ Vsp(X),

    the complex Vs0(X) is the discrete set X, while Vsp(X) is a big simplex, which I write as $\Delta^X =: \Delta^N$, where N + 1 is the number of elements of X. The space $V_{s_p}(X) = \Delta^X$ is contractible. The contractibility of Vsp(X) implies that its set of path components π0Vsp(X) is a singleton set, and I express this by writing π0Vsp(X) = ∗.
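    The phase change numbers, together with the partitions π0Vsi(X) along this string of inclusions, can be produced in a single pass by adding edges in order of length, in the manner of Kruskal's algorithm, and recording the partition at each new length. This is a sketch assuming Euclidean distance and distinct data points (so that s0 = 0 is the only zero distance); `phase_change_partitions` is a hypothetical name.

```python
import math
from itertools import combinations

def phase_change_partitions(points):
    """Return [(s_i, partition of X at s_i)] for 0 = s_0 < s_1 < ... < s_p,
    where s_1, ..., s_p are the distinct pairwise distances of X."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def partition():
        blocks = {}
        for i in range(n):
            blocks.setdefault(find(i), set()).add(i)
        return sorted(blocks.values(), key=min)

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    result = [(0.0, partition())]  # V_0(X) is the discrete set X
    k = 0
    while k < len(edges):
        s = edges[k][0]
        # Add every edge of length exactly s before recording the partition.
        while k < len(edges) and edges[k][0] == s:
            _, i, j = edges[k]
            parent[find(i)] = find(j)
            k += 1
        result.append((s, partition()))
    return result
```

    The last recorded partition always has a single block, which matches π0Vsp(X) = ∗.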

    Remark 1. One way to produce the complexes Vsi(X), altogether, is to first find all distances si between data points, and to determine the maximum distance between elements for all subsets σ = {x0, x1, . . . , xp} of X. This maximum distance for a subset σ is one of the si, and so σ is a simplex of Vsi(X) (and hence of Vsj(X) for all j ≥ i). This construction is simple enough, but note the exponential complexity [7].
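    Remark 1 can be made concrete as follows. The sketch below (the names `rips_filtration` and `simplices_at` are hypothetical) assigns to each subset σ of X its maximum pairwise distance, which is the smallest s with σ in Vs(X). It enumerates all subsets, so it exhibits exactly the exponential cost mentioned in the remark unless the optional `max_dim` cap is used.

```python
import math
from itertools import combinations

def rips_filtration(points, max_dim=None):
    """Map each simplex (a tuple of point indices) to the smallest s with the
    simplex in V_s(X): its maximum pairwise distance (0 for a vertex).

    All nonempty subsets are enumerated, so the cost is exponential in |X|
    unless max_dim caps the simplex dimension.
    """
    n = len(points)
    top = n if max_dim is None else max_dim + 1
    values = {}
    for size in range(1, top + 1):
        for simplex in combinations(range(n), size):
            values[simplex] = max(
                (math.dist(points[i], points[j]) for i, j in combinations(simplex, 2)),
                default=0.0,
            )
    return values

def simplices_at(values, s):
    """Simplices of V_s(X): those whose filtration value is at most s."""
    return sorted(sigma for sigma, v in values.items() if v <= s)
```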

    The corresponding picture

    X = π0Vs0(X)→ π0Vs1(X)→ · · · → π0Vsp(X) = ∗ (1)

    of surjective functions between partitions defines a tree by a method specified below (Remark 4), and as such defines a hierarchical clustering.

    The Vietoris-Rips complex Vs(X) can be filtered by density. Suppose that k is a non-negative integer. Then the complex Vs(X) has a “full” subcomplex Ls,k(X), which I call a Lesnick complex, whose simplices consist of vertices having at least k neighbours relative to the parameter s. See also [5].

    The Lesnick complexes Ls,k(X), k ≥ 0, filter the Vietoris-Rips complex Vs(X), and changing either the distance parameter s or the density parameter k defines an array of inclusions

    $$\begin{array}{ccc}
    L_{s,k}(X) & \longrightarrow & L_{t,k}(X) \\
    \uparrow & & \uparrow \\
    L_{s,k+1}(X) & \longrightarrow & L_{t,k+1}(X)
    \end{array} \qquad (2)$$

    of simplicial complexes.

    Lesnick says that the array {Ls,k(X)} is the degree Rips filtration of the Vietoris-Rips system {Vs(X)} [4].

    Example 2. The partitioning algorithm DBSCAN* (“Density-based spatial clustering of applications with noise”) amounts to a calculation of π0Ls,k(X), for a fixed tunable distance parameter s and density parameter k.
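    Example 2 suggests computing π0Ls,k(X) directly. The sketch below assumes that “k neighbours relative to s” means at least k other points of X within distance s (a point is not counted as its own neighbour). This convention, the Euclidean metric, and the names `lesnick_vertices` and `lesnick_components` are assumptions. Because Ls,k(X) is the full subcomplex of Vs(X) on its vertex set, its path components are found by short hops that stay inside that vertex set. Points outside it play roughly the role of noise in DBSCAN*.

```python
import math

def lesnick_vertices(points, s, k):
    """Vertices of L_{s,k}(X): points with at least k other points within s.
    Assumption: a point is not counted as its own neighbour."""
    n = len(points)
    return {i for i in range(n)
            if sum(1 for j in range(n)
                   if j != i and math.dist(points[i], points[j]) <= s) >= k}

def lesnick_components(points, s, k):
    """pi_0 L_{s,k}(X): path components of the full subcomplex of V_s(X) on
    its vertex set, found by depth-first search through that vertex set."""
    verts = sorted(lesnick_vertices(points, s, k))
    comps, seen = [], set()
    for v in verts:
        if v in seen:
            continue
        stack, comp = [v], set()
        seen.add(v)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in verts:  # hops may only land on vertices of L_{s,k}(X)
                if w not in seen and math.dist(points[u], points[w]) <= s:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps
```

    With k = 0 every point qualifies, so the output coincides with π0Vs(X).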

    Example 3. The “hierarchical” version HDBSCAN* [2], [5] of DBSCAN* is the production (and interpretation) of a tree that is associated to the string of functions

    π0Ls0,k(X)→ π0Ls1,k(X)→ · · · → π0Lsp,k(X) = ∗. (3)

    Here, Lsi,k(X) could be empty for small si, and we are assuming that k is bounded above by the cardinality of X. There is one tunable parameter in this case, namely the density k.

    Starting with the data set X ⊂ Rn as above, we continue to examine the string of functions

    X = π0Vs0(X)→ π0Vs1(X)→ · · · → π0Vsp(X) = ∗

    as in (1), but with a different interpretation.

    These functions together determine a graph Γ(X), whose vertices are pairs (si, [x]), where [x] (sometimes [x] = [x]i or [x] = [x]si, for precision) is the path component of a data point x ∈ X in the complex Vsi(X). The edges of the graph have the form (si, [x]) → (si+1, [x]). The path component represented by x in Vsi(X) maps to the path component represented by x in Vsi+1(X) under the function π0Vsi(X) → π0Vsi+1(X), as is standard.

    The graph Γ(X) is the hierarchy graph for π0V∗(X). This graph is a tree, and is the hierarchical clustering arising from the complexes Vs(X), but we go further.

    The vertex (si, [x]) is a branch point if, equivalently,

    1) there are distinct path components [u], [v] of Vsi−1(X) such that [u] = [v] = [x] in π0Vsi(X), or

    2) the inclusion [x]i−1 ⊂ [x]i of path components is not surjective.

    Remove all edges that terminate in branch points from the hierarchy graph Γ(X), to produce a subgraph Γ0(X), called the component graph. The path components of the graph Γ0(X) are the stable components of the data set X.

    Remark 4. These definitions can be generalized to arbitrary strings of functions

    $$F : F_0 \xrightarrow{\alpha} F_1 \xrightarrow{\alpha} \cdots \xrightarrow{\alpha} F_p.$$

    The hierarchy graph Γ(F) has vertices (i, x) with x ∈ Fi, and has edges

    (i, x)→ (i+ 1, α(x)).

    I say that the vertex (i, x) is a branch point of Γ(F) if there are distinct elements y, z ∈ Fi−1 such that α(y) = α(z) = x. Remove all edges terminating in branch points from the graph Γ(F) to form the component graph Γ0(F), and then the stable components of F are the path components of the graph Γ0(F).

    At this level of generality, the hierarchy graph Γ(F) is a disjoint union of trees, one for each element of the set Fp.
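    The construction of Remark 4 is purely combinatorial, so it is easy to sketch. In the following (hypothetical names throughout), the sets Fi are Python sets and each function α is a dictionary. Branch points are detected by counting preimages, and the stable components are the path components of what remains after deleting the edges that end in branch points.

```python
from collections import Counter

def stable_components(levels, maps):
    """Stable components of a string F_0 -> F_1 -> ... -> F_p.

    levels[i] is the set F_i and maps[i] is a dict encoding F_i -> F_{i+1}.
    Returns (components of Gamma_0(F) as sets of vertices (i, x), branch points).
    """
    vertices = [(i, x) for i, level in enumerate(levels) for x in level]
    # (i + 1, x) is a branch point if two or more elements of F_i map to x.
    branch = set()
    for i, f in enumerate(maps):
        counts = Counter(f.values())
        branch |= {(i + 1, x) for x, c in counts.items() if c >= 2}

    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Keep exactly the edges (i, x) -> (i + 1, alpha(x)) not ending in a branch point.
    for i, f in enumerate(maps):
        for x, y in f.items():
            if (i + 1, y) not in branch:
                parent[find((i, x))] = find((i + 1, y))

    comps = {}
    for v in vertices:
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values()), branch
```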

    Example 5. The hierarchy graph Γ(π0L∗,k(X)) that is associated to the string of functions (3) is the tree of the HDBSCAN* algorithm.

    The stable components for the string (3), meaning the path components of the component graph Γ0(π0L∗,k(X)), are said to be clusters in [5, Sec. 2.3].

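    Applying the path component construction π0 to the array of simplicial complexes

    $$\begin{array}{ccc}
    L_{s_i,k}(X) & \longrightarrow & L_{s_{i+1},k}(X) \\
    \uparrow & & \uparrow \\
    L_{s_i,k+1}(X) & \longrightarrow & L_{s_{i+1},k+1}(X)
    \end{array}$$

    produces an array of functions

    $$\begin{array}{ccc}
    \pi_0 L_{s_i,k}(X) & \longrightarrow & \pi_0 L_{s_{i+1},k}(X) \\
    \uparrow & & \uparrow \\
    \pi_0 L_{s_i,k+1}(X) & \longrightarrow & \pi_0 L_{s_{i+1},k+1}(X)
    \end{array} \qquad (4)$$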

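    This array is finite, and it is a special case of a finite array of functions F having the form

    $$\begin{array}{ccccc}
    F_{0,0} & \xrightarrow{\alpha} & F_{1,0} & \longrightarrow & \cdots \\
    \uparrow \beta & & \uparrow \beta & & \\
    F_{0,1} & \xrightarrow{\alpha} & F_{1,1} & \longrightarrow & \cdots \\
    \uparrow & & \uparrow & & \\
    \vdots & & \vdots & &
    \end{array} \qquad (5)$$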

    where Fi,k = π0Lsi,k(X).

    There is again a “hierarchy” graph Γ(F) with vertices ((m, k), x) with x ∈ Fm,k, and edges

    $$((m,k),x) \to ((m+1,k),\alpha(x)) \quad \text{and} \quad ((m,k+1),y) \to ((m,k),\beta(y)),$$

    called horizontal and vertical edges, respectively.

    The vertex ((m, k), x) is a horizontal branch point if there are distinct elements y, z ∈ Fm−1,k such that α(y) = α(z) = x. Similarly, ((m, k), x) is a vertical branch point if there are distinct elements u, v ∈ Fm,k+1 such that β(u) = β(v) = x.

    Remove all horizontal edges terminating in horizontal branch points and remove all vertical edges terminating in vertical branch points from the graph Γ(F), to form the component graph Γ0(F). Observe that the graphs Γ(F) and Γ0(F) have the same vertices. The path components π0Γ0(F) are the stable components of the array F.
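    The two-parameter version differs only in that horizontal and vertical edges are counted separately when detecting branch points. Here is a sketch in which the array (5) is encoded as dictionaries keyed by index pairs (m, k); this encoding and the name `array_stable_components` are assumptions made for illustration.

```python
from collections import Counter

def array_stable_components(F, alpha, beta):
    """Stable components pi_0 Gamma_0(F) of a finite two-parameter array.

    F[(m, k)] is the set F_{m,k}; alpha[(m, k)] is a dict encoding the
    horizontal map F_{m,k} -> F_{m+1,k}; beta[(m, k)] is a dict encoding
    the vertical map F_{m,k+1} -> F_{m,k}.
    """
    edges = []  # (kind, source vertex, target vertex)
    for (m, k), f in alpha.items():
        edges += [("h", ((m, k), x), ((m + 1, k), f[x])) for x in f]
    for (m, k), g in beta.items():
        edges += [("v", ((m, k + 1), y), ((m, k), g[y])) for y in g]

    # A target reached by two or more edges of the same kind is a horizontal
    # (resp. vertical) branch point; edges of that kind into it are removed.
    incoming = Counter((kind, tgt) for kind, _, tgt in edges)
    kept = [(src, tgt) for kind, src, tgt in edges if incoming[(kind, tgt)] < 2]

    parent = {(idx, x): (idx, x) for idx, level in F.items() for x in level}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, tgt in kept:
        parent[find(src)] = find(tgt)
    comps = {}
    for v in list(parent):
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())
```

    A vertex that receives one horizontal and one vertical edge is not a branch point of either kind, so both of those edges survive.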

    Example 6. The set of vertices for the complex Ls,k(X) is empty for k ≥ m,for some m. Holding s fixed and letting k vary gives a string of functions

    ∅ = π0Ls,m(X)→ π0Ls,m−1(X)→ · · · → π0Ls,0(X) = π0Vs(X).

    The corresponding graph Γ(π0Ls,∗(X)) is a disjoint union of trees, which is the analogue of the cluster tree of a density function [6] for the present context. The cluster tree is also a type of hierarchy graph. See also [3].

    2 Layers

    Suppose again that X ⊂ Rn is a finite data set, and consider the ascending sequence of Vietoris-Rips complexes

    X = Vs0(X) ⊂ Vs1(X) ⊂ · · · ⊂ Vsp(X) (6)

    that is associated to the phase change numbers 0 = s0 < s1 < s2 < · · · < sp. Form the associated string of functions

    X = π0Vs0(X) → π0Vs1(X) → · · · → π0Vsp(X) = ∗

    between path components.

    Given an edge (si, [x]si) → (si+1, [x]si+1) in the associated hierarchy graph Γ(F) = Γ(X), the vertex (si+1, [x]si+1) is not a branch point if and only if the cardinalities of the path components [x]si ∈ π0Vsi(X) and [x]si+1 ∈ π0Vsi+1(X) coincide. Since [x]si ⊂ [x]si+1, this means that [x]si = [x]si+1 as subsets of X.

    It follows that a stable component for the sequence F = {π0Vsj(X)} is a string of edges

    P : (i, [x]i)→ (i+ 1, [x]i+1)→ · · · → (i+ n, [x]i+n)

    in Γ(F) of maximal length such that all components [x]j ∈ π0Vsj(X) coincide as subsets of the data set X.

    Each stable component P as above contains a unique branch point, namely the vertex (i, [x]i) at the beginning of the string. It follows that stable components of X can be identified with the branch points of the hierarchy Γ(F) = Γ(X).
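    For the Vietoris-Rips string, the description above gives a direct recipe: a stable component is a maximal run of consecutive levels on which one subset of X persists as a path component. The sketch below assumes each π0Vsj(X) is given as a list of frozensets of data points (the name `vr_stable_components` is hypothetical). Each run begins at a vertex whose subset did not occur at the previous level, that is, at level 0 or at a branch point.

```python
def vr_stable_components(partitions):
    """Stable components of pi_0 V_{s_0}(X) -> ... -> pi_0 V_{s_p}(X).

    partitions[j] lists the path components of V_{s_j}(X) as frozensets.
    Returns triples (first level, last level, block): maximal runs of
    consecutive levels on which the same subset of X is a path component.
    """
    runs = []
    for i, blocks in enumerate(partitions):
        for b in blocks:
            if i > 0 and b in partitions[i - 1]:
                continue  # b continues a run that started at an earlier level
            # (i, b) is at level 0 or is a branch point: the component just grew.
            j = i
            while j + 1 < len(partitions) and b in partitions[j + 1]:
                j += 1
            runs.append((i, j, b))
    return runs
```

    For instance, three points at mutual distances 3, 4 and 5 give five stable components: the two singletons that merge at s1 = 3, the third singleton (which lasts one more level), the pair formed at s1, and the whole set from s2 = 4 on.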

    Fatten up the sequence (6) of Vietoris-Rips complexes to the array {Lsi,k(X)} of Lesnick complexes, write Fi,k = π0Lsi,k(X), and form the graph Γ(F) as in the last section.

    Form a subgraph Γ′0(F) of Γ(F) by saying that

    $$((i,k),[x]) \to ((i+1,k),[x]) \quad \text{or} \quad ((i,k),[z]) \to ((i,k-1),[z]) \qquad (7)$$

    is an edge of Γ′0(F) if and only if the path components [x] ∈ π0Li,k(X) and [x] ∈ π0Li+1,k(X) (respectively, [z] ∈ π0Li,k(X) and [z] ∈ π0Li,k−1(X)) coincide as subsets of X.

    Remark 7. If [z] = [z]i,k is a path component of Li,k(X), then z represents a path component [z]i,k−1 of Li,k−1(X), and there is an inclusion [z]i,k ⊂ [z]i,k−1 of subsets of X. If the element ((i, k − 1), [z]) = ((i, k − 1), [z]i,k−1) is a vertical branch point, then [z]i,k−1 contains a path component from Li,k(X) in addition to [z]i,k, and so [z]i,k ≠ [z]i,k−1 as subsets of X.

    Similar considerations hold in the horizontal case: if [x]i,k = [x]i+1,k as subsets of X, then ((i + 1, k), [x]i+1,k) cannot be a horizontal branch point.

    The graph Γ′0(F) is the layer graph for F. The edges of Γ′0(F) are the edges of the graph Γ(F) for which the size of path component subsets is preserved.

    If either of the edges in (7) is in the layer subgraph Γ′0(F), then the target in each case cannot be a horizontal (respectively, vertical) branch point, because of the preservation of size of path components. It follows that the layer graph Γ′0(F) is a subgraph of the component graph Γ0(F).
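    The layer graph can be computed without building the maps of the array explicitly, provided each path component is stored as the subset of X that it occupies. In the sketch below, F[(i, k)] lists the components of Lsi,k(X) as frozensets; this encoding and the name `layers` are hypothetical. A horizontal or vertical edge out of a block b lies in Γ′0(F) exactly when b is itself a block at the target index, because the image of b is the block containing it and the blocks at one index are disjoint.

```python
def layers(F):
    """Path components (layers) of the layer graph Gamma'_0(F).

    F[(i, k)] lists the path components of L_{s_i,k}(X) as frozensets of data
    points. Horizontal edges go (i, k) -> (i + 1, k) and vertical edges go
    (i, k) -> (i, k - 1); a block maps to the block that contains it.
    """
    parent = {(idx, b): (idx, b) for idx, blocks in F.items() for b in blocks}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (i, k), blocks in F.items():
        for b in blocks:
            for idx in ((i + 1, k), (i, k - 1)):
                # The edge preserves the subset iff b itself is a block at idx.
                if idx in F and b in F[idx]:
                    parent[find(((i, k), b))] = find((idx, b))

    comps = {}
    for v in list(parent):
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())
```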

    Suppose that ((s, k), [x]) is a vertex of Γ′0(F), and suppose given a path

    P : ((t0, k0), [y0]) → · · · → ((tn, kn), [yn]) = ((s, k), [x])

    in Γ′0(F) which terminates at ((s, k), [x]). Then x ∈ X is in all path components [yi] ∈ π0Lti,ki(X) since these subsets of X are constant through the path, so that we can rewrite the path P as

    P : ((t0, k0), [x])→ · · · → ((tn, kn), [x]) = ((s, k), [x]).

    In general, the path component L of a vertex ((s, k), [x]) in Γ′0(F) consists of vertices of the form ((t, r), [x]).

    For a vertex ((t, r), [x]) and a path P as above, if (t, r) satisfies t0 ≤ t ≤ tn = s and k = kn ≤ r ≤ k0, then x ∈ Lt0,k0(X) so that x ∈ Lt,r(X), and x represents a path component [x] for the three complexes

    Lt0,k0(X) ⊂ Lt,r(X) ⊂ Ltn,kn(X).

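    The path component of x in Lt,r(X) contains the one in Lt0,k0(X) and is contained in the one in Ltn,kn(X), and these outer two coincide as subsets of X. It follows that the three path components represented by x coincide, so that ((t, r), [x]) is in the path component of ((s, k), [x]).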

    It follows that every path P in the layer graph Γ′0(F) generates a square sq(P) of vertices ((t, r), [x]) with t0 ≤ t ≤ s and k ≤ r ≤ k0, and this square lies in the path component of ((s, k), [x]).

    Suppose now that L is a path component in Γ′0(F) of a vertex ((t, r), [y]). Find the elements ((si, ki), [y]), i = 1, . . . , n, of the component L which are maximal in si and minimal in ki. Find all paths Q1, . . . , Qn in L which terminate in one of the ((si, ki), [y]), and which are maximal in the sense that they cannot be extended to longer paths. Then the path component L in Γ′0(F) is a union of the squares associated to these maximal paths, in the sense that

    $$L = \bigcup_{i=1}^{n} \mathrm{sq}(Q_i). \qquad (8)$$

    We have a graph inclusion Γ′0(F) ⊂ Γ0(F), and both graphs have the same vertices. It follows that each path component (stable component) P of the component graph Γ0(F) is a disjoint union of components of the graph Γ′0(F), meaning that

    $$P = \bigsqcup_j L_j,$$

    where the subsets Lj are path components of Γ′0(F) that are contained in P. In other words, each stable component is a disjoint union of layers. At the same time, we know from (8) that each layer Lj is a union of squares.

    These observations, taken together, give a geometric picture of the stable components for the Lesnick filtration {Ls,k(X)} of a data set X.

    Example 8. The distinction between stable components and layers potentially appears in all subdiagrams of the array {Lsi,k(X)} for which the complexes involved do not share a common set of vertices.

    Such is the case for the diagram of complexes

    Ls0,k(X) ⊂ Ls1,k(X) ⊂ · · · ⊂ Lsp,k(X)

    which produces the HDBSCAN* algorithm of Example 3. The stable components (or clusters of [5]) break up into disjoint unions of layers in this case, but this decomposition remains to be interpreted.

    3 Scoring

    Continue with the data set X ⊂ Rn, and suppose again that the distinct lengths between elements of the finite set X have the form

    0 = s0 < s1 < · · · < sp

    and form the array of Lesnick complexes {Lsi,k(X)}. These complexes are subcomplexes of the largest Vietoris-Rips complex Vsp(X), which can be identified with a large simplex $\Delta^N$, where X has N + 1 elements.

    Write Ksi,k for the set of vertices of the complex Lsi,k(X). Then there is an array of inclusions of vertices {Ksi,k} and a collection of surjective functions

    p : Ksi,k → π0Lsi,k(X). (9)

    These functions p are natural in i and k, and together define a map of arrays. All sets Ksi,k are subsets of the data set X. The sets Ksi,0 coincide with X.

    The function p of (9) is defined by p(x) = [x], where [x] is the path component represented by the vertex x. The subset p−1([x]) of Ksi,k is the set of members of the path component [x] of the complex Lsi,k(X).

    The array of sets Ksi,k defines a graph Γ(K∗,∗) as before. Vertices are pairs ((si, k), y) with y ∈ Ksi,k, and there are horizontal and vertical edges having the respective forms

    ((si, k), y)→ ((si+1, k), y) and ((si, k), y)→ ((si, k − 1), y).

    The functions p of (9) define a graph homomorphism

    p : Γ(K∗,∗)→ Γ(π0L∗,∗(X))

    which is defined on vertices by p((si, k), y) = ((si, k), [y]).

    Given x ∈ X, there is a subgraph Γx(K) ⊂ Γ(K∗,∗) which has vertices ((sj, k), x) with x ∈ Ksj,k. Each subgraph Γx(K) is connected, and the subgraphs Γx(K) are the connected components of Γ(K∗,∗). It follows that there is a graph isomorphism

    $$\bigsqcup_{x \in K_0} \Gamma_x(K) \xrightarrow{\cong} \Gamma(K_{*,*}).$$

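    Let P be a stable component for the diagram π0L∗,∗(X), and form the pullback diagram

    $$\begin{array}{ccc}
    P \cap \Gamma(K_{*,*}) & \longrightarrow & \Gamma(K_{*,*}) \\
    \downarrow & & \downarrow p \\
    P & \xrightarrow{i} & \Gamma(\pi_0 L_{*,*}(X))
    \end{array} \qquad (10)$$

    where i is the composite inclusion P ⊂ Γ0(π0L∗,∗(X)) ⊂ Γ(π0L∗,∗(X)) of graphs.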

    There is a graph isomorphism

    $$\bigsqcup_{x \in X} P \cap \Gamma_x(K) \xrightarrow{\cong} P \cap \Gamma(K_{*,*}),$$

    while the set of vertices (P ∩ Γ(K∗,∗))0 of the graph P ∩ Γ(K∗,∗) is a disjoint union of the sets p−1((si, k), [y]) ⊂ Ksi,k with ((si, k), [y]) ∈ P.

    It is standard to write |F| for the number of elements in a finite set F. It follows that there is an identity

    $$\sum_{x \in X} |(P \cap \Gamma_x(K))_0| = \sum_{((s_i,k),[y]) \in P} |[y]|, \qquad (11)$$

    where (P ∩ Γx(K))0 is the set of vertices of the graph P ∩ Γx(K). The number

    $$\zeta(x, P) = |(P \cap \Gamma_x(K))_0|$$

    is the number of vertices in P which are represented by x. It is a combinatorial stability measure of x with respect to P. This description of ζ(x, P) is adapted from the stability measure for x of [5], which is an analytic invariant.

    The combinatorial persistence score σ(P) of a stable component P is the sum of all stability measures ζ(x, P):

    $$\sigma(P) = \sum_{x \in K_0} \zeta(x, P),$$

    so that

    $$\sigma(P) = \sum_{x \in X} |(P \cap \Gamma_x(K))_0| = \sum_{((s_i,k),[y]) \in P} |[y]|$$

    on account of the identity (11).

    Analogs of diagram (10) lead to similar analyses for all subobjects of the graph Γ(π0L∗,∗(X)), including

    1) layers for Γ(π0L∗,∗(X)),

    2) stable components and layers for the graph Γ(π0L∗,k(X)) (HDBSCAN* case), and

    3) stable components (which are also layers) for the graph Γ(π0L∗,0(X)) = Γ(π0V∗(X)).
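    The score σ(P) and the measures ζ(x, P) depend only on the vertices of P. In the following sketch, a vertex ((si, k), [y]) is encoded as a pair of an index and a frozenset of data points; this encoding and the function names are assumptions. The vertex set used in the check is hypothetical, chosen only to show that the two sides of (11) agree.

```python
from collections import Counter

def zeta(P):
    """zeta(x, P) for every x: the number of vertices ((s_i, k), [y]) of P
    with x in [y]."""
    counts = Counter()
    for _index, block in P:
        counts.update(block)  # each member x of [y] gains one vertex
    return counts

def score(P):
    """sigma(P): the sum of the cardinalities |[y]| over the vertices of P."""
    return sum(len(block) for _index, block in P)
```

    Summing `zeta(P)` over all x counts each pair (x, vertex) with x ∈ [y] once, which is the left-hand side of (11); `score(P)` counts the same pairs vertex by vertex.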

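    If the subobject L ⊂ Γ(π0L∗,∗(X)) is a layer, then we have a pullback diagram

    $$\begin{array}{ccc}
    L \cap \Gamma(K_{*,*}) & \longrightarrow & \Gamma(K_{*,*}) \\
    \downarrow & & \downarrow p \\
    L & \xrightarrow{i} & \Gamma(\pi_0 L_{*,*}(X))
    \end{array}$$

    where i is the inclusion of the layer L. Then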

    $$L \cap \Gamma(K_{*,*}) = \bigsqcup_{x \in X} L \cap \Gamma_x(K).$$

    In this case, for each x ∈ X, either L ∩ Γx(K) = ∅, or L ∩ Γx(K) has a vertex ((si, k), x), and then x represents every vertex of L. It follows that

    $$|(L \cap \Gamma(K_{*,*}))_0| = |L_0| \cdot |[x]|,$$

    where x is any choice of representative for a vertex ((si, k), [x]) of L, and L0 is the set of vertices of the layer L. This is consistent with counting cardinalities of the fibres p−1((si, k), [x]), as dictated by the right hand side of equation (11).

    The score σ(L) of a layer L then has the rather simple form

    $$\sigma(L) = \sum_{x \in X} |(L \cap \Gamma_x(K))_0| = \sum_{((s_i,k),[y]) \in L} |[y]| = |L_0| \cdot |[x]|.$$

    Note finally that since a stable component P is a disjoint union

    $$P = L_1 \sqcup \cdots \sqcup L_k$$

    and scoring for P and L amounts to counting fibres for the graph map p, there is a relation

    $$\sigma(P) = \sum_{i=1}^{k} \sigma(L_i)$$

    which relates the score of P to the scores of its constituent layers.
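    Additivity can be checked on a small hypothetical example: split a vertex set into two layers, compute σ(L) = |L0| · |[x]| for each, and compare the total with the direct count. The encoding of vertices as (index, frozenset) pairs and the function names are assumptions made for illustration.

```python
def score(P):
    """sigma: the sum of |[y]| over the vertices ((s_i, k), [y]) of P."""
    return sum(len(block) for _index, block in P)

def layer_score(L):
    """sigma(L) = |L_0| * |[x]|: every vertex of a layer carries the same
    subset [x] of X, so the score is a vertex count times a block size."""
    blocks = {block for _index, block in L}
    if len(blocks) != 1:
        raise ValueError("the vertices of a layer carry one common subset of X")
    (block,) = blocks
    return len(L) * len(block)
```

    For a layer L1 with two vertices carrying a 2-element subset and a layer L2 with one vertex carrying a 3-element subset, this gives σ(L1) = 4 and σ(L2) = 3, and the direct count over L1 ⊔ L2 is 7.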

    4 Adding a point

    Suppose that Y = X ⊔ {y} and that d(y, x0) < r for some x0 ∈ X. The number r should be tiny; this will be made precise later.

    In this section, we compare stable components for the hierarchies Γ(X) = Γ(V∗(X)) and Γ(Y) = Γ(V∗(Y)) arising from the respective Vietoris-Rips complexes.

    The comparison requires a more versatile approach: we expand the systems V∗(X) and V∗(Y) (that are indexed by phase change numbers in previous sections) to functors

    s ↦ Vs(X) and s ↦ Vs(Y),

    respectively, for numbers s in an interval [0, R], where R is an upper bound for the phase change numbers of both X and Y.

    We are then entitled to simplicial set maps i : Vs(X) ⊂ Vs(Y) that are natural in s ∈ [0, R], which are induced by the inclusion X ⊂ Y. The interesting behaviour of both of these systems, in isolation, occurs at the respective phase change numbers, but a robust comparison mechanism requires all parameters.

    The functor s ↦ Vs(X) defines a set-valued functor s ↦ π0Vs(X), and an expanded hierarchy Γ(X) can be defined by analogy with what we have above: Γ(X) is a graph (or a category) with vertices (s, [x]s) with s ∈ [0, R] and [x]s ∈ π0Vs(X), and has edges (s, [x]s) → (t, [x]t) with s ≤ t. We shall sometimes write [x]s,X for [x]s where comparisons of data sets are involved.

    An inclusion of data sets X ⊂ Y then induces a morphism of graphs

    i : Γ(X)→ Γ(Y ).

    For such comparisons, it is convenient to phrase the discussion in terms of layers. The layers of Γ(X) can be identified with edges L : (s, [x]s) → (t, [x]t) of Γ(X) such that

    1) s and t are phase change numbers of X,

    2) [x]s = [x]t as subsets of X, and

    3) the length t − s of the edge L is maximal with respect to the first two conditions.

    The initial vertex (s, [x]s) of a layer must be a branch point, and the full collection of layers of Γ(X) can be identified with its set Br(X) of branch points. The layers of Γ(X) are the path components (the stable components) of the component graph Γ0(X) of the first section.

    A partial layer for X is an edge (s, [x]s) → (t, [x]t) of Γ(X) such that [x]s = [x]t. In other words, a partial layer is an edge which satisfies only condition 2) above.

    Returning to the situation X ⊂ Y = X t {y} of interest, write

    0 = s0 < s1 < · · · < sk

    for the list of phase change numbers (i.e. distances between points) for X. We suppose that y is close to an element x0 ∈ X, in the sense that d(y, x0) < r, where

    $$r < s_{i+1} - s_i \qquad (12)$$

    for all i ≥ 0. The inequality (12) is the meaning of the requirement that the number r should be tiny.
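    The bound in (12) is computable from X alone: r must be smaller than the smallest gap between consecutive phase change numbers. Here is a sketch, assuming Euclidean distance and at least two distinct points; the name `tiny_radius_bound` is hypothetical.

```python
import math
from itertools import combinations

def tiny_radius_bound(points):
    """Return min_i (s_{i+1} - s_i) over the phase change numbers
    0 = s_0 < s_1 < ... < s_k of X (0 and the distinct pairwise distances).
    Any r strictly below this value satisfies inequality (12).
    Assumes X has at least two distinct points."""
    s = sorted({0.0} | {math.dist(p, q) for p, q in combinations(points, 2)})
    return min(b - a for a, b in zip(s, s[1:]))
```

    For points at mutual distances 3, 4 and 5, the phase change numbers are 0, 3, 4, 5, so any added point y within distance r < 1 of some x0 ∈ X qualifies.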

    Lemma 9. Suppose that x ∈ X has the property that y is not a member of the path component [x]s,Y of Vs(Y).

    Then the inclusion [x]s,X ⊂ [x]s,Y is a bijection.

    Proof. Suppose that x′ ∈ [x]s,Y. Then y ∉ [x]s,Y, so there is a path

    x = y0 ↔ y1 ↔ · · · ↔ yn = x′

    such that all yi ∈ X. It follows that x′ ∈ [x]s,X.

    Corollary 10. Suppose that s ≤ t and that y ∉ [x]t,Y in Vt(Y). If (s, [x]s,X) → (t, [x]t,X) is a partial layer for X, then (s, [x]s,Y) → (t, [x]t,Y) is a partial layer for Y.

    Note that if $y \notin [x]_{t,Y}$ then $y \notin [x]_{s,Y}$, because $[x]_{s,Y} \subset [x]_{t,Y}$ for $s \le t$.

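    Proof of Corollary 10. There is a commutative diagram of inclusions

    $$\begin{array}{ccc}
    {[x]_{s,X}} & \xrightarrow{\;=\;} & [x]_{t,X} \\
    {\scriptstyle =}\big\downarrow & & \big\downarrow{\scriptstyle =} \\
    {[x]_{s,Y}} & \longrightarrow & [x]_{t,Y}
    \end{array}$$

    in which the vertical maps are bijections by Lemma 9, since $y \notin [x]_{s,Y}$ and $y \notin [x]_{t,Y}$. The top map is a bijection because the edge is a partial layer for $X$, so the bottom map is a bijection as well, and the claim follows.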

    Remark: There could be layers $(s, [x]_{s,X}) \to (t, [x]_{t,X})$ of $X$ such that $y \notin [x]_{s,Y}$ and $y \in [x]_{t,Y}$. In that case, the corresponding edge $(s, [x]_{s,Y}) \to (t, [x]_{t,Y})$ cannot be a partial layer of $Y$, because the set $[x]_{t,Y}$ is strictly larger than $[x]_{s,Y}$.

    In general, if $L$ is a simplicial complex and $V$ is a set of vertices of $L$, the full subcomplex of $L$ on the set $V$ consists of those simplices $\sigma$ of $L$ having all of their vertices in the set $V$.

    Write $V_s(X)(y)$ for the full subcomplex of $V_s(X)$ on the set of those vertices $x$ such that $[x]_{s,Y} = [y]_{s,Y}$ in $V_s(Y)$.
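    Under the convention that the simplices of $V_s(X)$ are the finite subsets of diameter at most $s$, the full subcomplex $V_s(X)(y)$ is the Vietoris-Rips complex at scale $s$ of its own vertex set, so its path components can be computed in the same way as those of $V_s(X)$. The sketch below (points as tuples of floats, helper names hypothetical) returns the vertex set of $V_s(X)(y)$ and tests whether the subcomplex is connected; the last example shows that connectedness can fail when $s < s_1$.

```python
import math
from itertools import combinations


def components(points, s):
    """pi_0 V_s(points) as a set of frozensets; assumes x ~ x' iff d(x, x') <= s."""
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    for p, q in combinations(points, 2):
        if math.dist(p, q) <= s:
            parent[find(p)] = find(q)
    groups = {}
    for p in points:
        groups.setdefault(find(p), set()).add(p)
    return {frozenset(g) for g in groups.values()}


def vertices_near_y(X, y, s):
    """Vertex set of V_s(X)(y): the x in X with [x]_{s,Y} = [y]_{s,Y}."""
    Y = X + [y]
    comp_y = next(c for c in components(Y, s) if y in c)
    return [x for x in X if x in comp_y]


def near_y_is_connected(X, y, s):
    """True if V_s(X)(y) is non-empty and path connected."""
    return len(components(vertices_near_y(X, y, s), s)) == 1


X = [(0.0,), (1.0,), (3.0,)]
print(vertices_near_y(X, (0.1,), 1.0))                 # [(0.0,), (1.0,)]
print(near_y_is_connected([(0.0,), (1.0,)], (0.5,), 0.5))  # False, since 0.5 < s_1 = 1
```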

    Lemma 11. Suppose that $s_i \le s < s_{i+1} - r$ for some $i \ge 1$. Then we have the following:

    1) The space $V_s(X)(y)$ is connected.

    2) The function $\pi_0 V_s(X) \to \pi_0 V_s(Y)$ is a bijection.

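    Proof. 1) Suppose that $x$ is a vertex of $V_s(X)(y)$. Then there is a sequence of 1-simplices

    $$x = y_0 \leftrightarrow y_1 \leftrightarrow \cdots \leftrightarrow y_n \leftrightarrow y$$

    in $V_s(Y)$ with $y_i \in X$. We have $d(y_n, y) \le s$ and $d(y, x_0) < r$, so that $d(y_n, x_0) < s + r$ by the triangle inequality, and $[x] = [x_0]$ in $V_{s+r}(X)$. But $V_s(X) = V_{s+r}(X)$ since $s_i \le s < s + r < s_{i+1}$, and so $[x] = [x_0]$ in $V_s(X)$. This is true for all vertices $x$ of $V_s(X)(y)$, so that $V_s(X)(y)$ is connected.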

    2) Suppose that $x_1, x_2$ are vertices of $V_s(X)$ such that $[y] \ne [x_1]$ and $[y] \ne [x_2]$ in $V_s(Y)$. Then it follows from Lemma 9 that $[x_1]_{s,X} = [x_2]_{s,X}$ in $V_s(X)$ if and only if $[x_1]_{s,Y} = [x_2]_{s,Y}$ in $V_s(Y)$.

    The function $i_* : \pi_0 V_s(X) \to \pi_0 V_s(Y)$ is surjective since $d(y, x_0) < r < s_1 \le s$, so that $[y]_{s,Y} = [x_0]_{s,Y}$. Statement 1) and the previous paragraph imply that $i_*$ is injective.

    Corollary 12. Under the assumptions of Lemma 11, the map $i_* : \pi_0 V_{s_i}(X) \to \pi_0 V_{s_i}(Y)$ is a bijection at all phase change numbers $s_i$ for $X$ such that $i \ge 1$.
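    Lemma 11 and Corollary 12 can be checked numerically at the phase change numbers $s_i$ with $i \ge 1$, once $r$ and $y$ satisfy the standing assumptions. The sketch below raises ValueError if $d(y, x_0) < r$ or (12) fails, and otherwise tests that $V_{s_i}(X)(y)$ is connected and that $i_*$ is injective and surjective. It assumes points are tuples of floats and the edge convention $d(x, x') \le s$; the helper names are hypothetical, and a passing check on examples is not a proof.

```python
import math
from itertools import combinations


def components(points, s):
    """pi_0 V_s(points) as a set of frozensets; assumes x ~ x' iff d(x, x') <= s."""
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    for p, q in combinations(points, 2):
        if math.dist(p, q) <= s:
            parent[find(p)] = find(q)
    groups = {}
    for p in points:
        groups.setdefault(find(p), set()).add(p)
    return {frozenset(g) for g in groups.values()}


def phase_change_numbers(points):
    """0 together with all pairwise distances, sorted and without repeats."""
    return sorted({0.0} | {math.dist(p, q) for p, q in combinations(points, 2)})


def check_lemma11(X, y, r):
    """Test Lemma 11 1) and Corollary 12 at each s_i, i >= 1."""
    ss = phase_change_numbers(X)
    if not any(math.dist(y, x) < r for x in X):
        raise ValueError("need d(y, x_0) < r for some x_0 in X")
    if not all(r < b - a for a, b in zip(ss, ss[1:])):
        raise ValueError("r violates inequality (12)")
    Y = X + [y]
    for s in ss[1:]:
        comps_X, comps_Y = components(X, s), components(Y, s)
        comp_y = next(c for c in comps_Y if y in c)
        near = [x for x in X if x in comp_y]  # vertex set of V_s(X)(y)
        if len(components(near, s)) != 1:
            return False  # part 1) of Lemma 11 fails
        image = {next(d for d in comps_Y if c <= d) for c in comps_X}
        if len(image) != len(comps_X) or image != comps_Y:
            return False  # i_* is not injective or not surjective
    return True


print(check_lemma11([(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)], (0.2, 0.0), 0.5))  # True
```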


    Corollary 13. Suppose that the edge $(s, [x]_{s,X}) \to (t, [x]_{t,X})$ is a partial layer of $X$. Suppose that $y \in [x]_{s,Y}$ and that $s$ and $t$ are phase change numbers for $X$, with $s_1 \le s \le t$.

    Then the edge $(s, [x]_{s,Y}) \to (t, [x]_{t,Y})$ of $\Gamma(Y)$ is a partial layer of $Y$.

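    Proof. The spaces $V_s(X)(y)$ and $V_t(X)(y)$ are connected by Lemma 11, and both contain $x$, because $y \in [x]_{s,Y} \subset [x]_{t,Y}$. It follows that

    $$[x]_{s,Y} = \{y\} \sqcup (V_s(X)(y))_0 = \{y\} \sqcup [x]_{s,X} = \{y\} \sqcup [x]_{t,X} = \{y\} \sqcup (V_t(X)(y))_0 = [x]_{t,Y}$$

    in $Y$.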

    Corollary 14. Suppose that $(s, [x]_{s,Y}) \to (t, [x]_{t,Y})$ is a partial layer of $Y$ such that $y \in [x]_{s,Y}$ and $x \ne y$. Suppose that $s, t$ are phase change numbers for $X$ with $s_1 \le s \le t$.

    Then the edge $(s, [x]_{s,X}) \to (t, [x]_{t,X})$ is a partial layer of $X$.

    Proof. The function $(V_s(X)(y))_0 \to (V_t(X)(y))_0$ is a bijection, since $[x]_{s,Y} \to [x]_{t,Y}$ is a bijection, and the spaces $V_s(X)(y)$ and $V_t(X)(y)$ are path connected by Lemma 11. It follows that the function $[x]_{s,X} \to [x]_{t,X}$ is a bijection.

    Suppose that $L : (s, [x]_{s,X}) \to (t, [x]_{t,X})$ is a layer of $X$ with $s_1 \le s$, and that either $y \in [x]_{s,Y}$ or $y \notin [x]_{t,Y}$. In both cases, the edge $(s, [x]_{s,Y}) \to (t, [x]_{t,Y})$ is a partial layer for $Y$, by Corollary 13 and Corollary 10, respectively.

    Suppose that $s'$ and $t'$ are phase change numbers of $X$ such that

    $$s_1 \le s' \le s \le t \le t'$$

    and such that $(s', [x]_{s',Y}) \to (t', [x]_{t',Y})$ is a partial layer of $Y$. Then we have the following:

    1) If $y \in [x]_{s,Y}$, then $y \in [x]_{s',Y}$, since $[x]_{s',Y} \subset [x]_{s,Y} \subset [x]_{t',Y}$ and the outer two sets coincide. The maps $[x]_{s',X} \to [x]_{s,X}$ and $[x]_{t,X} \to [x]_{t',X}$ are bijections on account of the connectedness assertion of Lemma 11, and it follows that $(s', [x]_{s',X}) \to (t', [x]_{t',X})$ is a partial layer of $X$. But $L$ is a layer, so that $s' = s$ and $t = t'$ by the maximality condition.

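    2) Suppose that $y \notin [x]_{t,Y}$. Then $y \notin [x]_{t',Y}$ since $[x]_{t,Y} = [x]_{t',Y}$. Then the edge $(s', [x]_{s',X}) \to (t', [x]_{t',X})$ is a partial layer of $X$, since the diagram of functions

    $$\begin{array}{ccc}
    {[x]_{s',X}} & \longrightarrow & [x]_{t',X} \\
    {\scriptstyle =}\big\downarrow & & \big\downarrow{\scriptstyle =} \\
    {[x]_{s',Y}} & \xrightarrow{\;=\;} & [x]_{t',Y}
    \end{array}$$

    commutes, as in the proof of Corollary 10: the vertical maps are bijections by Lemma 9, and the bottom map is a bijection because $(s', [x]_{s',Y}) \to (t', [x]_{t',Y})$ is a partial layer of $Y$, so the top map is a bijection. The maximality condition for the layer $L$ again implies that $s' = s$ and $t = t'$.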

    We have proved the following result:


    Proposition 15. Suppose that $L : (s_i, [x]_{s_i,X}) \to (s_j, [x]_{s_j,X})$ is a layer of $X$ with $s_1 \le s_i$. Suppose that either $y \in [x]_{s_i,Y}$ or $y \notin [x]_{s_j,Y}$.

    Then the edge $L : (s_i, [x]_{s_i,Y}) \to (s_j, [x]_{s_j,Y})$ is a partial layer for $Y$, and the layer $(s, [x]_{s,Y}) \to (t, [x]_{t,Y})$ containing $L$ must have $s_{i-1} < s \le s_i$ and $s_j \le t < s_{j+1}$.
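    Proposition 15 can be tested on small examples by computing the layers of both $X$ and $Y$. The sketch below reads a branch point as a vertex $(s, c)$ at which $c$ first appears as a path component, uses the edge convention $d(x, x') \le s$, and treats $s_{j+1}$ as $+\infty$ when $s_j$ is the largest phase change number of $X$; these readings and the helper names are assumptions. It returns one boolean per layer of $X$ to which the proposition applies, and it does not check the hypotheses $d(y, x_0) < r$ and (12), which the caller is assumed to have arranged.

```python
import math
from itertools import combinations


def components(points, s):
    """pi_0 V_s(points) as a set of frozensets; assumes x ~ x' iff d(x, x') <= s."""
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    for p, q in combinations(points, 2):
        if math.dist(p, q) <= s:
            parent[find(p)] = find(q)
    groups = {}
    for p in points:
        groups.setdefault(find(p), set()).add(p)
    return {frozenset(g) for g in groups.values()}


def phase_change_numbers(points):
    return sorted({0.0} | {math.dist(p, q) for p, q in combinations(points, 2)})


def component_of(points, s, x):
    return next(c for c in components(points, s) if x in c)


def layers(points):
    """Triples (s, t, c) with c a component exactly for phase change numbers in [s, t]."""
    ss = phase_change_numbers(points)
    comps = [components(points, s) for s in ss]
    result = []
    for a, s in enumerate(ss):
        for c in comps[a]:
            if a > 0 and c in comps[a - 1]:
                continue  # not a branch point
            b = a
            while b + 1 < len(ss) and c in comps[b + 1]:
                b += 1
            result.append((s, ss[b], c))
    return result


def check_prop15(X, y):
    Y = X + [y]
    ss = phase_change_numbers(X)
    y_layers = layers(Y)
    results = []
    for s_i, s_j, c in layers(X):
        i, j = ss.index(s_i), ss.index(s_j)
        if i < 1:
            continue  # the proposition needs s_1 <= s_i
        x = next(iter(c))
        ci = component_of(Y, s_i, x)
        if y not in ci and y in component_of(Y, s_j, x):
            continue  # neither hypothesis on y holds
        match = next(((s, t) for s, t, d in y_layers
                      if d == ci and s <= s_i and s_j <= t), None)
        if match is None:  # the edge is not a partial layer of Y
            results.append(False)
            continue
        s, t = match
        upper = ss[j + 1] if j + 1 < len(ss) else math.inf
        results.append(ss[i - 1] < s <= s_i and s_j <= t < upper)
    return results


print(check_prop15([(0.0,), (1.0,), (3.0,)], (0.1,)))  # [True, True]
```

    For $X = \{0, 1, 3\}$ the layers with $s_1 \le s_i$ start at $(1, \{0, 1\})$ and $(2, X)$, and both satisfy the predicted bounds for $y = 0.1$ and for $y = 3.05$.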

    The numbers $s$ and $t$ in the statement of Proposition 15 are phase change numbers for $Y$. One can show that the phase change numbers for $Y$ lie in the open intervals $(s_i - r, s_i + r)$ centred at the phase change numbers $s_i$ for $X$. This is a constraint on the position of the numbers $s$ and $t$ above, but the differences $s_i - s_{i-1}$ and $s_{j+1} - s_j$ could be large, and there is no theoretical information, for example, on whether $s$ is close to $s_i$ or close to $s_{i-1}$.
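    The claim about the phase change numbers of $Y$ follows from the triangle inequality: each new distance $d(y, x)$ differs from the distance $d(x_0, x)$ by at most $d(y, x_0) < r$, and $d(x_0, x)$ is a phase change number of $X$ (it is $s_0 = 0$ when $x = x_0$). A short numerical check follows, assuming points are tuples of floats; the helper names are hypothetical.

```python
import math
from itertools import combinations


def phase_change_numbers(points):
    """0 together with all pairwise distances, sorted and without repeats."""
    return sorted({0.0} | {math.dist(p, q) for p, q in combinations(points, 2)})


def within_r_of_X(X, y, r):
    """Check that each phase change number of Y = X + [y] lies in some open
    interval (s_i - r, s_i + r) around a phase change number s_i of X."""
    if not any(math.dist(y, x) < r for x in X):
        raise ValueError("need d(y, x_0) < r for some x_0 in X")
    sx = phase_change_numbers(X)
    return all(any(abs(u - s) < r for s in sx) for u in phase_change_numbers(X + [y]))


print(within_r_of_X([(0.0,), (1.0,), (3.0,)], (0.1,), 0.5))  # True
```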

    References

    [1] Andrew J. Blumberg and Michael Lesnick. Universality of the homotopy interleaving distance. CoRR, abs/1705.01690, 2017.

    [2] Ricardo J. G. B. Campello, Davoud Moulavi, Arthur Zimek, and Jörg Sander. Hierarchical density estimates for data clustering, visualization, and outlier detection. TKDD, 10(1):5:1–5:51, 2015.

    [3] Gunnar Carlsson and Facundo Memoli. Multiparameter hierarchical clustering methods. In Proceedings of the 11th International Federation of Classification Societies (IFCS) biennial conference and 33rd annual conference of the Gesellschaft für Classifikation e.V., Dresden, Germany, 2009, pages 63–70, January 2010.

    [4] M. Lesnick and M. Wright. RIVET: visualization and analysis of two-dimensional persistent homology. http://rivet.online, 2019.

    [5] Leland McInnes and John Healy. Accelerated hierarchical density based clustering. In 2017 IEEE International Conference on Data Mining Workshops, ICDM Workshops 2017, New Orleans, LA, USA, November 18–21, 2017, pages 33–42, 2017.

    [6] Werner Stuetzle. Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J. Classification, 20(1):025–047, 2003.

    [7] Afra Zomorodian. Fast construction of the Vietoris-Rips complex. Computer and Graphics, pages 263–271, 2010.
