Introduction to Unification Theory
Matching

Temur Kutsia

RISC, Johannes Kepler University of Linz, Austria
kutsia@risc.jku.at

Overview

Syntactic Matching

Advanced Topics

Matching Problem

- Given: terms t and s.
- Find: a substitution σ such that tσ = s (syntactic matching).
- Matching equation: t ≤·? s.
- σ is called a matcher.

Example

- Matching problem: f(x, y) ≤·? f(g(z), x). Matcher: σ = {x ↦ g(z), y ↦ x}.
- Matching problem: f(x, x) ≤·? f(x, a). No matcher.
- Matching problem: f(g(x), x, y) ≤·? f(g(g(a)), g(a), b). Matcher: {x ↦ g(a), y ↦ b}.
- Matching problem: f(x) ≤·? f(g(x)). Matcher: {x ↦ g(x)}.

Relating Matching and Unification

- Matching can be reduced to unification.
- In a matching problem t ≤·? s, simply replace each variable in s with a new constant.
- f(x, y) ≤·? f(g(z), x) becomes the unification problem f(x, y) =? f(g(c_z), c_x).
- c_z, c_x: new constants.
- The unifier: {x ↦ g(c_z), y ↦ c_x}.
- The matcher: {x ↦ g(z), y ↦ x}.
- When s is ground, matching and unification coincide.
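
The reduction can be sketched in a few lines. This is an illustration under assumptions, not the lecture's implementation: terms are encoded as above (variables are strings, applications are tuples), the helper names `freeze`, `unfreeze`, `unify`, and `match_via_unification` are hypothetical, the fresh constant for a variable v is named "c_" + v (so no original symbol may start with "c_"), and the unifier is a naive textbook one rather than a linear-time algorithm.

```python
def is_var(t):
    return isinstance(t, str)

def freeze(t):
    """Replace every variable v of t by the fresh constant ("c_" + v,)."""
    if is_var(t):
        return ("c_" + t,)
    return (t[0],) + tuple(freeze(arg) for arg in t[1:])

def unfreeze(t):
    """Turn the fresh constants back into the variables they stand for."""
    if is_var(t):
        return t
    if len(t) == 1 and t[0].startswith("c_"):
        return t[0][2:]
    return (t[0],) + tuple(unfreeze(arg) for arg in t[1:])

def walk(t, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    t = walk(t, subst)
    if is_var(t):
        return t == v
    return any(occurs(v, arg, subst) for arg in t[1:])

def unify(t1, t2, subst):
    """Naive syntactic unification; returns a substitution or None."""
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return None if occurs(t1, t2, subst) else {**subst, t1: t2}
    if is_var(t2):
        return None if occurs(t2, t1, subst) else {**subst, t2: t1}
    if t1[0] != t2[0] or len(t1) != len(t2):
        return None
    for arg1, arg2 in zip(t1[1:], t2[1:]):
        subst = unify(arg1, arg2, subst)
        if subst is None:
            return None
    return subst

def match_via_unification(t, s):
    """Solve t <=? s by unifying t with the frozen (hence ground) copy of s."""
    unifier = unify(t, freeze(s), {})
    if unifier is None:
        return None
    # The frozen side is ground, so every binding is already a ground term.
    return {v: unfreeze(term) for v, term in unifier.items()}


print(match_via_unification(("f", "x", "y"), ("f", ("g", "z"), "x")))
```

Freezing is what makes the last example on the previous slide work: f(x) and f(g(x)) are not unifiable because of the occurs check, but f(x) and f(g(c_x)) are.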

- Both matching and unification can be implemented in linear time.
- Linear implementation of matching is straightforward.
- Linear implementation of unification requires sophisticated data structures.
- Whenever efficiency is an issue, matching should be implemented separately from unification.

Tree Pattern Matching

- Matching is needed in rewriting, functional programming, querying, etc.
- Often the following problem is required to be solved:
  - Given a ground term s (subject) and a term p (pattern),
  - find all subterms in s to which p matches.
- Notation: p �? s.
- In this lecture: an algorithm to solve this problem.
- Terms are represented as trees.

Working example:

f(f(a, X), Y) �? f(f(a, b), f(f(a, a), a)).

Matching the pattern tree to the subject tree.

[Figure: pattern tree 1 is f(f(a, X), Y); pattern tree 2 is f(f(a, X), X); the subject tree is f(f(a, b), f(f(a, a), a)).]

- Pattern tree 1: two matches, one at the root of the subject tree and one at the subterm f(f(a, a), a).
- Pattern tree 2: a single match, at the subterm f(f(a, a), a).

- Pattern tree 1 in the example is linear: every variable occurs only once.
- Pattern tree 2 is nonlinear: X occurs twice.
- Two steps for nonlinear tree matching:
  1. Ignore multiplicity of variables (assume the pattern is linear) and do linear tree pattern matching.
  2. Verify that the substitutions computed for multiple occurrences of a variable are identical: check consistency.
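
The two steps can be tried out directly on trees before moving to the string-based algorithm. The sketch below is a naive illustration, not the algorithm developed in this lecture: it assumes variables are strings and applications are tuples, and the names `linear_match`, `consistent`, `subterms`, and `all_matches` are hypothetical.

```python
def linear_match(pattern, subject, bindings):
    """Step 1: match while ignoring variable multiplicity.
    Every variable occurrence records its own binding in `bindings`."""
    if isinstance(pattern, str):              # a variable occurrence
        bindings.append((pattern, subject))
        return True
    if pattern[0] != subject[0] or len(pattern) != len(subject):
        return False
    return all(linear_match(p_arg, s_arg, bindings)
               for p_arg, s_arg in zip(pattern[1:], subject[1:]))

def consistent(bindings):
    """Step 2: all occurrences of a variable must carry the same binding."""
    subst = {}
    for var, term in bindings:
        if subst.setdefault(var, term) != term:
            return None
    return subst

def subterms(term):
    """Enumerate all subterms of a ground term, the term itself first."""
    yield term
    for arg in term[1:]:
        yield from subterms(arg)

def all_matches(pattern, subject):
    """All (subterm, matcher) pairs such that pattern matches the subterm."""
    result = []
    for sub in subterms(subject):
        bindings = []
        if linear_match(pattern, sub, bindings):
            subst = consistent(bindings)
            if subst is not None:
                result.append((sub, subst))
    return result


a, b = ("a",), ("b",)
subject = ("f", ("f", a, b), ("f", ("f", a, a), a))
pattern1 = ("f", ("f", a, "X"), "Y")     # linear
pattern2 = ("f", ("f", a, "X"), "X")     # nonlinear: X occurs twice
print(len(all_matches(pattern1, subject)), len(all_matches(pattern2, subject)))
```

Pattern 2 passes step 1 at the root of the subject as well, with X bound once to b and once to f(f(a, a), a); only the consistency check of step 2 rejects that candidate.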

Terms

- V: set of variables.
- F: set of function symbols of fixed arity.
- F ∩ V = ∅.
- Constants: 0-ary function symbols.
- Terms:
  - A variable or a constant is a term.
  - If f ∈ F, f is n-ary, n > 0, and t1, ..., tn are terms, then f(t1, ..., tn) is a term.

Term Trees, Nodes, Node Labels, Edges, Edge Labels

Example

[Figure: two term trees, with callouts pointing out a node, a node label, an edge, and an edge label. Each node is shown as its label followed by its number in parentheses; each edge is labeled with the position of the child among its siblings (1 or 2).]

- The tree for f(f(a, X), Y): nodes f (1), f (2), Y (3), a (4), X (5).
- The tree for f(f(a, b), f(f(a, a), a)): nodes f (1), f (2), f (3), a (4), b (5), f (6), a (7), a (8), a (9).

Labeled Path

- Labeled path lp(n1, nq) in a term tree from the node n1 to the node nq: a string formed by alternately concatenating the node and edge labels from n1 to nq.

Example

In the tree for f(f(a, b), f(f(a, a), a)), the labeled path from node 1 to node 8 is lp(1, 8) = f2f1f1a.
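
The labeled path of the example can be recomputed with a short sketch. Several things here are assumptions rather than part of the lecture: the tuple encoding of terms, the function names `number_nodes` and `labeled_path`, the level-by-level node numbering (chosen because it reproduces the numbers in the figures), and the restriction that the start node is an ancestor of the end node.

```python
from collections import deque

def label(t):
    return t if isinstance(t, str) else t[0]

def children(t):
    return () if isinstance(t, str) else t[1:]

def number_nodes(term):
    """Number the nodes level by level, starting with 1 at the root.
    Returns {node: (node label, parent node, label of the edge from the parent)}."""
    info = {}
    queue = deque([(term, None, None)])
    while queue:
        t, parent, edge = queue.popleft()
        node = len(info) + 1
        info[node] = (label(t), parent, edge)
        for position, child in enumerate(children(t), start=1):
            queue.append((child, node, position))
    return info

def labeled_path(info, start, end):
    """lp(start, end): node labels and edge labels in alternation.
    Assumes that `start` is an ancestor of `end` or equal to it."""
    parts = []
    node = end
    while node != start:
        node_label, parent, edge = info[node]
        if parent is None:
            raise ValueError("start is not an ancestor of end")
        parts.append(node_label)
        parts.append(str(edge))
        node = parent
    parts.append(info[start][0])
    return "".join(reversed(parts))


a, b = ("a",), ("b",)
subject = ("f", ("f", a, b), ("f", ("f", a, a), a))
print(labeled_path(number_nodes(subject), 1, 8))   # f2f1f1a
```

The path is collected bottom-up through the parent links and reversed at the end, so no search through the tree is needed.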

Euler Chains and Strings

- Euler chain for a term tree: a string of nodes obtained by walking around the tree depth-first, starting at the root and visiting children from left to right. A node is written down when it is first reached and again after each return from one of its children.

Example: the tree for f(f(a, X), Y), with nodes f (1), f (2), Y (3), a (4), X (5). The chain grows step by step:

1, 12, 124, 1242, 12425, 124252, 1242521, 12425213, 124252131.

- Properties of Euler chains:
  - The leaves occur only once.
  - The subchain between the first and last occurrence of a node is the chain of the subtree rooted at that node.
  - A node with n children occurs n + 1 times.
- Euler strings: replace nodes in Euler chains with node labels.

Term                         Euler chain          Euler string
f(f(a, X), Y)                124252131            ffafXffYf
f(f(a, b), f(f(a, a), a))    12425213686963731    ffafbffffafaffaff
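
Both chains and strings above can be reproduced with a small traversal. This is a sketch under assumptions: the tuple encoding of terms, the function name `euler`, and the level-by-level node numbering (which agrees with the numbers in the figures) are choices made for the illustration.

```python
from collections import deque

def label(t):
    return t if isinstance(t, str) else t[0]

def children(t):
    return () if isinstance(t, str) else t[1:]

def euler(term):
    """Return (Euler chain as a list of node numbers, Euler string)."""
    # Number the nodes level by level; a node is identified by its position
    # path, i.e. the sequence of edge labels leading to it from the root.
    number = {}
    queue = deque([((), term)])
    while queue:
        path, t = queue.popleft()
        number[path] = len(number) + 1
        for i, child in enumerate(children(t), start=1):
            queue.append((path + (i,), child))

    chain, labels = [], []

    def visit(path, t):
        # Write the node down on arrival ...
        chain.append(number[path])
        labels.append(label(t))
        for i, child in enumerate(children(t), start=1):
            visit(path + (i,), child)
            # ... and again after returning from each child.
            chain.append(number[path])
            labels.append(label(t))

    visit((), term)
    return chain, "".join(labels)


a, b = ("a",), ("b",)
chain, string = euler(("f", ("f", a, "X"), "Y"))
print("".join(map(str, chain)), string)   # 124252131 ffafXffYf
```

The chain is returned as a list of integers so that trees with more than nine nodes are not ambiguous; joining the numbers into one string, as on the slides, is only safe for single-digit node numbers.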

Tree Pattern Matching: Idea

- Instead of using the tree structure, the algorithm operates on Euler chains and Euler strings.
- To declare a match of the pattern tree at a subtree of the subject tree, the algorithm verifies whether their Euler strings are identical after replacing the variables in the pattern by Euler strings of appropriate terms.
- To justify this approach, Euler strings have to be related to the tree structures.

Theorem. Two term trees are equivalent (i.e., they represent the same term) iff their corresponding Euler strings are identical.

Nonlinear Tree Pattern Matching: Ideas

Putting the ideas together:

1. Ignore multiplicity of variables (assume the pattern is linear) and do linear tree pattern matching.
2. Verify that the substitutions computed for multiple occurrences of a variable are identical: check consistency.
3. Instead of trees, operate on their Euler strings.

Notation

- s: subject tree.
- p: pattern tree.
- Cs and Es: Euler chain and Euler string for the subject tree.
- Cp and Ep: Euler chain and Euler string for the pattern tree.
- n: size of s.
- m: size of p.
- k: number of variables in p.
- K: the set of all root-to-variable-leaf paths in p.

Step 1. Linear Tree Pattern Matching

- Let v1, ..., vk be the variables in p.
- v1, ..., vk appear only once in Ep, because
  - only leaves are labeled with variables,
  - each leaf appears exactly once in the Euler string, and
  - each variable occurs exactly once in p (linearity).

We start with a simple algorithm.

- Es is stored in an array.
- Split Ep into k + 1 strings, denoted σ1, ..., σk+1, by removing variables.
  - ffafXffYf splits into σ1 = ffaf, σ2 = ff, and σ3 = f.
- Construct Boolean tables M1, ..., Mk+1, each having |Es| entries:

  Mi[j] = 1 if there is a match for σi in Es starting at position j,
          0 otherwise.

Example

- Ep = ffafXffYf, σ1 = ffaf, σ2 = ff, σ3 = f, Es = ffafbffffafaffaff.
- M1 = 10000001000010000.
- M2 = 10000111000010010.
- M3 = 11010111101011011.
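
The splitting and the tables are easy to recompute. The sketch below assumes single-character node labels, so that Euler strings can be ordinary Python strings; the function names `split_pattern` and `match_table` are hypothetical. Positions are 0-based in the code, while the slides count from 1.

```python
def split_pattern(ep, variables):
    """Split the Euler string of a linear pattern at its variables.
    With k variables this yields the k + 1 strings sigma_1, ..., sigma_(k+1)."""
    pieces, current = [], []
    for symbol in ep:
        if symbol in variables:
            pieces.append("".join(current))
            current = []
        else:
            current.append(symbol)
    pieces.append("".join(current))
    return pieces

def match_table(sigma, es):
    """Table M with M[j] = 1 iff sigma occurs in es starting at index j."""
    return [1 if es.startswith(sigma, j) else 0 for j in range(len(es))]


es = "ffafbffffafaffaff"
for sigma in split_pattern("ffafXffYf", {"X", "Y"}):
    print(sigma, "".join(map(str, match_table(sigma, es))))
```

This construction compares each σi against every position of Es and therefore takes time proportional to |σi| · |Es| per table; a linear-time string matching algorithm could be substituted without changing the tables.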

Page 54: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Step 1. Linear Tree Pattern Matching

p = f(f(a,X),Y) s = f(f(a,b),f(f(a,a),a))Cp = 124252131 Cs = 12425213686963731Ep = ffafXffYf Es = ffafbffffafaffaffσ1 = ffaf M1 = 10000001000010000σ2 = ff M2 = 10000111000010010σ3 = f M3 = 11010111100011011

I We start from M1 = 10000001000010000.

I The set of nodes where p matches s is a subset of the set ofnodes with nonzero entries in M1.

I Take a nonzero entry position i in M1 that corresponds to the firstoccurrence of a node in the Euler chain, i = 1.

Page 55: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Step 1. Linear Tree Pattern Matching

p = f(f(a,X),Y) s = f(f(a,b),f(f(a,a),a))Cp = 124252131 Cs = 12425213686963731Ep = ffafXffYf Es = ffafbffffafaffaffσ1 = ffaf M1 = 10000001000010000σ2 = ff M2 = 10000111000010010σ3 = f M3 = 11010111100011011

I We start from M1 = 10000001000010000.

I The set of nodes where p matches s is a subset of the set ofnodes with nonzero entries in M1.

I Take a nonzero entry position i in M1 that corresponds to the firstoccurrence of a node in the Euler chain, i = 1.

Page 56: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Step 1. Linear Tree Pattern Matching

p = f(f(a,X),Y) s = f(f(a,b),f(f(a,a),a))Cp = 124252131 Cs = 12425213686963731Ep = ffafXffYf Es = ffafbffffafaffaffσ1 = ffaf M1 = 10000001000010000σ2 = ff M2 = 10000111000010010σ3 = f M3 = 11010111100011011

I We start from M1 = 10000001000010000.

I The set of nodes where p matches s is a subset of the set ofnodes with nonzero entries in M1.

I Take a nonzero entry position i in M1 that corresponds to the firstoccurrence of a node in the Euler chain, i = 1.

Page 57: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Step 1. Linear Tree Pattern Matching

p  = f(f(a,X),Y)    s  = f(f(a,b),f(f(a,a),a))
Cp = 124252131      Cs = 12425213686963731
Ep = ffafXffYf      Es = ffafbffffafaffaff
σ1 = ffaf           M1 = 10000001000010000
σ2 = ff             M2 = 10000111000010010
σ3 = f              M3 = 11010111100011011

I The replacement for X must be a string in Es that starts at position i + |σ1| = 5.

I Moreover, this position must correspond to the first occurrence of a node in the Euler chain, because
  I variables can be substituted by subtrees only,
  I a subtree starts with the first occurrence of a node in the Euler chain.

If this is not the case, take another nonzero entry position in M1.

Page 59: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Step 1. Linear Tree Pattern Matching

p  = f(f(a,X),Y)    s  = f(f(a,b),f(f(a,a),a))
Cp = 124252131      Cs = 12425213686963731
Ep = ffafXffYf      Es = ffafbffffafaffaff
σ1 = ffaf           M1 = 10000001000010000
σ2 = ff             M2 = 10000111000010010
σ3 = f              M3 = 11010111100011011

I The replacement for X is the substring of Es between the first and last occurrences of the node at position i + |σ1|.

I Let j be the position of that last occurrence. Then M2[j + 1] should be 1: σ2 should match Es at this position.

I Proceed in the same way for the remaining variables and pattern strings.
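The steps above can be sketched as a short procedure. This is an illustration under stated assumptions, not the slides' implementation: Cs, Es and the pattern strings are copied from the example, positions are 1-based as on the slides, and a direct substring test stands in for the table lookups Mq[pos]. The names first_last and try_match are hypothetical.

```python
CS = [int(c) for c in "12425213686963731"]   # Euler chain of s
ES = "ffafbffffafaffaff"                      # Euler string of s
SIGMAS = ["ffaf", "ff", "f"]                  # pattern strings of p
VARS = ["X", "Y"]                             # variables of p, left to right

def first_last(chain):
    """Map each node to the 1-based positions of its first and last occurrence."""
    first, last = {}, {}
    for pos, node in enumerate(chain, start=1):
        first.setdefault(node, pos)
        last[node] = pos
    return first, last

FIRST, LAST = first_last(CS)

def try_match(i):
    """Attempt a match starting at position i of Es.

    Returns a dict from variables to the Euler strings of their
    replacements, or None if the attempt fails.
    """
    # The start must be the first occurrence of a node.
    if FIRST[CS[i - 1]] != i:
        return None
    pos, result = i, {}
    for q, sigma in enumerate(SIGMAS):
        # Stands in for the table lookup M_{q+1}[pos] == 1.
        if not ES.startswith(sigma, pos - 1):
            return None
        pos += len(sigma)
        if q < len(VARS):
            # A replacement must begin at the first occurrence of a node ...
            if pos > len(CS) or FIRST[CS[pos - 1]] != pos:
                return None
            # ... and extends to the last occurrence of that node.
            end = LAST[CS[pos - 1]]
            result[VARS[q]] = ES[pos - 1:end]
            pos = end + 1
    return result

# Collect the matches, keyed by the node of s at which p matches.
matches = {}
for i in range(1, len(ES) + 1):
    m = try_match(i)
    if m is not None:
        matches[CS[i - 1]] = m
```

Replacements are reported as Euler strings of subtrees: ffafaffaf is the Euler string of f(f(a,a),a).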

Page 66: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Step 1. Linear Tree Pattern Matching

p  = f(f(a,X),Y)    s  = f(f(a,b),f(f(a,a),a))
Cp = 124252131      Cs = 12425213686963731
Ep = ffafXffYf      Es = ffafbffffafaffaff
σ1 = ffaf           M1 = 10000001000010000
σ2 = ff             M2 = 10000111000010010
σ3 = f              M3 = 11010111100011011

I The first match found: X → b, Y → f(f(a,a),a).

I The next attempt gives X → a, Y → a.

I One more try, from the last 1 in M1 (i = 13), fails.

I The replacement for X would have to start at position i + |σ1| = 17, but the last 1 in Cs (at position 17) is not the first occurrence of node 1.

Page 81: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Complexity of Linear Tree Pattern Matching

I The simple algorithm computes k + 1 Boolean tables.
I Each table has size |Es| = n.
I In total, construction of the tables takes O(nk) time.
I Room for improvement: do not compute them explicitly.

Page 82: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Suffix Number, Suffix Index

Ψ: finite set of strings.

I Suffix number of a string λ in Ψ: the number of strings in Ψ which are suffixes of λ.
I Suffix index of Ψ (denoted Ψ∗): the maximum among all suffix numbers of strings in Ψ.
I If |Ψ| = 0 then Ψ∗ = 1.

Example

I Ψ = {ffffX, ffffb, fffb, ffb, fb}. |Ψ| = 5.
I Suffix number of ffffX in Ψ is 1.
I Suffix number of fffb in Ψ is 3.
I Suffix number of ffffb in Ψ is 4.
I Suffix index of Ψ is 4.
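The two definitions translate directly into code. A minimal sketch with hypothetical function names; it uses a naive quadratic scan, which is enough to check the example but is not meant as an efficient method. As in the example, a string counts as a suffix of itself.

```python
def suffix_number(lam, psi):
    """Number of strings in psi that are suffixes of lam (lam itself included)."""
    return sum(1 for s in psi if lam.endswith(s))

def suffix_index(psi):
    """Maximum suffix number over all strings in psi; 1 if psi is empty."""
    if not psi:
        return 1
    return max(suffix_number(lam, psi) for lam in psi)

PSI = {"ffffX", "ffffb", "fffb", "ffb", "fb"}
```

Calling suffix_number("fffb", PSI) counts fffb, ffb and fb, which gives the value 3 from the example.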

Page 83: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Complexity of Linear Tree Pattern Matching

How many replacements at most are possible (independent of the algorithm)?

I Assume p matches s at some node.
I If X at node i in p matches a subtree at node w in s then
  I w is called a legal replacement for X,
  I path1 ◦ lp(rp, i′) = lp(rs, w). (i′: i labeled with lab(w).)
I If another variable Y at node j in p matches w (in another match) then
  I path2 ◦ lp(rp, j′) = lp(rs, w).
I Therefore, lp(rp, j′) is a suffix of lp(rp, i′), or vice versa.
I Hence, the subtree at w can be substituted at most K∗ times over all matches and the number of all legal replacements that can be computed over all matches is O(nK∗).

Page 91: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Complexity of Linear Tree Pattern Matching

Bound on the number of replacements computed by the simple algorithm:

I Assume e1, ..., ew are Euler strings of subtrees in s rooted at nodes i1, ..., iw.

I Assume the string σ1 ◦ e1 ◦ ··· ◦ ew ◦ σw+1 matches a substring of Es at position l.

I l is the position that corresponds to the first occurrence of a node j in s, i.e. Cs[l] = j and Cs[l′] ≠ j for all l′ < l.

I For each 1 < q < w, lp(rp, vq) = lp(j, iq) (the v's are the corresponding variable nodes in p).

I The strings σ1, σ1 ◦ e1, σ1 ◦ e1 ◦ σ2, ... are computed incrementally.

I We have a match at j if we compute σ1 ◦ e1 ◦ ··· ◦ ek ◦ σk+1, i.e. legal replacements for all variables in p.

Page 92: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Complexity of Linear Tree Pattern Matching

Bound on the number of replacements computed by the simple algorithm:

I In case of a failed match attempt, we would have computed at most one illegal replacement.

I Hence, the total number of illegal replacements computed over match attempts at all nodes can be O(n) at most.

I Therefore, the upper bound on the number of replacements computed by the algorithm is O(nK∗).

That's fine, but how to keep the time-bound of the algorithm proportional to the number of replacements?

Page 94: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Complexity of Linear Tree Pattern Matching

Keep the time-bound of the algorithm proportional to the number of replacements:

I Do not spend more than O(1) between replacements, without computing the tables explicitly.

I Do a replacement in O(1).

I Doing a replacement in O(1) is easy:
  I Store an Euler string in an array along with a pointer from the first occurrence of a node to its last occurrence.
  I Check whether the replacement begins at the first occurrence of a node.
  I If yes, skip to its last occurrence.

I Not spending more than O(1) between replacements, without computing the tables, needs more preprocessing.
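The constant-time replacement step can be illustrated with a small array of pointers. This sketch assumes 1-based positions and uses the Euler chain Cs of the running example; the names build_skip and replacement_end are hypothetical, and 0 is used as an assumed marker for "not a first occurrence".

```python
def build_skip(chain):
    """skip[pos] = last occurrence of the node whose FIRST occurrence is pos,
    and 0 for every other position. Index 0 is unused (1-based positions)."""
    first, last = {}, {}
    for pos, node in enumerate(chain, start=1):
        first.setdefault(node, pos)
        last[node] = pos
    skip = [0] * (len(chain) + 1)
    for node, pos in first.items():
        skip[pos] = last[node]
    return skip

CS = [int(c) for c in "12425213686963731"]   # Euler chain of s
SKIP = build_skip(CS)                         # one linear-time pass, done once

def replacement_end(pos):
    """O(1) per call: end position of the subtree that starts at pos,
    or None if pos is not the first occurrence of a node."""
    return SKIP[pos] or None
```

For the example, a replacement starting at position 8 (node 3) ends at position 16, while position 17 is rejected because it is not a first occurrence.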

Page 98: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Complexity of Linear Tree Pattern Matching

Keep the time-bound of the algorithm proportional to the number of replacements:

I What happens in the steps preceding the replacement for vi+1, after computing a replacement for vi?

I Determine whether pattern string σi+1 matches Es at the position following the replacement for vi.

I Had we computed the tables, this could be done in O(1), but how to achieve the same without the tables?

Page 101: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Complexity of Linear Tree Pattern Matching

Keep the time-bound of the algorithm proportional to the number of replacements:

I Problem: Given a position in Es and a string σi, 1 ≤ i ≤ k + 1, decide in O(1) whether σi matches Es in that position.

I Idea:
  I Preprocess the pattern strings to produce an automaton that recognizes every instance of these k + 1 strings.
  I Use the automaton to recognize these strings in Es.
  I With every position in an array containing Es, store the state of the automaton on reading the symbol in that position.
  I In order to decide whether a pattern string σi matches the substring of Es at position j, look at the state of the automaton in position j + |σi| − 1.
  I Lookup from the array in O(1) time.

How to preprocess the pattern strings?
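One standard way to build such an automaton is the Aho-Corasick construction. The sketch below assumes that construction purely for illustration; it is not taken from the slides, and all function names are hypothetical. It stores one state per position of Es and answers "does σi match at position j?" by inspecting the state at position j + |σi| − 1.

```python
from collections import deque

def build_automaton(patterns):
    """Aho-Corasick automaton: goto transitions, failure links, output sets."""
    goto, fail, out = [{}], [0], [set()]
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(idx)            # pattern idx ends in this state
    queue = deque(goto[0].values())    # depth-1 states keep failure link 0
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] |= out[fail[s]]     # inherit patterns ending at the suffix
    return goto, fail, out

def states_along(text, goto, fail):
    """State of the automaton after reading each symbol of text."""
    states, state = [], 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        states.append(state)
    return states

ES = "ffafbffffafaffaff"          # Euler string of s from the running example
SIGMAS = ["ffaf", "ff", "f"]      # pattern strings σ1, σ2, σ3
GOTO, FAIL, OUT = build_automaton(SIGMAS)
STATES = states_along(ES, GOTO, FAIL)   # one stored state per position of Es

def matches_at(i, j):
    """Does σi match Es at 1-based position j? One array lookup plus a set test."""
    end = j + len(SIGMAS[i - 1]) - 1
    return end <= len(ES) and (i - 1) in OUT[STATES[end - 1]]

# Rebuild M1 and M2 from the stored states, for comparison with the tables.
M1 = "".join("1" if matches_at(1, j) else "0" for j in range(1, len(ES) + 1))
M2 = "".join("1" if matches_at(2, j) else "0" for j in range(1, len(ES) + 1))
```

The set membership test is constant time on average; an implementation that needs a strict O(1) bound could store a bit vector per state instead.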

Page 105: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Complexity of Linear Tree Pattern MatchingKeep the time-bound of the algorithm proportional to thenumber of replacements:

I Problem: Given a position in Es and a string σi ,1 ≤ i ≤ k + 1, decide in O(1) whether σi matches Es inthat position.

I Idea:I Preprocess the pattern strings to produce an automaton

that recognizes every instance of these k + 1 strings.I Use the automaton to recognize these strings in Es.I With every position in an array containing Es, store the state

of the automaton on reading the symbol in that position.

I In order to decide whether a pattern string σi matches thesubstring of Es at position j , look at the state of theautomaton in position j + |σi | − 1.

I Lookup from the array in O(1) time.

How to preprocess the pattern strings?

Page 106: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Complexity of Linear Tree Pattern MatchingKeep the time-bound of the algorithm proportional to thenumber of replacements:

I Problem: Given a position in Es and a string σi ,1 ≤ i ≤ k + 1, decide in O(1) whether σi matches Es inthat position.

I Idea:I Preprocess the pattern strings to produce an automaton

that recognizes every instance of these k + 1 strings.I Use the automaton to recognize these strings in Es.I With every position in an array containing Es, store the state

of the automaton on reading the symbol in that position.I In order to decide whether a pattern string σi matches the

substring of Es at position j , look at the state of theautomaton in position j + |σi | − 1.

I Lookup from the array in O(1) time.

How to preprocess the pattern strings?

Page 107: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Complexity of Linear Tree Pattern MatchingKeep the time-bound of the algorithm proportional to thenumber of replacements:

I Problem: Given a position in Es and a string σi ,1 ≤ i ≤ k + 1, decide in O(1) whether σi matches Es inthat position.

I Idea:I Preprocess the pattern strings to produce an automaton

that recognizes every instance of these k + 1 strings.I Use the automaton to recognize these strings in Es.I With every position in an array containing Es, store the state

of the automaton on reading the symbol in that position.I In order to decide whether a pattern string σi matches the

substring of Es at position j , look at the state of theautomaton in position j + |σi | − 1.

I Lookup from the array in O(1) time.

How to preprocess the pattern strings?

Page 108: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Complexity of Linear Tree Pattern MatchingKeep the time-bound of the algorithm proportional to thenumber of replacements:

I Problem: Given a position in Es and a string σi ,1 ≤ i ≤ k + 1, decide in O(1) whether σi matches Es inthat position.

I Idea:I Preprocess the pattern strings to produce an automaton

that recognizes every instance of these k + 1 strings.I Use the automaton to recognize these strings in Es.I With every position in an array containing Es, store the state

of the automaton on reading the symbol in that position.I In order to decide whether a pattern string σi matches the

substring of Es at position j , look at the state of theautomaton in position j + |σi | − 1.

I Lookup from the array in O(1) time.

How to preprocess the pattern strings?

Page 109: Introduction to Unification Theory · Introduction to Unification Theory Matching Temur Kutsia RISC, Johannes Kepler University of Linz, Austria kutsia@risc.jku.at

Modifying the Linear Tree Pattern Matching

Pattern string preprocessing:
- The k + 1 pattern strings σ1, …, σk+1 are preprocessed to produce an automaton that recognizes every instance of these strings.
- Method: Aho-Corasick (AC) algorithm.
- The AC algorithm constructs the desired automaton in time proportional to the sum of the lengths of all pattern strings.

Modifying the Linear Tree Pattern Matching

What does an Aho-Corasick automaton for a set of pattern strings σ1, …, σk+1 do?

- Takes the subject string Es as input.
- Outputs the locations in Es at which the σ's appear as substrings, together with the corresponding σ's.
- For example, an Aho-Corasick automaton for the strings he, she, his, hers returns on the input string ushers the locations 4 (match for she and he) and 6 (match for hers). A location is the position of the last symbol of the matched substring.

Modifying the Linear Tree Pattern Matching

An Aho-Corasick automaton
- consists of a set of states, represented by numbers,
- processes the subject string by successively reading symbols in it, making state transitions and occasionally emitting output,
- is controlled by three functions:
  1. a goto function g,
  2. a failure function f,
  3. an output function output.
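A minimal sketch of how the three functions drive the automaton. The tables below are written out by hand for the pattern strings he, she, his, hers; the names GOTO, FAIL, OUTPUT, step and scan are chosen for this illustration only, and a missing GOTO entry stands for "fail".

```python
# Hand-written tables for the patterns he, she, his, hers (states 0..9).
# GOTO maps (state, symbol) -> state; a missing key means "fail".
GOTO = {(0, "h"): 1, (1, "e"): 2, (0, "s"): 3, (3, "h"): 4, (4, "e"): 5,
        (1, "i"): 6, (6, "s"): 7, (2, "r"): 8, (8, "s"): 9}
FAIL = {1: 0, 2: 0, 3: 0, 4: 1, 5: 2, 6: 0, 7: 3, 8: 0, 9: 3}
OUTPUT = {2: {"he"}, 5: {"she", "he"}, 7: {"his"}, 9: {"hers"}}


def step(state, symbol):
    """One transition: follow failure links until the goto function is defined."""
    while (state, symbol) not in GOTO:
        if state == 0:
            return 0            # state 0 loops on every symbol without an edge
        state = FAIL[state]
    return GOTO[(state, symbol)]


def scan(text):
    """Return (1-based end position, set of matched patterns) for every hit."""
    hits, state = [], 0
    for position, symbol in enumerate(text, start=1):
        state = step(state, symbol)
        if state in OUTPUT:
            hits.append((position, OUTPUT[state]))
    return hits


print(scan("ushers"))
```

On ushers the scan reports the end positions 4 and 6, in agreement with the example above.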

Modifying the Linear Tree Pattern Matching

Construction of the Aho-Corasick automaton:
- Determine the states and the goto function.
- Compute the failure function.
- Computation of the output function begins in the first step and is completed in the second.
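The two construction steps can be sketched as follows. This is an illustrative version, not a tuned implementation: the function name build_ac and the list-of-dicts representation are assumptions of this sketch, and states are numbered in the order in which they are created.

```python
from collections import deque


def build_ac(patterns):
    """Return (goto, fail, output) for the given pattern strings."""
    goto = [{}]            # goto[state][symbol] -> state; state 0 is the start state
    output = [set()]

    # Step 1: states and goto function (a trie); each pattern enters its own output set.
    for pattern in patterns:
        state = 0
        for symbol in pattern:
            if symbol not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][symbol] = len(goto) - 1
            state = goto[state][symbol]
        output[state].add(pattern)

    # Step 2: failure function, breadth-first; the output sets are completed here.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())        # states at depth 1 fail to state 0
    while queue:
        r = queue.popleft()
        for symbol, s in goto[r].items():
            queue.append(s)
            state = fail[r]
            while state != 0 and symbol not in goto[state]:
                state = fail[state]
            fail[s] = goto[state].get(symbol, 0)
            output[s] |= output[fail[s]]
    return goto, fail, output


goto, fail, output = build_ac(["he", "she", "his", "hers"])
print(fail[1:])     # failure values of states 1..9
print(output[5])    # output set of state 5
```

With the patterns inserted in the order he, she, his, hers, the states are created as 1, 2 (he), 3, 4, 5 (she), 6, 7 (his), 8, 9 (hers).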

Modifying the Linear Tree Pattern Matching

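Example
Construction of the Aho-Corasick automaton for the pattern strings he, she, his, hers.

The goto function g : states × letters → states ∪ {fail}:

  g(0, h) = 1    g(1, e) = 2    g(2, r) = 8    g(8, s) = 9
  g(0, s) = 3    g(3, h) = 4    g(4, e) = 5
  g(1, i) = 6    g(6, s) = 7
  g(0, c) = 0 for every letter c other than h and s
  g(q, c) = fail in all remaining cases

The output function:

  output(2) = {he}    output(5) = {she, he}    output(7) = {his}    output(9) = {hers}

The failure function f:

  f(1) = f(3) = 0    f(2) = 0    f(6) = 0    f(4) = 1
  f(8) = 0    f(7) = 3    f(5) = 2    f(9) = 3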

Modifying the Linear Tree Pattern Matching

Example
AC automaton for the pattern strings hha, h, ba.

The goto function:

  g(0, h) = 1    g(1, h) = 2    g(2, a) = 3
  g(0, b) = 4    g(4, a) = 5
  g(0, c) = 0 for every letter c other than h and b

The output function:

  output′(1) = {h}    output′(3) = {hha}    output′(5) = {ba}
  output′(2) = {h}

The failure function:

  f(1) = 0    f(4) = 0    f(2) = 1    f(5) = 0    f(3) = 0

1, 3, 5: primary accepting states.
2: secondary accepting state (h is a suffix of hh).

Modifying the Linear Tree Pattern Matching

- For each secondary accepting state there is a unique primary accepting state with exactly the same output set.
- Modify the construction of the AC automaton by maintaining pointers from secondary accepting states to the corresponding primary accepting states.
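A small sketch of these pointers for the automaton of hha, h, ba. It assumes that a state is primary accepting when the string spelled on the path from state 0 to it is itself a pattern, and secondary accepting when it only inherits its output through the failure function. The tables are written out by hand, and the names PATH, classify and primary_of are hypothetical.

```python
# Hand-written automaton for the patterns hha, h, ba (states 0..5).
PATH = {0: "", 1: "h", 2: "hh", 3: "hha", 4: "b", 5: "ba"}   # string spelled from state 0
FAIL = {1: 0, 2: 1, 3: 0, 4: 0, 5: 0}
OUTPUT = {1: {"h"}, 2: {"h"}, 3: {"hha"}, 5: {"ba"}}
PATTERNS = {"hha", "h", "ba"}


def classify(state):
    """Assumed definition: primary iff the state's own path string is a pattern."""
    if state not in OUTPUT:
        return "non-accepting"
    return "primary" if PATH[state] in PATTERNS else "secondary"


def primary_of(state):
    """Follow failure links from an accepting state to a primary accepting one."""
    while classify(state) == "secondary":
        state = FAIL[state]
    return state


for q in sorted(OUTPUT):
    print(q, classify(q), primary_of(q))
```

A secondary accepting state has no pattern of its own, so its output set equals that of its failure state. That is why walking the failure links ends at a primary accepting state with the same output set.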

Modifying the Linear Tree Pattern Matching

- Construction of the Aho-Corasick automaton takes O(m) time.
- The output set is represented as a linked list, which is inappropriate for our purpose.
- Given an arbitrary string, we want to determine in constant time whether it is in the output set.
- Idea: Copy the output set into an array.
- Question: How many elements do we have to copy?
- Answer: As many as in the output sets of all primary accepting states, which is O(m) (because any string in the output set of a primary accepting state is a suffix of the longest string in this set).

Modifying the Linear Tree Pattern Matching

Linear Tree Pattern Matching:
1. Construct the Aho-Corasick automaton for the pattern strings.
2. Visit each primary accepting state and copy its output set into a boolean array.
3. Scan Es with this automaton.
4. During this process, with each entry in Es store the state of the automaton upon reading the function symbol in that entry.
5. If this state is a secondary accepting state, store the corresponding primary accepting state instead.
6. Determining whether there is a match for a pattern string σ at position i requires verifying the state associated with the (i + |σ| − 1)-th entry in Es:
   - This should be a primary accepting state and
   - Its output set should contain σ.
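Steps 1 to 6 can be sketched end to end as below. This is an illustration under assumptions: the names preprocess, scan and matches_at are hypothetical, a state counts as primary accepting when its own path string is a pattern, and each boolean array has one entry per pattern string (simple, but not the most economical layout).

```python
from collections import deque


def preprocess(patterns):
    """Steps 1-2: build the automaton and one boolean row per primary accepting state."""
    goto, fail, own = [{}], [0], [None]      # own[q] = index of the pattern spelled by q
    for k, pattern in enumerate(patterns):
        state = 0
        for symbol in pattern:
            if symbol not in goto[state]:
                goto.append({})
                fail.append(0)
                own.append(None)
                goto[state][symbol] = len(goto) - 1
            state = goto[state][symbol]
        own[state] = k
    primary = [None] * len(goto)             # pointer to the primary accepting state, if any
    rows = {}                                # primary accepting state -> boolean array
    queue = deque([0])
    while queue:
        r = queue.popleft()
        for symbol, s in goto[r].items():
            queue.append(s)
            if r != 0:                       # states at depth 1 keep fail = 0
                state = fail[r]
                while state != 0 and symbol not in goto[state]:
                    state = fail[state]
                fail[s] = goto[state].get(symbol, 0)
            inherited = primary[fail[s]]
            if own[s] is None:
                primary[s] = inherited       # secondary accepting (or non-accepting: None)
            else:
                primary[s] = s
                row = list(rows[inherited]) if inherited is not None else [False] * len(patterns)
                row[own[s]] = True
                rows[s] = row
    return goto, fail, primary, rows


def scan(text, goto, fail, primary):
    """Steps 3-5: one stored state per entry; a secondary state is replaced by its primary."""
    stored, state = [], 0
    for symbol in text:
        while state != 0 and symbol not in goto[state]:
            state = fail[state]
        state = goto[state].get(symbol, 0)
        stored.append(primary[state] if primary[state] is not None else state)
    return stored


def matches_at(i, k, patterns, stored, rows):
    """Step 6: does patterns[k] match at the 1-based position i? Constant-time lookups."""
    end = i + len(patterns[k]) - 1           # 1-based index of the last symbol
    if end > len(stored):
        return False
    row = rows.get(stored[end - 1])          # None unless a primary accepting state
    return row is not None and row[k]


patterns = ["ffaf", "ff", "f"]
Es = "ffafbffffafaffaff"
goto, fail, primary, rows = preprocess(patterns)
stored = scan(Es, goto, fail, primary)
print(matches_at(1, 0, patterns, stored, rows), matches_at(2, 0, patterns, stored, rows))
```

The strings used here are the pattern strings and the Euler string of the example at the end of this part, so the stored states can be compared with the state row given there.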

Consistency Checking

For nonlinear patterns the computed replacements have to be checked for consistency.

- Idea: Assign integer codes (from 1 to n) to the nodes in the subject tree.
- Two nodes get the same encoding iff the subtrees rooted at them are identical.
- Such an encoding can be computed in O(n).

Consistency Checking

Computing the encoding:
- Bottom up: First, sort the leaves with respect to their labels and take the ranks as the integers for encoding. Duplicates are assigned the same rank.
- Suppose the encoding for all nodes up to height i is computed.
- Computing the encoding of the nodes at height i + 1:
  - Assign to each node v at level i + 1 a vector ⟨f, j1, …, jn⟩.
  - f is the label of v and jr (1 ≤ r ≤ n) is the encoding of the r-th child of v; here n is the number of children of v.
  - The vectors assigned to all nodes at level i + 1 are radix sorted.
  - If the rank of v is α and the largest encoding among the nodes at level i is β, then the encoding for v is α + β.
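A sketch of such an encoding. Note the substitution: instead of radix sorting the vectors level by level, it looks each vector up in a dictionary, so the concrete numbers differ from the rank-based ones, but two nodes still receive the same code iff the subtrees rooted at them are identical. Terms are assumed to be nested tuples (label, child, …), nodes are addressed by their position (the path of child indices from the root), and encode_subtrees is a hypothetical name.

```python
def encode_subtrees(tree):
    """Map every node position to an integer code; equal codes iff identical subtrees."""
    table = {}      # vector (label, code of child 1, ..., code of child n) -> code
    codes = {}      # node position (path of child indices from the root) -> code

    def visit(node, position):
        label, children = node[0], node[1:]
        vector = (label,) + tuple(
            visit(child, position + (r,)) for r, child in enumerate(children, start=1)
        )
        code = table.setdefault(vector, len(table) + 1)   # new vectors get the next integer
        codes[position] = code
        return code

    visit(tree, ())
    return codes


# Subject tree f(f(a, b), f(f(a, a), a)) as nested tuples.
s = ("f", ("f", ("a",), ("b",)), ("f", ("f", ("a",), ("a",)), ("a",)))
codes = encode_subtrees(s)
print(codes[(1, 1)] == codes[(2, 2)])   # both positions hold the leaf a
print(codes[(1,)] == codes[(2, 1)])     # f(a, b) versus f(a, a)
```

The recursion is kept for brevity; for very deep trees an explicit stack would be the safer choice.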

Consistency Checking

Checking consistency:
- Consistency of replacements is checked as they are computed.
- For each variable in the pattern, the encoding for the replacement of its first occurrence is computed and entered into a table.
- For the next occurrence of the same variable, compare the encoding of its replacement to the one in the table.
- If the check succeeds, proceed further. Otherwise report a failure and start the matching procedure at another position in Es.
- These steps do not increase the complexity of the algorithm.
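The table lookup can be sketched as below. The function name consistent is hypothetical, and the integer codes in the usage lines are made up for illustration; any encoding that gives equal codes exactly to identical subtrees will do.

```python
def consistent(replacements):
    """replacements: iterable of (variable, code of its replacement), in the order computed."""
    table = {}
    for variable, code in replacements:
        # First occurrence: enter the code. Later occurrences: compare with the entry.
        if table.setdefault(variable, code) != code:
            return False
    return True


# Pattern f(f(a, X), X) against the subject f(f(a, b), f(f(a, a), a)),
# with made-up codes a = 1, b = 2, f(f(a, a), a) = 5.
print(consistent([("X", 2), ("X", 5)]))   # at the root: X -> b and X -> f(f(a, a), a)
print(consistent([("X", 1), ("X", 1)]))   # at the subtree f(f(a, a), a): X -> a twice
```

Each occurrence costs one dictionary operation, which is in line with the remark that the check does not increase the complexity.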

The Last Word

Nonlinear tree pattern matching can be done in O(nK*) time.

Example

Pattern tree p = f(f(a, X), X):

      f
     / \
    f   X
   / \
  a   X

Subject tree s = f(f(a, b), f(f(a, a), a)):

        f
      /   \
     f     f
    / \   / \
   a   b f   a
        / \
       a   a

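Example (Cont.)

- Ep = ffafXffXf

- AC automaton for the pattern strings ffaf, ff, f:

  g(0, f) = 1    g(1, f) = 2    g(2, a) = 3    g(3, f) = 4
  g(0, c) = 0 for every symbol c other than f

  output(1) = {f}    output(2) = {ff, f}    output(4) = {ffaf, f}
  failure(1) = failure(3) = 0    failure(2) = failure(4) = 1

- Output sets of the accepting states as boolean arrays:

        ffaf   ff   f
  O1    F      F    T
  O2    F      T    T
  O4    T      F    T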

Example (Cont.)
σ1 = ffaf, σ2 = ff, σ3 = f

          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
        ---------------------------------------------------
Cs        1  2  4  2  5  2  1  3  6  8  6  9  6  3  7  3  1
Es        f  f  a  f  b  f  f  f  f  a  f  a  f  f  a  f  f
IsFirst   T  T  T  F  T  F  F  T  T  T  F  T  F  F  T  F  F
lastptr  17  6  3  -  5  -  - 16 13 10  - 12  -  - 15  -  -
IsLast    F  F  T  F  T  T  F  F  F  T  F  T  T  F  T  T  T
state     1  2  3  4  0  1  2  2  2  3  4  0  1  2  3  4  2

- In the first row, the numbers from 1 to 17 are array indices.
- Cs and Es are the Euler chain and the Euler string for s.
- For an index i,
  - IsFirst[i] = T iff i is the index of the first occurrence of the number Cs[i] in Cs.
  - If IsFirst[i] = T, then lastptr[i] = j, where j is the index of the last occurrence of the number Cs[i] in Cs.
  - IsLast[i] = T iff i is the index of the last occurrence of the number Cs[i] in Cs.
  - state[i] is the state of the automaton after reading Es[i].
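The arrays Cs, Es, IsFirst, lastptr and IsLast can be recomputed from the subject tree with a short sketch. It assumes that the nodes are numbered 1, 2, … in breadth-first order, which reproduces the numbers in the Cs row. Terms are nested tuples (label, child, …), None stands for the "-" entries of lastptr, and euler_arrays is a hypothetical name.

```python
from collections import deque


def euler_arrays(tree):
    """Return (Cs, Es, IsFirst, lastptr, IsLast); list index 0 holds array index 1."""
    # Number the nodes breadth-first (assumed numbering scheme).
    number, subtree, queue = {}, {(): tree}, deque([()])
    while queue:
        position = queue.popleft()
        number[position] = len(number) + 1
        for r, child in enumerate(subtree[position][1:]):
            subtree[position + (r,)] = child
            queue.append(position + (r,))

    cs, es = [], []

    def tour(position):
        node = subtree[position]
        cs.append(number[position])
        es.append(node[0])
        for r in range(len(node) - 1):
            tour(position + (r,))
            cs.append(number[position])      # back at the parent after each child
            es.append(node[0])

    tour(())
    last = {c: i for i, c in enumerate(cs, start=1)}
    first = {c: i for i, c in reversed(list(enumerate(cs, start=1)))}
    is_first = [first[c] == i for i, c in enumerate(cs, start=1)]
    is_last = [last[c] == i for i, c in enumerate(cs, start=1)]
    lastptr = [last[c] if flag else None for c, flag in zip(cs, is_first)]
    return cs, "".join(es), is_first, lastptr, is_last


s = ("f", ("f", ("a",), ("b",)), ("f", ("f", ("a",), ("a",)), ("a",)))
Cs, Es, IsFirst, lastptr, IsLast = euler_arrays(s)
print(Cs)
print(Es)
```

The dictionary comprehensions take two passes over Cs, which keeps the computation linear in the length of the Euler chain.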

Reference

R. Ramesh and I. V. Ramakrishnan. Nonlinear pattern matching in trees. J. ACM, 39(2):295–316, 1992.