Learning Problems in Theorem Proving (slides: forsyte.at/wp-content/uploads/laive-slides-cezary.pdf)

Transcript
Page 1

Learning Problems in Theorem Proving

Cezary Kaliszyk

Universität Innsbruck

July 3, 2017, LAIVe Summer School

Page 2

Computer Theorem Proving: Historical Context

• 1940s: Algorithmic proof search (λ-calculus)

• 1960s: de Bruijn’s Automath

• 1970s: Small Certifiers (LCF)

• 1990s: Resolution (Superposition)

• 2000s: Large theories

• 2010s: ?

C. Kaliszyk (Universitat Innsbruck) Learning Problems in Theorem Proving 2/72

Page 3

Lecture content

Theorem Proving Introduction

Machine Learning Problems in Theorem Proving

Premise Selection

Useful Intermediate Steps

Theorem Names

Internal Guidance

Page 4

Theorem Proving Introduction

Outline

Theorem Proving Introduction

Machine Learning Problems in Theorem Proving

Premise Selection

Useful Intermediate Steps

Theorem Names

Internal Guidance

Page 5

Theorem Proving Introduction

The Kepler Conjecture (year 1611)

The most compact way of stacking balls of the same size in space is a pyramid.

V = π/√18 ≈ 74%
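The stated density is easy to check numerically (a quick arithmetic sanity check, not part of any proof):

```python
import math

# Kepler's claimed optimal packing density: the fraction of space filled
# by the face-centered cubic ("pyramid") arrangement is pi / sqrt(18).
kepler_density = math.pi / math.sqrt(18)
```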

Page 6

Theorem Proving Introduction

Page 7

Theorem Proving Introduction

The Kepler Conjecture (year 1611)

Proved in 1998
• Tom Hales, 300-page proof using computer programs

• Submitted to the Annals of Mathematics

• 99% correct... but we cannot verify the programs

1039 equalities and inequalities

For example:

(−x1x3 − x2x4 + x1x5 + x3x6 − x5x6 + x2(−x2 + x1 + x3 − x4 + x5 + x6))
/ √( 4x2 ( x2x4(−x2 + x1 + x3 − x4 + x5 + x6)
         + x1x5(x2 − x1 + x3 + x4 − x5 + x6)
         + x3x6(x2 + x1 − x3 + x4 + x5 − x6)
         − x1x3x4 − x2x3x5 − x2x1x6 − x4x5x6 ) )
< tan(π/2 − 0.74)


Page 9

Theorem Proving Introduction

The Kepler Conjecture (year 1611)

Solution? Formalized proof!
• Formalize the proof using Proof Assistants

• Implement the computer code in the system

• Prove the code correct

• Run the programs inside the Proof Assistant

Flyspeck Project

• Completed 2015

• Many Proof Assistants and contributors

Page 10

Theorem Proving Introduction

What is a Proof Assistant? (1/2)

A Proof Assistant is
• a computer program
• to assist a mathematician
• in the production of a proof
• that is mechanically checked

What does a Proof Assistant do?
• Keep track of theories, definitions, assumptions
• Interaction - proof editing
• Proof checking
• Automation - proof search

What does it implement? (And how?)

• a formal logical system intended as foundation for mathematics
• decision procedures

Page 11

Theorem Proving Introduction

theorem sqrt2_not_rational:
  "sqrt (real 2) ∉ ℚ"
proof
  assume "sqrt (real 2) ∈ ℚ"
  then obtain m n :: nat where
    n_nonzero: "n ≠ 0" and sqrt_rat: "¦sqrt (real 2)¦ = real m / real n"
    and lowest_terms: "gcd m n = 1" ..
  from n_nonzero and sqrt_rat have "real m = ¦sqrt (real 2)¦ * real n" by simp
  then have "real (m²) = (sqrt (real 2))² * real (n²)"
    by (auto simp add: power2_eq_square)
  also have "(sqrt (real 2))² = real 2" by simp
  also have "... * real (n²) = real (2 * n²)" by simp
  finally have eq: "m² = 2 * n²" ..
  hence "2 dvd m²" ..
  with two_is_prime have dvd_m: "2 dvd m" by (rule prime_dvd_power_two)
  then obtain k where "m = 2 * k" ..
  with eq have "2 * n² = 2² * k²" by (auto simp add: power2_eq_square mult_ac)
  hence "n² = 2 * k²" by simp
  hence "2 dvd n²" ..
  with two_is_prime have "2 dvd n" by (rule prime_dvd_power_two)
  with dvd_m have "2 dvd gcd m n" by (rule gcd_greatest)
  with lowest_terms have "2 dvd 1" by simp
  thus False by arith
qed

de Bruijn factor


Page 13

Theorem Proving Introduction

Intel Pentium® P5 (1994)

Superscalar; Dual integer pipeline; Faster floating-point, ...

4 195 835 / 3 145 727 = 1.333820...

4 195 835 / 3 145 727 = 1.333739... (computed on the P5)

FPU division lookup table: for certain inputs the division result is off
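The arithmetic of the well-known FDIV test case 4195835/3145727 can be reproduced (a sketch; the flawed value is the one quoted on the slide, since a correct divider cannot produce it):

```python
# A correct divider gives 1.3338204...; the flawed P5 lookup table
# returned 1.3337390... -- an error already in the 5th decimal place.
correct = 4195835 / 3145727   # correct IEEE double division
flawed_p5 = 1.333739          # value quoted on the slide for the buggy FPU
error = abs(correct - flawed_p5)
```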

Replacement

• Few customers cared, but the replacement still cost $475 million

• Testing and model checking insufficient:
• Since then Intel and AMD processors formally verified
• HOL Light and ACL2 (along with other techniques)


Page 15

Theorem Proving Introduction

Typical proof assistant problem

• Does there exist a function f from R to R, such that for all x and y, f(x + y²) − f(x) ≥ y?


1. f(x + y²) − f(x) ≥ y for any given x and y

2. f(x + n·y²) − f(x) ≥ n·y for any x, y, and n ∈ N (easy induction using [1] for the step case)

3. f(1) − f(0) ≥ m + 1 for any m ∈ N (set n = (m+1)², x = 0, y = 1/(m+1) in [2])

4. Contradiction with the Archimedean property of R
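The instantiation in step 3 can be sanity-checked with exact rational arithmetic: with n = (m+1)² and y = 1/(m+1) we get n·y² = 1 and n·y = m+1, so statement [2] specializes to f(1) − f(0) ≥ m+1:

```python
from fractions import Fraction

def step3_instantiation(m):
    # With n = (m+1)^2, x = 0, y = 1/(m+1), statement [2] reads
    # f(x + n*y^2) - f(x) >= n*y, i.e. f(1) - f(0) >= m + 1.
    n = (m + 1) ** 2
    y = Fraction(1, m + 1)
    return (n * y * y, n * y)   # (n*y^2, n*y)
```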

Page 17

Theorem Proving Introduction

Formalization

let lemma = (‘∀f:real→real. ¬(∀x y. f(x + y * y) − f(x) ≥ y)‘,
REWRITE_TAC[real_ge] THEN REPEAT STRIP_TAC THEN

SUBGOAL_THEN ‘∀n x y. &n * y ≤ f(x + &n * y * y) − f(x)‘ MP_TAC THENL

[MATCH_MP_TAC num_INDUCTION THEN SIMP_TAC[REAL_MUL_LZERO; REAL_ADD_RID] THEN

REWRITE_TAC[REAL_SUB_REFL; REAL_LE_REFL; GSYM REAL_OF_NUM_SUC] THEN

GEN_TAC THEN REPEAT(MATCH_MP_TAC MONO_FORALL THEN GEN_TAC) THEN

FIRST_X_ASSUM(MP_TAC o SPECL [‘x + &n * y * y‘; ‘y:real‘]) THEN

SIMP_TAC[REAL_ADD_ASSOC; REAL_ADD_RDISTRIB; REAL_MUL_LID] THEN

REAL_ARITH_TAC;

X_CHOOSE_TAC ‘m:num‘ (SPEC ‘f(&1) − f(&0):real‘ REAL_ARCH_SIMPLE) THEN

DISCH_THEN(MP_TAC o SPECL [‘SUC m EXP 2‘; ‘&0‘; ‘inv(&(SUC m))‘]) THEN

REWRITE_TAC[REAL_ADD_LID; GSYM REAL_OF_NUM_SUC; GSYM REAL_OF_NUM_POW] THEN

REWRITE_TAC[REAL_FIELD ‘(&m + &1) pow 2 * inv(&m + &1) = &m + &1‘;

REAL_FIELD ‘(&m + &1) pow 2 * inv(&m + &1) * inv(&m + &1) = &1‘] THEN

ASM_REAL_ARITH_TAC]);;


HOL(y)Hammer: general purpose proof assistant automation

• Machine Learning

• Automated Reasoning

Page 19

Theorem Proving Introduction

Proof Assistant (2/2)

• Keep track of theories, definitions, assumptions
  • set up a theory that describes mathematical concepts (or models a computer system)
  • express logical properties of the objects

• Interaction - proof editing
  • typically interactive
  • specified theory and proofs can be edited
  • provides information about required proof obligations
  • allows further refinement of the proof
  • often manually providing a direction in which to proceed

• Automation - proof search
  • various strategies
  • decision procedures

• Proof checking
  • checking of complete proofs
  • sometimes providing certificates of correctness

• Why should we trust it?
  • small core

Page 20

Theorem Proving Introduction

Can a Proof Assistant do all proofs?

Decidability!

• Validity of formulas is undecidable (for non-trivial logical systems)

Automated Theorem Provers
• Specific domains

• Adjust your problem

• Answers: Valid (Theorem with proof)

• Or: Countersatisfiable (Possibly with counter-model)

Proof Assistants• Generally applicable

• Direct modelling of problems

• Interactive

Page 21

Theorem Proving Introduction

Tool Categories

Computer Algebra

• Solving equations, simplifications, numerical approximations

• Maple, Mathematica, . . .

Model Checkers
• State space abstraction

• Spin, Uppaal, . . .

ATPs
• Built-in automation (model elimination, resolution)

• ACL2, Vampire, Eprover, SPASS, . . .

Page 22

Theorem Proving Introduction

Theorems and programs that use ITP/ATP

Software and Hardware
• Processors and Chips

• Security Protocols

• Project Cristal (CompCert)

• L4.verified

• Java Bytecode

Mathematical Theorems
• Kepler Conjecture (2015)

• 4 color theorem

• Feit-Thompson

• Robbins conjecture, AIM

Wiedijk’s 100

Page 23

Machine Learning Problems in Theorem Proving

Outline

Theorem Proving Introduction

Machine Learning Problems in Theorem Proving

Premise Selection

Useful Intermediate Steps

Theorem Names

Internal Guidance

Page 24

Machine Learning Problems in Theorem Proving

Fast progress in machine learning

Tasks involving logical inference

• Natural language question answering [Sukhbaatar+2015 ]

• Knowledge base completion [Socher+2013 ]

• Automated translation [Wu+2016 ]

Games
AlphaGo: problems similar to proving [Silver+2016]

• Node evaluation

• Policy decisions

Computer Vision

Better than human performance on some tasks [Russakovsky+2015 ]

A lot of AI, but little use of learning in proving

Page 25

Machine Learning Problems in Theorem Proving

AI theorem proving techniques

High-level AI guidance

• tactic selection and premise (lemma) selection

• based on suitable features (characterizations) of the formulas

• and on learning lemma-relevance from many related proofs

Mid-level AI guidance

• learn good ATP strategies/tactics/heuristics for classes of problems

• learning lemma and concept re-use

• learn conjecturing

Low-level AI guidance

• guide (almost) every inference step by previous knowledge

• good proof-state characterization and fast relevance


Page 27

Machine Learning Problems in Theorem Proving

Problems for Machine Learning

• Is a statement useful?

• For a conjecture

• What are the dependencies of a statement? (premise selection)

• Is a statement important? (named)

• What should the next proof step be?

• Tactic? Instantiation?

• How to name a statement?

• What new problem is likely to be true?

• Intermediate statement for a conjecture

Page 28

Premise Selection

Outline

Theorem Proving Introduction

Machine Learning Problems in Theorem Proving

Premise Selection

Useful Intermediate Steps

Theorem Names

Internal Guidance

Page 29

Premise Selection

Premise selection

Intuition

Given:

• set of theorems T (together with proofs)

• conjecture c

Find:

• minimal subset of T that can be used to prove c

More formally

arg min_{t ⊆ T} { |t| : t ⊢ c }
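The definition can be illustrated by a brute-force toy (not a practical algorithm: the search is exponential and t ⊢ c is undecidable in general; `entails` and the dependency table below are hypothetical stand-ins):

```python
from itertools import combinations

def minimal_premises(T, c, entails):
    """Brute-force the argmin: the smallest t subset of T with entails(t, c).
    `entails` is a hypothetical oracle standing in for the provability
    relation t |- c; we try subsets in order of increasing size."""
    for k in range(len(T) + 1):
        for t in combinations(sorted(T), k):
            if entails(set(t), c):
                return set(t)
    return None

# Mock oracle: c is "provable" from t iff t covers c's known dependencies.
deps = {"add_comm": {"num_induct", "add_suc"}}
def entails(t, c):
    return deps[c] <= t
```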

Page 30

Premise Selection

In machine learning terminology

Multi-label classification

Input: set of samples S, where samples are triples (s, F(s), L(s))

• s is the sample ID

• F(s) is the set of features of s

• L(s) is the set of labels of s

Output: function f that predicts n labels (sorted by relevance) for a set of features

Sample add_comm (a + b = b + a) could have:

• F(add_comm) = {"+", "=", "num"}
• L(add_comm) = {num_induct, add_0, add_suc, add_def}
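A minimal sketch of such a function f, assuming toy data and a simple Jaccard-overlap score (an illustration only, not a predictor actually used in the systems discussed):

```python
def predict(query_features, samples, n=3):
    """Toy multi-label ranker: score each label by the feature overlap
    (Jaccard) of the training samples that carry it, then return the n
    best labels sorted by relevance."""
    scores = {}
    for feats, labs in samples.values():
        overlap = len(query_features & feats) / len(query_features | feats)
        for lab in labs:
            scores[lab] = scores.get(lab, 0.0) + overlap
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical training data in the (s, F(s), L(s)) form of the slide.
samples = {
    "add_comm": ({"+", "=", "num"}, {"num_induct", "add_0", "add_suc"}),
    "mul_comm": ({"*", "=", "num"}, {"num_induct", "mul_0", "mul_suc"}),
}
```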

Page 31

Premise Selection

Not exactly the usual machine learning problem

Labels correspond to premises and samples to theorems

• Very often the same

Similar theorems are likely to be useful in the proof

• Also likely to have similar premises

Theorems sharing logical features are similar

• Theorems sharing rare features are very similar

Recently considered theorems and premises are important

Page 32

Premise Selection

Not exactly for the usual machine learning tools

Multi-label classifier output

• Often asked for 1000 or more most relevant lemmas

Efficient classifier update

• Learning time + prediction time small

• User will not wait more than 10–30 sec for all phases

Large numbers of features

• Complicated feature relations

Page 33

Premise Selection

Tried Premise Selection Techniques

• Syntactic methods
  • Neighbours using various metrics
  • Clustering
  • Recursive: SInE, MePo

• k-NN, Naive Bayes

• (space reductions)

• Regression
  • Kernel-based multi-output ranking

• Decision Trees (Random Forests)

• Neural Networks
  • Winnow, Perceptron (SNoW)
  • DeepMath


Page 35

Premise Selection

Problem pruning with SInE

• SInE – SUMO Inference Engine [Hoder 2008 ]

• Original version implemented in Python and debuted in CASC-J4 (with E and Vampire)

• Used by other ATPs: iProver-LTB, etc.
• Efficient reimplementations: Vampire (2009/2010) and E (2010, GSinE)

• SInE selects relevant axioms from a large set
• Axiom selection based on function/predicate symbols:
  • The conjecture is relevant
  • Symbols in the conjecture are relevant
  • Formulas defining these symbols are relevant, and their other symbols become relevant
  • Repeat until a fixed point or limit is reached
• A formula is interpreted as defining the globally rarest symbol(s) occurring in it

• Very similar to MePo [MengPaulson2009]
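The selection loop above can be sketched as follows (a simplified reading of SInE: no generality tolerance or benevolence parameter, just "each formula defines its rarest symbols" plus a bounded fixpoint; the axiom sets are hypothetical):

```python
from collections import Counter

def sine_select(axioms, conjecture_symbols, iterations=3):
    """SInE-style sketch: a symbol triggers the axioms in which it is one
    of the globally rarest symbols; start from the conjecture's symbols
    and close under triggering, up to `iterations` rounds."""
    occ = Counter(s for syms in axioms.values() for s in syms)
    # Each axiom "defines" the rarest symbol(s) occurring in it.
    defines = {name: {s for s in syms if occ[s] == min(occ[t] for t in syms)}
               for name, syms in axioms.items()}
    relevant_syms, selected = set(conjecture_symbols), set()
    for _ in range(iterations):
        newly = {name for name, d in defines.items()
                 if name not in selected and d & relevant_syms}
        if not newly:
            break
        selected |= newly
        for name in newly:
            relevant_syms |= axioms[name]   # their other symbols become relevant
    return selected
```

Note how common symbols ("plus" below) do not trigger the axioms that merely use them, which is what keeps the selection small: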

Page 36

Premise Selection

GSInE in E: e_axfilter

• Parameterizable filters
  • Different generality measures (frequency count, generosity, benevolence)
  • Different limits (absolute/relative size, # of iterations)
  • Different seeds (conjecture/hypotheses)

• Efficient implementation
  • E data types and libraries
  • Indexing (symbol → formula, formula → symbol)

• Multi-filter support
  • Parse & index once (amortize costs)
  • Apply different independent filters

• Primary use: Initial over-approximation (efficiently reduce HUGE input files to manageable size)

• Secondary use: Filtering for individual E strategies


Page 38

Premise Selection

Naive Bayes

P(f is relevant for proving g)
  = P(f is relevant | g's features)
  = P(f is relevant | f1, ..., fn)
  ∝ P(f is relevant) · Π_{i=1..n} P(fi | f is relevant)
  ∝ (#f is a proof dependency / #all dependencies)
      · Π_{i=1..n} (#fi appears when f is a proof dependency / #f is a proof dependency)

Page 39

Premise Selection

Naive Bayes: adaptation to premise selection

Extended features F(φ) of a fact φ:
features of φ and of the facts that were proved using φ (only one iteration)

More precise estimation of the relevance of φ to prove γ:

P(φ is used in ψ's proof)
  · Π_{f ∈ F(γ) ∩ F(φ)} P(ψ has feature f | φ is used in ψ's proof)
  · Π_{f ∈ F(γ) − F(φ)} P(ψ has feature f | φ is not used in ψ's proof)
  · Π_{f ∈ F(φ) − F(γ)} P(ψ does not have feature f | φ is used in ψ's proof)

Page 40

Premise Selection

All these probabilities can be computed efficiently

Update two functions (tables):

• t(φ): number of times a fact φ was a dependency

• s(φ, f): number of times a fact φ was a dependency of a fact described by feature f

Then:

P(φ is used in a proof of (any) ψ) = t(φ) / K

P(ψ has feature f | φ is used in ψ's proof) = s(φ, f) / t(φ)

P(ψ does not have feature f | φ is used in ψ's proof) = 1 − s(φ, f) / t(φ) ≈ 1 − (s(φ, f) − 1) / t(φ)

Times for large dataset: seconds
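The two tables and the resulting score can be sketched as follows (log-space; the small floor `eps` for unseen features is an assumption of this sketch, not the slide's exact smoothing):

```python
import math
from collections import defaultdict

class NaiveBayesPremises:
    """Sketch of the slide's counters: t[phi] = times phi was a dependency,
    s[phi][f] = times phi was a dependency of a fact with feature f."""
    def __init__(self):
        self.t = defaultdict(int)
        self.s = defaultdict(lambda: defaultdict(int))
        self.K = 0   # total number of proofs seen

    def learn(self, proof_deps, fact_features):
        """O(|deps| * |features|) counter update per new proof."""
        self.K += 1
        for phi in proof_deps:
            self.t[phi] += 1
            for f in fact_features:
                self.s[phi][f] += 1

    def score(self, phi, goal_features, eps=0.05):
        """log( P(phi used) * prod_f P(f | phi used) ), with a floor `eps`
        (an assumption here) so unseen features do not zero the score."""
        if self.t[phi] == 0:
            return -math.inf
        logp = math.log(self.t[phi] / self.K)
        for f in goal_features:
            logp += math.log(max(self.s[phi][f] / self.t[phi], eps))
        return logp
```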

Page 41

Premise Selection

Decision Trees

Definition
• each leaf stores a set of samples
• each branch stores a feature f and two subtrees, where:
  • the left subtree contains only samples having f
  • the right subtree contains only samples not having f

Example

[Tree diagram: the root tests "+"; samples with "+" go to a branch testing "×", with leaves {a × (b + c) = a × b + a × c} and {a + b = b + a}; samples without "+" go to a branch testing "sin", with leaf {sin x = −sin(−x)}, then a branch testing "×", with leaves {a × b = b × a} and {a = a}]
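The routing behaviour of such a tree can be sketched directly (the tree literal below encodes one plausible reading of the slide's example):

```python
# A branch is (feature, left, right): left holds samples having the
# feature, right those lacking it; a leaf is a plain list of samples.
tree = ("+",
        ("*", ["a*(b+c) = a*b + a*c"], ["a + b = b + a"]),
        ("sin", ["sin x = -sin(-x)"],
                ("*", ["a*b = b*a"], ["a = a"])))

def route(tree, features):
    """Follow branch tests down to the leaf matching a feature set."""
    while isinstance(tree, tuple):
        feature, left, right = tree
        tree = left if feature in features else right
    return tree
```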

Page 42

Premise Selection

Random Forests

• Online and offline variants [Safari09, Agrawal13]
  • Online gives too much influence to the first learned nodes
  • Offline too slow to update

• Too few selected labels: Multi-path query

• Too slow to compute Gini-index for splitting feature selection

• With 50 trees and 1000 labels in each, performance is weak
• Needs combining with another classifier: k-NN in the leaves


Page 44

Premise Selection

Random Forests: Results [Farber2015 ]

[Plot: precision (0.9 to 1.0) against the number of evaluation samples (200 to 1,000), for RF and k-NN]

Runtimes to achieve the same number of proven theorems:

Classifier   Classifier runtime   E timeout   E runtime   Total
k-NN         0.5 min              15 s        314 min     341 min
RF           22 min               10 s        252 min     272 min

Page 45

Premise Selection

Deep Learning vs Shallow Learning

[Diagram: Data → Hand-crafted Features → Predictor]

Traditional machine learning

• Mostly convex, provably tractable

• Special purpose solvers

• Non-layered architectures

[Diagram: Data → Learned Features → Predictor]

Deep Learning

• Mostly NP-Hard

• General purpose solvers

• Hierarchical models


Page 48: Learning Problems in Theorem Proving - FORSYTEforsyte.at/wp-content/uploads/laive-slides-cezary.pdf · Learning Problems in Theorem Proving Cezary Kaliszyk Universit at Innsbruck

Premise Selection

Deep Learning for Mizar Lemma Selection [Alemi+2016 ]

• Embed all lemmas into R^n using an LSTM

• Embed the conjecture into R^n using an LSTM

• Simple classifier on top of concatenated embeddings

• Trained to estimate usefulness on positive and negative examples

[Architecture diagram: the statement to be proved and a potential premise each pass through an embedding network; a combiner network joins the two embeddings and feeds a classifier/ranker]
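The architecture above can be sketched as a two-tower scorer. The sketch below substitutes mean-pooled token embeddings for the LSTM encoders and uses random, untrained weights; all names, shapes, and the shared embedding table are illustrative assumptions, not the DeepMath code:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 100, 16

# Token-embedding table shared by both towers (a simplifying assumption;
# the slides describe separate LSTM encoders).
E = rng.normal(size=(VOCAB, DIM))
W = rng.normal(size=(2 * DIM,))   # weights of the final linear classifier
b = 0.0

def embed(token_ids):
    """Mean-pooled token embedding -- a stand-in for the LSTM encoder."""
    return E[np.asarray(token_ids)].mean(axis=0)

def usefulness(conjecture_ids, premise_ids):
    """Score in (0, 1): concatenate both embeddings, linear layer + sigmoid."""
    z = np.concatenate([embed(conjecture_ids), embed(premise_ids)]) @ W + b
    return 1.0 / (1.0 + np.exp(-z))
```

Training on positive and negative (conjecture, premise) pairs would fit `E`, `W`, and `b` with logistic loss; at query time, premises are ranked by `usefulness`.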

Page 49:

Premise Selection

DeepMath (2/2)

Cutoff  k-NN Baseline  char-CNN     word-CNN     def-CNN-LSTM  def-CNN      def+char-CNN
16      674 (24.6%)    687 (25.1%)  709 (25.9%)  644 (23.5%)   734 (26.8%)  835 (30.5%)
32      1081 (39.4%)   1028 (37.5%) 1063 (38.8%) 924 (33.7%)   1093 (39.9%) 1218 (44.4%)
64      1399 (51.0%)   1295 (47.2%) 1355 (49.4%) 1196 (43.6%)  1381 (50.4%) 1470 (53.6%)
128     1612 (58.8%)   1534 (55.9%) 1552 (56.6%) 1401 (51.1%)  1617 (59.0%) 1695 (61.8%)
256     1709 (62.3%)   1656 (60.4%) 1635 (59.6%) 1519 (55.4%)  1708 (62.3%) 1780 (64.9%)
512     1762 (64.3%)   1711 (62.4%) 1712 (62.4%) 1593 (58.1%)  1780 (64.9%) 1830 (66.7%)
1024    1786 (65.1%)   1762 (64.3%) 1755 (64.0%) 1647 (60.1%)  1822 (66.4%) 1862 (67.9%)

Table 1: Results of ATP premise selection experiments with hard negative mining on a test set of 2,742 theorems.

Page 50:

Useful Intermediate Steps

Outline

Theorem Proving Introduction

Machine Learning Problems in Theorem Proving

Premise Selection

Useful Intermediate Steps

Theorem Names

Internal Guidance

Page 51:

Useful Intermediate Steps

Intermediate lemmas [JSC 2015,FroCoS 2015 ]

Size of the inference graphs

Flyspeck graph

nodes edges

                     nodes          edges
kernel inferences    1,728,861,441  1,953,406,411
reduced trace        159,102,636    233,488,673
tactical inferences  11,824,052     42,296,208
tactical trace       1,067,107      4,268,428

Same orders of magnitude for single ATP proofs

Repetitions → Re-use

• The graphs are already computed outside of HOL

• Feature extraction, prediction, translation do not scale...

• Pre-select heuristically interesting lemmas

Page 53:

Useful Intermediate Steps

Heuristics for interesting lemmas (1/2)

Definition (Recursive dependencies and uses)

D(i) = 1                     if i ∈ Named ∨ i ∈ Axioms
D(i) = ∑_{j ∈ d(i)} D(j)     otherwise

Definition (Lemma quality)

Q1(i) = U(i) · D(i) / S(i)
Q2(i) = U(i) · D(i) / S(i)²
Qr1(i) = U(i)^r · D(i)^(2−r) / S(i)
Q3(i) = U(i) · D(i) / 1.1^S(i)

EpclLemma (longest chain), AGIntRater, ...
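The definitions above can be sketched on a toy dependency graph. The graph, use counts U(i), and sizes S(i) below are hypothetical illustrations:

```python
from functools import lru_cache

# Toy inference DAG: lemma -> direct dependencies d(i).
deps = {'thm': ['l1', 'l2'], 'l1': ['ax'], 'l2': ['ax', 'l1'], 'ax': []}
named = {'ax'}                 # named theorems / axioms count as 1

@lru_cache(maxsize=None)
def D(i):
    """Recursive dependency count, as in the definition on this slide."""
    if i in named:
        return 1
    return sum(D(j) for j in deps[i])

U = {'l1': 3, 'l2': 1}         # hypothetical use counts U(i)
S = {'l1': 4, 'l2': 9}         # hypothetical statement sizes S(i)

def Q1(i):
    return U[i] * D(i) / S[i]

def Q2(i):
    return U[i] * D(i) / S[i] ** 2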

Page 54:

Useful Intermediate Steps

Heuristics for interesting lemmas (2/2)

PageRank

• Eigenvector centrality of a graph

• Fast, non-iterative, usable on whole Flyspeck

• Dominant eigenvector of:

  PR1(i) = (1 − f)/N + f · ∑_{j : i ∈ d(j)} PR1(j) / |d(j)|

• Size normalized

  PR2(i) = PR1(i) / S(i)

Maximum Graph Cut
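PR1 can be sketched as plain power iteration over the dependency graph. This is a simplification: dangling nodes simply lose mass here, and a fixed iteration count stands in for the eigenvector computation the slide refers to:

```python
def pagerank(deps, f=0.85, iters=50):
    """PR1(i) = (1-f)/N + f * sum over j with i in d(j) of PR1(j)/|d(j)|.

    `deps[j]` is the list d(j) of direct dependencies of node j, so rank
    flows from each lemma to the lemmas it depends on.
    """
    nodes = list(deps)
    N = len(nodes)
    pr = {i: 1.0 / N for i in nodes}
    for _ in range(iters):
        nxt = {i: (1.0 - f) / N for i in nodes}
        for j, ds in deps.items():
            if ds:
                share = f * pr[j] / len(ds)
                for i in ds:
                    nxt[i] += share
        pr = nxt
    return pr

# On the toy graph below, the shared axiom 'ax' accumulates the most rank.
ranks = pagerank({'thm': ['l1', 'l2'], 'l1': ['ax'], 'l2': ['ax', 'l1'], 'ax': []})
```

Dividing each rank by the statement size S(i) then gives the size-normalized PR2.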

Page 56:

Useful Intermediate Steps

Learning Lemma Usefulness [ICLR 2017 ]

HOLStep Dataset

• Intermediate steps of the Kepler proof

• Only relevant proofs of reasonable size

• Annotate steps as useful or unused
• Same number of positive and negative examples

• Tokenization and normalization of statements

Statistics

             Train    Test    Positive  Negative
Examples     2013046  196030  1104538   1104538
Avg. length  503.18   440.20  535.52    459.66
Avg. tokens  87.01    80.62   95.48     77.40
Conjectures  9999     1411    -         -
Avg. deps    29.58    22.82   -         -

Page 58:

Useful Intermediate Steps

Considered Models

Page 59:

Useful Intermediate Steps

Baselines (Training Profiles)

[Training-profile plots: char-level and token-level models, each unconditioned and conjecture-conditioned]

Page 60:

Theorem Names

Outline

Theorem Proving Introduction

Machine Learning Problems in Theorem Proving

Premise Selection

Useful Intermediate Steps

Theorem Names

Internal Guidance

Page 61:

Theorem Names

Example Theorem Names [Aspinall2016]

• Names attached to noteworthy theorems
  RIEMANN_MAPPING_THEOREM:
  ∀s. open s ∧ simply_connected s ⇐⇒ s = {} ∨ s = (:real^2) ∨
      ∃f g. f holomorphic_on s ∧ g holomorphic_on ball(Cx(&0),&1) ∧
            (∀z. z IN s ⇒ f z IN ball(Cx(&0),&1) ∧ g(f z) = z) ∧
            (∀z. z IN ball(Cx(&0),&1) ⇒ g z IN s ∧ f(g z) = z)

• Descriptive names
  ADD_ASSOC : ∀m n p. m + (n + p) = (m + n) + p
  REAL_MIN_ASSOC : ∀x y z. min x (min y z) = min (min x y) z
  SUC_GT_ZERO : ∀x. Suc x > 0

• Automatic names:
  HOJODCM LEBHIRJ OBDATYB MEEIXJO KBWPBHQ RYIUUVK
  DIOWAAS PNXVWFS PBFLHET QCDVKEA NCVIBWU PPLHULJ

Page 64:

Theorem Names

Associations between names and statements

Consistent

• association between symbols and parts of theorem names

Abstract

• abstracting from concrete symbols

• association between patterns and name parts

Page 65:

Theorem Names

Modified k-Nearest Neighbours multi-label classifier

The nearness of two statements s1, s2:

n(s1, s2) = √( ∑_{f ∈ f(s1) ∩ f(s2)} w(f)² )

• w(f) is the IDF (inverse document frequency) of feature f

Training examples are indexed by features
to efficiently evaluate the relevance of a label:

R(l) = ∑_{s1 ∈ N, l ∈ l(s1)} n(s1, s2) / |l(s1)|

Positions for a stem are proposed using the weighted average of the
positions in the recommendations.
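The nearness and relevance formulas can be sketched directly. The feature sets, labels, and the simple top-k neighbourhood below are made up for illustration:

```python
import math

def idf(feature, docs):
    """Inverse document frequency of a feature over a list of feature sets."""
    df = sum(feature in fs for fs in docs)
    return math.log(len(docs) / df) if df else 0.0

def nearness(f1, f2, w):
    """n(s1, s2) = sqrt(sum of w(f)^2 over shared features f)."""
    return math.sqrt(sum(w[f] ** 2 for f in f1 & f2))

def label_relevance(query_feats, train, w, k=3):
    """train: list of (features, labels) pairs. Returns R(l) summed over the
    k nearest neighbours, each vote divided by its label-set size."""
    scored = sorted(train, key=lambda t: -nearness(query_feats, t[0], w))[:k]
    R = {}
    for feats, labels in scored:
        n = nearness(query_feats, feats, w)
        for l in labels:
            R[l] = R.get(l, 0.0) + n / len(labels)
    return R
```

A query sharing the features of an associativity lemma for addition then ranks the stems ADD and ASSOC above unrelated stems.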

Page 66:

Theorem Names

Selected features of ADD ASSOC

Feature                            Frequency  Position  IDF
(V0 + (V1 + V2) = (V0 + V1) + V2)  1          0.37      7.82
((V0 + V1) + V2)                   1          0.75      7.13
(V0 + V1)                          1          0.84      3.95
+                                  4          0.72      2.62
num                                3          0.21      1.15
=                                  1          0.43      0.23
∀                                  3          0.15      0.03

Page 67:

Theorem Names

Nearest Neighbours of ADD ASSOC

Theorem name     Statement                                        Nearness
MULT_ASSOC       (V0 * (V1 * V2)) = ((V0 * V1) * V2)              553
ADD_AC 1         ((V0 + V1) + V2) = (V0 + (V1 + V2))              264
EXP_MULT         (EXP V0 (V1 * V2)) = (EXP (EXP V0 V1) V2)        247
HREAL_ADD_ASSOC  (V0 +H (V1 +H V2)) = ((V0 +H V1) +H V2)          246
HREAL_MUL_ASSOC  (V0 *H (V1 *H V2)) = ((V0 *H V1) *H V2)          246
REAL_ADD_ASSOC   (V0 +R (V1 +R V2)) = ((V0 +R V1) +R V2)          246
REAL_MUL_ASSOC   (V0 *R (V1 *R V2)) = ((V0 *R V1) *R V2)          246
REAL_MAX_ASSOC   (MAXR V0 (MAXR V1 V2)) = (MAXR (MAXR V0 V1) V2)  246
INT_ADD_ASSOC    (V0 +Z (V1 +Z V2)) = ((V0 +Z V1) +Z V2)          246
REAL_MIN_ASSOC   (MINR V0 (MINR V1 V2)) = (MINR (MINR V0 V1) V2)  246

Page 68:

Theorem Names

Predicted Stems for associativity of addition

Stem    Positions
ADD     [0.18; 0.14; 0.12; 0; 0.12]
ASSOC   [1]
AC      [0; 1]
NUM     [0.67; 0.45; 0; 0.70]
FIXED   [0]
EQUAL   [0.25; 0.25]
ONE     [0]

Stem     Positions
MONO     [0.44; 0.49]
REFL     [1]
SELECT   [0]
SPLITS   [0]
RCANCEL  [1]
IITN     [0]
GEN      [1]

Page 69:

Theorem Names

Learning and Evaluation

• Use k-NN to predict stems and their positions
• Based on constants and shapes of theorems (ADD_ASSOC)

• Leave-one-out cross-validation

• 2298 statements in the HOL Light standard library

First Choice  Later Choice  Same Stems  All Stems  Overlap  Stem Fail
273           501           291         757        459      17

Failures:

• EXCLUDED_MIDDLE (proposed: DISJ_THM)

• INFINITY_AX (proposed: ONTO_ONE)

• ETA_AX, WLOG_RELATION, TOPOLOGICAL_SORT

Inconsistencies

Page 70:

Internal Guidance

Outline

Theorem Proving Introduction

Machine Learning Problems in Theorem Proving

Premise Selection

Useful Intermediate Steps

Theorem Names

Internal Guidance

Page 71:

Internal Guidance

leanCoP: Lean Connection Prover (Jens Otten)

• Connection tableaux calculus
• Goal-oriented, good for large theories

• Regularly beats Metis and Prover9 in CASC
  • despite their much larger implementations
  • very good performance on some ITP challenges

• Compact Prolog implementation, easy to modify
  • Variants for other foundations: iLeanCoP, mLeanCoP
  • First experiments with machine learning: MaLeCoP

• Easy to imitate
  • leanCoP tactic in HOL Light

Page 72:

Internal Guidance

Lean connection Tableaux

Very simple rules:

• Reduction unifies the current literal with a literal on the path
• Extension unifies the current literal with a copy of a clause

  ───────────────  Axiom
   {}, M, Path

      C, M, Path ∪ {L2}
  ──────────────────────────  Reduction
   C ∪ {L1}, M, Path ∪ {L2}

   C2 \ {L2}, M, Path ∪ {L1}      C, M, Path
  ────────────────────────────────────────────  Extension
              C ∪ {L1}, M, Path

Page 73:

Internal Guidance

leanCoP: Basic Unoptimized Code

 1 prove([Lit|Cla],Path,PathLim,Lem,Set) :-
 2
 3     (-NegLit=Lit;-Lit=NegLit) ->
 4     (
 5
 6
 7         member(NegL,Path), unify_with_occurs_check(NegL,NegLit)
 8     ;
 9         lit(NegLit,NegL,Cla1,Grnd1),
10         unify_with_occurs_check(NegL,NegLit),
11
12
13
14         prove(Cla1,[Lit|Path],PathLim,Lem,Set)
15     ),
16
17     prove(Cla,Path,PathLim,Lem,Set).
18 prove([],_,_,_,_).

(Line numbers match the optimized version on the next slide; the blank lines are the optimizations that are still missing here.)

Page 74:

Internal Guidance

leanCoP: Actual Optimized Code

1 prove([Lit|Cla],Path ,PathLim ,Lem ,Set) :-

2 \+ (member(LitC ,[Lit|Cla]),member(LitP ,Path),LitC==LitP),

3 (-NegLit=Lit;-Lit=NegLit) ->

4 (

5 member(LitL ,Lem), Lit==LitL

6 ;

7 member(NegL ,Path),unify_with_occurs_check(NegL ,NegLit)

8 ;

9 lit(NegLit ,NegL ,Cla1 ,Grnd1),

10 unify_with_occurs_check(NegL ,NegLit),

11 ( Grnd1=g -> true ;

12 length(Path ,K), K<PathLim -> true ;

13 \+ pathlim -> assert(pathlim), fail ),

14 prove(Cla1 ,[Lit|Path],PathLim ,Lem ,Set)

15 ),

16 ( member(cut ,Set) -> ! ; true ),

17 prove(Cla ,Path ,PathLim ,[Lit|Lem],Set).

18 prove([],_,_,_,_).

Page 75:

Internal Guidance

leanCoP in HOL Light

• OCaml leanCoP integrated as an automated proof tactic
  • Reconstruction technique for hammers

• Some Prolog technology missing in functional languages
  • explicit stack for the current proof state
    • including the trail of variable bindings
  • continuation-passing style for flow control
    • backtracking alternatives
    • further tasks

• Improvement compared to existing reconstruction
• Very secure HOL Light certification of leanCoP proofs

Prover               Theorems (%)
OCaml-leanCoP (cut)  759 (87.04)
Metis (2.3)          708 (81.19)
Meson                683 (78.32)
any                  832 (95.41)

Page 76:

Internal Guidance

FEMaLeCoP: Advice Overview and Used Features

• Advise the selection of the clause for every tableau extension step

• Proof state: weighted vector of symbols (or terms)
  • extracted from all the literals on the active path
  • frequency-based weighting (IDF)
  • simple decay factor (using maximum)

• Consistent clausification
  • the formula ?[X]: p(X) becomes p('skolem(?[A]:p(A),1)')

• Advice using custom sparse naive Bayes
  • association of the features of the proof states
  • with contrapositives used for the successful extension steps
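A minimal sketch of this advice, assuming a standard smoothed naive Bayes in place of FEMaLeCoP's exact formula; the map names echo the cn_no / cn_pf_no indexes the slides describe, but the class and smoothing constants are illustrative:

```python
import math

class NaiveBayesAdvisor:
    """Sparse naive Bayes over (path-features, used-contrapositive) pairs."""

    def __init__(self):
        self.cn_no = {}        # contrapositive -> occurrence count
        self.cn_pf_no = {}     # contrapositive -> {feature -> co-occurrences}
        self.total = 0

    def train(self, path_features, contrapositive):
        """Record one successful extension step."""
        self.total += 1
        self.cn_no[contrapositive] = self.cn_no.get(contrapositive, 0) + 1
        co = self.cn_pf_no.setdefault(contrapositive, {})
        for f in path_features:
            co[f] = co.get(f, 0) + 1

    def score(self, contrapositive, path_features):
        """Log-score of a contrapositive given the current path features."""
        n = self.cn_no.get(contrapositive, 0)
        if n == 0:
            return -math.inf
        s = math.log(n / self.total)                         # prior
        co = self.cn_pf_no[contrapositive]
        for f in path_features:
            s += math.log((co.get(f, 0) + 0.1) / (n + 0.2))  # smoothed P(f|cn)
        return s
```

At every extension step the prover would rank the applicable contrapositives by `score` and try them in that order.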

Page 77:

Internal Guidance

FEMaLeCoP: Data Collection and Indexing

• Slight extension of the saved proofs
  • Training data: pairs (path, used extension step)

• External data indexing (incremental)
  • te_num: number of training examples
  • pf_no: map from features to numbers of occurrences ∈ Q
  • cn_no: map from contrapositives to numbers of occurrences
  • cn_pf_no: map of maps of cn/pf co-occurrences

• Problem-specific data
  • Upon start FEMaLeCoP reads only the current-problem-relevant
    parts of the training data
  • cn_no and cn_pf_no filtered by contrapositives in the problem
  • pf_no and cn_pf_no filtered by possible features in the problem

Rest of naive Bayes as in the previous lecture

Page 78:

Internal Guidance

It cannot work? But it does!

Inference speed drops to about 40%, but:

Prover                  Proved (%)
Unguided OCaml-leanCoP  574 (27.6%)
FEMaLeCoP               635 (30.6%)
together                664 (32.0%)

(evaluation on MPTP bushy problems, 60 s)

Page 80:

Internal Guidance

Even more impressive in HOL (Satallax) [BrownFarber2016 ]

[Plot: problems solved (0-4,000) against time (0-30 s), comparing the offline and training configurations]

Page 81:

Internal Guidance

E-Prover given-clause loop

Most important choice: unprocessed clause selection [Schulz 2015 ]

Page 82:

Internal Guidance

Data Collection

Mizar top-level theorems [Urban 2006 ]

• Encoded in FOF

32,521 Mizar theorems with ≥ 1 proof

• training-validation split (90%-10%)

• replay with one strategy

Collect all CNF intermediate steps

• and unprocessed clauses when proof is found

Page 83:

Internal Guidance

Deep Network Architectures

[Architecture diagram — left panel, overall network: clause tokens and negated-conjecture tokens are fed to a clause embedder and a negated-conjecture embedder; the embeddings are concatenated and passed through fully connected layers (1024 nodes, then 1 node) trained with logistic loss. Right panel, convolutional embedding: input token embeddings pass through three Conv 5 (1024) + ReLU layers followed by max pooling]

Non-dilated and dilated convolutions
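The convolutional embedder above is built from 1-D convolutions over token sequences. A from-scratch sketch of a single-output-channel convolution with optional dilation (illustrative only; the real model stacks 1024 filters with ReLU and max pooling):

```python
import numpy as np

def conv1d(x, kernel, dilation=1):
    """'Same'-padded 1-D convolution over a (seq_len, channels) input with a
    (width, channels) kernel, producing one output channel per position.
    dilation > 1 spaces the kernel taps out, giving the WaveNet-style
    receptive-field growth mentioned on the slide."""
    width, _ = kernel.shape
    span = (width - 1) * dilation
    pad = np.pad(x, ((span // 2, span - span // 2), (0, 0)))
    out = np.empty(len(x))
    for t in range(len(x)):
        taps = pad[t : t + span + 1 : dilation]   # `width` dilated taps
        out[t] = np.sum(taps * kernel)
    return out
```

Stacking such layers with dilations 1, 2, 4, ... lets a fixed kernel width cover exponentially longer token contexts.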

Page 84:

Internal Guidance

Recursive Neural Networks

• Curried representation of first-order statements

• Separate nodes for apply, or, and, not

• Layer weights learned jointly for the same formula

• Embeddings of symbols learned with rest of network

• Tree-RNN and Tree-LSTM models
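The Tree-RNN idea above can be sketched as a recursive embedding over curried terms. Symbols, dimensions, and the tanh combiner below are illustrative assumptions with untrained random weights:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8

# Learned leaf embeddings for symbols (here: random, untrained).
leaf = {s: rng.normal(size=DIM) for s in ['f', 'g', 'x', 'and', 'not']}
# One shared weight matrix per binary node type ('apply', 'and', ...),
# learned jointly with the leaf embeddings in the real model.
W = {op: rng.normal(size=(2 * DIM, DIM)) for op in ['apply', 'and']}

def embed(term):
    """Terms are symbols or ('apply' | 'and', left, right) in curried form;
    the children's embeddings are concatenated, multiplied, and squashed."""
    if isinstance(term, str):
        return leaf[term]
    op, left, right = term
    return np.tanh(np.concatenate([embed(left), embed(right)]) @ W[op])

# f(g(x)) in curried form: apply(f, apply(g, x))
v = embed(('apply', 'f', ('apply', 'g', 'x')))
```

The same `W['apply']` is reused at every application node, so arbitrarily deep terms map to fixed-size vectors.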

Page 85:

Internal Guidance

Model accuracy

Model                  Embedding size  Accuracy (50-50% split)
Tree-RNN-256×2         256             77.5%
Tree-RNN-512×1         256             78.1%
Tree-LSTM-256×2        256             77.0%
Tree-LSTM-256×3        256             77.0%
Tree-LSTM-512×2        256             77.9%
CNN-1024×3             256             80.3%
*CNN-1024×3            256             78.7%
CNN-1024×3             512             79.7%
CNN-1024×3             1024            79.8%
WaveNet-256×3×7        256             79.9%
*WaveNet-256×3×7       256             79.9%
WaveNet-1024×3×7       1024            81.0%
WaveNet-640×3×7(20%)   640             81.5%
*WaveNet-640×3×7(20%)  640             79.9%

* = trained on unprocessed clauses as negative examples

Page 86:

Internal Guidance

Hybrid Heuristic

Already on proved statements, performance requires modifications

[Plots: percent of theorems unproved against processed-clause limit (10² to 10⁵).
Left: Pure CNN, Hybrid CNN, Pure CNN + Auto, Hybrid CNN + Auto.
Right: Auto, WaveNet 640*, WaveNet 256, WaveNet 256*, WaveNet 640, CNN, CNN*]

Page 87:

Internal Guidance

Harder Mizar top-level statements

Model           DeepMath 1  DeepMath 2  Union of 1 and 2
Auto            578         581         674
*WaveNet 640    644         612         767
*WaveNet 256    692         712         864
WaveNet 640     629         685         997
*CNN            905         812         1,057
CNN             839         935         1,101
Total (unique)  1,451       1,458       1,712

Total (unique) 1,451 1,458 1,712

Overall proved 7.4% of the harder statements

• Batching and hybrid necessary

• Model accuracy unsatisfactory

Page 89:

Internal Guidance

Generative Models of Text

[A. Karpathy 2016]

Page 90:

Internal Guidance

Take Home

• Proofs are hard

• Machine learning is key to the most powerful proof assistant automation

• Older but very efficient algorithms with significant adjustments

• Many other learning problems and scenarios

Not covered

• Learning strategy selection [Jakubuv, Urban]

• Kernel methods [Kühlwein]

• SVM [Holden]

• Deep Prolog [Rocktäschel]

• Semantic features, conjecturing

• Tactic selection [Nagashima, Gauthier]

• Adversarial networks [Szegedy]

• Human proof optimization

• Theory exploration [Bundy+]

• Learning to formalize [Vyskocil]

• ...
