
Characterizations of interpretability in bounded arithmetic

Joost J. Joosten
University of Barcelona

May 17, 2018

Abstract

This paper deals with three tools to compare the proof-theoretic strength of formal arithmetical theories: interpretability, Π^0_1-conservativity and proving restricted consistency. It is well known that under certain conditions these three notions are equivalent, and this equivalence is often referred to as the Orey-Hájek characterization of interpretability.

In this paper we look in detail at the Orey-Hájek characterization and study what conditions are needed and in what meta-theory the characterizations can be formalized.

1 Introduction

Interpretations are used everywhere in mathematics and mathematical logic. Basically, a theory U interprets a theory V – we write U ▷ V – whenever there is some translation from the symbols of the language of V to formulas of the language of U so that under a natural extension of this translation the axioms of V are mapped to theorems of U.

The corresponding intuition should be that U is at least as strong or expressive as V. And indeed, interpretations are used for example to give relative consistency proofs or to establish undecidability of theories. As such, interpretations are considered an important metamathematical notion. Probably the first time that interpretations received a formal and systematic treatment was in the book by A. Tarski, A. Mostowski and R. Robinson ([17]). In the current paper we will study that notion of interpretability and also some related notions. Sometimes we speak of relative interpretability to indicate that quantifications become relativized to some domain specifier, as we shall define precisely later on.

We will relate the notion of relative interpretability to two other basic metamathematical notions. The first such notion is the notion of consistency. The notion of consistency is central to mathematical logic and considered key and fundamental.

A second notion is that of Π^0_1-conservativity. Below we will define exactly what Π^0_1 formulas are, but basically, those are formulas in the language of arithmetic which are of the form ∀x ψ(x) where ψ is some decidable predicate. On the other hand, Σ^0_1 formulas are those of the form ∃x ψ(x) for decidable ψ. Since all true theories prove exactly the same set of Σ^0_1 sentences, the first natural and interesting class of formulas to distinguish theories is on the Π^0_1 level. Therefore, the notion of Π^0_1-conservativity has been very central in mathematical logic and foundational discussions. We say that a theory U is Π^0_1-conservative over V whenever any Π^0_1 sentence provable by V is also provable by U.

The main purpose of this paper is to discuss how these three different notions are related to each other in certain circumstances. This relation is known as the Orey-Hájek characterization of relative interpretability.

As such, the paper contains many well-known results, and various formulations are taken from [8]. However, we think that it is instructive that all these results are put together and, moreover, that a clear focus is on the requirements needed so that the various implications are formalizable in weak theories.

Apart from the main focus – which is bringing together facts of the Orey-Hájek characterization of relative interpretability and formalizations thereof – the paper contains a collection of new observations that might come in handy. For example, our simple generalization of Pudlák's lemma as formulated in Lemma 5.5 has been a main tool in proving arithmetical correctness of a new series of interpretability principles in [9].

2 Preliminaries

As mentioned before, a central notion in this paper is that of consistency. Consistency is a notion that concerns syntax: no sequence of symbols that constitutes a proof will yield the conclusion that 0 = 1. It shall be an important criterion whether or not a theory proves the consistency of another. As such, we want our theories to be able to talk about syntax.

The standard choice to represent syntax is by Gödel numbering, assigning natural numbers to syntax. Thus, our theories should contain a modicum of arithmetic. In this section we shall make some basic observations on coding and then fix what minimal arithmetic we should have in our base theory. We shall formulate some fundamental properties of this base theory and refer to the literature for further background. Further, we shall fix the notation that is used in the remainder of this paper.

2.1 A short word on coding

Formalization calls for coding of syntax. At some places in this paper we shall need estimates of the codes of syntactical objects. Therefore it is good to discuss the nature of the coding process we will employ. However, we shall not consider the implementation details of our coding.

We shall code strings over some finite alphabet A with cardinality a. A typical coding protocol could be the following. First we define an alphabetic order on A. Next we enumerate all finite strings over A in the following way (pseudo-lexicographic order). To start, we enumerate all strings of length 0, then of length 1, etcetera. For every n, we enumerate the strings of length n in alphabetic order. The code of a finite string over A will just be its ordinal number in this enumeration. We shall now see some easy arithmetical properties of this coding. We shall often refrain from distinguishing syntactical objects and their codes.

1. There are a^n many strings of length n.

2. There are a^n + a^{n−1} + ··· + 1 = (a^{n+1} − 1)/(a − 1) many strings of length ≤ n.

3. From (2) it follows that the code of a syntactical object of length n is O((a^{n+1} − 1)/(a − 1)) = O(a^n) big.

4. Conversely, the length of a syntactical object that has code ϕ is O(|ϕ|) (the logarithm/length of ϕ) big.

5. If ϕ and ψ are codes of syntactical objects, the concatenation ϕ ⋆ ψ of ϕ and ψ is O(ϕ·ψ) big. For, |ϕ ⋆ ψ| = |ϕ| + |ψ|, whence by (3), ϕ ⋆ ψ ≈ a^{|ϕ|+|ψ|} = a^{|ϕ|} · a^{|ψ|} = ϕ·ψ.

6. If ϕ and t are (codes of) syntactical objects, then ϕ_x(t) is O(ϕ^{|t|}) big. Here ϕ_x(t) denotes the syntactical object that results from ϕ by replacing every (unbounded) occurrence of x by t. The length of ϕ is about |ϕ|. In the worst case, these are all x-symbols. In this case, the length of ϕ_x(t) is |ϕ| · |t| and thus ϕ_x(t) is O(a^{|ϕ|·|t|}) = O(t^{|ϕ|}) = O(ϕ^{|t|}) = O(2^{|ϕ|·|t|}) big.

As mentioned, we shall refrain from discussing the technical characteristics of our coding and refer to the literature for examples. Rather, we shall keep in mind the restrictions on sizes and bounds mentioned above. Also, we shall assume that we work with a natural poly-time coding with poly-time decoding functions, so that the code of a substring is always smaller than the code of the entire string.
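To make the bookkeeping above concrete, here is a minimal Python sketch of one such pseudo-lexicographic coding over a hypothetical three-letter alphabet; it only illustrates the enumeration and the growth rates discussed above and is not the coding actually used in the paper.

# A minimal sketch (not the paper's official coding) of the pseudo-lexicographic
# coding of strings over a finite alphabet: strings are ordered first by length,
# then alphabetically, and the code of a string is its ordinal in this enumeration.

ALPHABET = "abc"          # hypothetical alphabet A with cardinality a = 3
A = len(ALPHABET)

def encode(s: str) -> int:
    n = len(s)
    # (A**n - 1) // (A - 1) strings are strictly shorter than s.
    shorter = (A**n - 1) // (A - 1)
    # Among the strings of length n, s sits at its base-A "alphabetic" position.
    position = 0
    for ch in s:
        position = position * A + ALPHABET.index(ch)
    return shorter + position

def decode(code: int) -> str:
    n = 0
    while (A**(n + 1) - 1) // (A - 1) <= code:
        n += 1
    position = code - (A**n - 1) // (A - 1)
    digits = []
    for _ in range(n):
        position, d = divmod(position, A)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))

# Sanity checks: codes of length-n strings are O(a^n) and decode inverts encode.
assert all(decode(encode(s)) == s for s in ["", "a", "cab", "abba"])
assert encode("a") == 1 and encode("c") == 3 and encode("aa") == 4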

2.2 Arithmetical theories

Since substitution is key to manipulating syntax we need, by our observations above, a function whose growth rate can capture substitution. Thus, we choose to work with the smash function ♯ defined by x♯y := 2^{|x|·|y|}, where |x| := ⌈log_2(x + 1)⌉ is the length of the number x in binary. We shall often also employ the function ω_1, which is of similar growth rate and defined by ω_1(x) := 2^{|x|^2}.
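The following tiny Python sketch (function names length, smash and omega1 are ours, purely for illustration) just computes the growth rates defined above.

# |x| is the binary length of x, x # y = 2^(|x|*|y|) and omega_1(x) = 2^(|x|^2) = x # x.

def length(x: int) -> int:          # |x| = ceil(log2(x + 1))
    return x.bit_length()

def smash(x: int, y: int) -> int:   # x # y
    return 2 ** (length(x) * length(y))

def omega1(x: int) -> int:          # omega_1(x), i.e. x # x
    return 2 ** (length(x) ** 2)

print(length(18), smash(3, 5), omega1(10))   # 5, 2**6 = 64, 2**16 = 65536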

Next, we need a certain amount of induction. For a formula ϕ, the regular induction formula Iϕ is given by

ϕ(0) ∧ ∀x (ϕ(x)→ ϕ(x+ 1)) → ∀x ϕ(x).


However, it turns out that we can work with a weaker version of induction called polynomial induction, denoted by PIND:

ϕ(0) ∧ ∀x (ϕ(⌊x/2⌋) → ϕ(x)) → ∀x ϕ(x)

or equivalently

ϕ(0) ∧ ∀x (ϕ(x) → ϕ(2x)) ∧ ∀x (ϕ(x) → ϕ(2x+1)) → ∀x ϕ(x).

The idea is that one can conclude ϕ(x) by only logarithmically many calls upon the induction hypothesis with this PIND principle. For example, to conclude ϕ(18) we would go ϕ(0) → ϕ(1) → ϕ(2) → ϕ(4) → ϕ(9) → ϕ(18).
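The following small Python sketch (purely illustrative, not part of the formal development) computes the chain of instances of the induction hypothesis used by PIND, showing that reaching ϕ(x) takes only about log_2(x) steps.

# To reach phi(x) from phi(0) one only needs the chain obtained by repeatedly
# halving x (equivalently, by reading x in binary), so the number of induction
# steps is about log2(x).

def pind_chain(x: int) -> list[int]:
    """Return the chain 0 -> ... -> x used by polynomial induction."""
    chain = []
    while x > 0:
        chain.append(x)
        x //= 2          # the PIND hypothesis phi(floor(x/2)) -> phi(x)
    chain.append(0)
    return list(reversed(chain))

print(pind_chain(18))          # [0, 1, 2, 4, 9, 18] -- the example from the text
print(len(pind_chain(10**6)))  # roughly log2(10**6), about 20 induction steps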

Typically, induction on syntax is of this nature: in order to conclude a property of (the Gödel number of) some formula ψ we need to apply the induction hypothesis to the number of subformulas of ψ, which is linear in the length of ψ. Thus, most inductions over syntax can be established by PIND rather than by the regular induction schema.

Moreover, we shall restrict the formulas on which we allow ourselves to apply PIND so as to end up with a weak base theory. As we shall see, most of our arguments can be formalized within Buss' theory¹ S^1_2.

The theory S^1_2 is formulated in the language of arithmetic {0, S, +, ·, ♯, |x|, ⌊x/2⌋, ≤}. Apart from some basic axioms that define the symbols in the language, S^1_2 is axiomatized by PIND induction for Σ^b_1 formulas. The Σ^b_1 formulas are those formed from atomic formulas via the boolean operators, sharply bounded quantification and bounded existential quantification. Sharply bounded quantification is quantification of the form Qx<|t| for Q ∈ {∀, ∃}. Bounded existential quantification, in contrast, is of the form ∃x<t. We refer the reader to [1] or [6] for further details and for the definitions of the related Σ^b_n and Π^b_n hierarchies.

Equivalent to the PIND principle (see [11, Lemma 5.2.5]) is the length induction principle LIND:

ϕ(0) ∧ ∀x (ϕ(x) → ϕ(x+1)) → ∀x ϕ(|x|).

So, from the progressiveness of ϕ, we can conclude ϕ(x) for any x for which the exponentiation 2^x is defined. We shall later see that if we are working with definable cuts (definable initial segments of the natural numbers with some natural closure properties) we can without loss of generality assume that exponentiation is defined for elements of this cut.

Although most of our reasoning can be performed in S^1_2, we sometimes mention stronger theories. As always, Peano Arithmetic (PA) contains open axioms that define the symbols 0, S, + and ·, and induction axioms Iϕ for any arithmetical formula ϕ. Similarly, IΣ_n is as PA where instead we only have induction axioms Iϕ for ϕ ∈ Σ_n. Here, Σ_n refers to the usual arithmetical hierarchy (see e.g. [6]) in that such formulas are written as a decidable formula preceded by a string of n alternating quantifiers with an existential quantifier up front. In case no free variables are allowed in the induction formulas, we flag this by a superscript "−" as in IΣ⁻_n.

Another important arithmetical principle that we will encounter frequently is collection. For example, BΣ_n is the so-called collection scheme for Σ_n-formulae. Roughly, BΣ_n says that the range of a Σ_n-definable function on a finite interval is again finite. A mathematical formulation is

∀x≤u ∃y σ(x, y) → ∃z ∀x≤u ∃y≤z σ(x, y)

where σ(x, y) ∈ Σ_n may contain other variables too.

¹As mentioned, the substitution operation on codes of syntactical objects asks for a function of growth rate x^{|x|}. In Buss's S^1_2 this is the smash function ♯. In the theory IΔ_0 + Ω_1 this is the function ω_1(x). However, contrary to S^1_2, the theory IΔ_0 + Ω_1 – aka S_2 – is not known to be finitely axiomatizable.

The least number principle LΓ for a class of formulas Γ is the scheme ∃x ϕ(x) → ∃x (ϕ(x) ∧ ∀y<x ¬ϕ(y)) for ϕ ∈ Γ.

2.3 Numberized theories

The notion of interpretability applies to any pair of theories; they need not contain any arithmetic. However, in this paper we will prove that U ▷ V can on various occasions be equivalent to other properties that are stated in terms of numbers. For example, in certain situations we have that U ▷ V is equivalent to U proving all the Π^0_1 formulas that V does. Clearly, in this situation we should understand that U and V come with a natural interpretation of numbers.

Definition 2.1. We will call a pair ⟨U, k⟩ a numberized theory if k : U ▷ S^1_2. A theory U is numberizable or arithmetical if for some j, ⟨U, j⟩ is a numberized theory.

From now on, we shall only consider numberizable or numberized theories. Often, however, we will fix a numberization j and reason about the theory ⟨U, j⟩ as if it were formulated in the language of arithmetic.

A disadvantage of doing so is clearly that our statements may be somewhat misleading; when we think of, e.g., ZFC we do not like to think of it as coming with a fixed numberization. However, for the kind of characterizations treated in this paper, it is really needed to have numbers around. We shall most of the time work with sequential theories. Basically, sequentiality means that any finite sequence of objects can be coded.

2.4 Metamathematics in numberized theories

On many occasions we want to represent numbers by terms (numerals) and then consider the code of that term. It is not a good idea to represent a number n by

S ... S 0     (n occurrences of S).

For, the length of this object is n + 1, whence its code is about 2^{n+1}, and we would like to avoid the use of exponentiation. In the setting of weaker arithmetics it is common practice to use so-called efficient numerals. These numerals are defined by recursion as follows: the numeral of 0 is the term 0, the numeral of 2·n is (SS0) · n̄, and the numeral of 2·n + 1 is S((SS0) · n̄), where n̄ denotes the numeral of n. Clearly, these numerals implement the system of dyadic notation, which ties up perfectly with the PIND principle. Often we shall refrain from distinguishing n from its numeral n̄, or even from the Gödel number ⌜n̄⌝ of its numeral.
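As an illustration of why efficient numerals matter, here is a small Python sketch (using a made-up textual term syntax, with * standing for ·) that builds the dyadic numeral of n and compares its length with the unary numeral S...S0.

# The efficient numeral of n follows the binary representation of n, so its
# length as a term is proportional to log2(n) rather than to n itself.

def efficient_numeral(n: int) -> str:
    if n == 0:
        return "0"
    if n % 2 == 0:
        return f"((SS0)*{efficient_numeral(n // 2)})"      # numeral of 2*m
    return f"S((SS0)*{efficient_numeral(n // 2)})"          # numeral of 2*m + 1

def unary_numeral(n: int) -> str:
    return "S" * n + "0"

print(efficient_numeral(18))         # nested doubling structure of length O(log 18)
print(len(unary_numeral(1000)))      # 1001 symbols
print(len(efficient_numeral(1000)))  # well under a hundred symbols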

As we want to do arithmetization of syntax, our theories should be coded in a simple way. We will assume that all our theories U have an axiom set that is decidable in polynomial time. That is, there is some formula Axiom_U(x) which is Δ^b_1 in S^1_2 (both the formula and its negation are provably equivalent to a Σ^b_1 formula), with

S^1_2 ⊢ Axiom_U(ϕ)  iff  ϕ is an axiom of U.

The choice of Δ^b_1-axiomatizations is also motivated by Lemma 2.2 below. Most natural theories like ZFC or PA indeed have Δ^b_1-axiomatizations. Moreover, by a sharpening of Craig's trick, any recursive theory is deductively equivalent to one with a Δ^b_1-axiomatization.

We shall employ the standard techniques and concepts necessary for the arithmetization of syntax. Thus, we shall work with provability predicates □_U corresponding uniformly to arithmetical theories U. We shall adhere to the standard dot notation so that, for example, □_U ϕ(ẋ) denotes a formula with one free variable x so that for each value n of x, □_U ϕ(ṅ) is provably equivalent to □_U ϕ(n̄).

We shall always write the formalized version of a concept in sans-serif style. For example, Proof_U(p, ϕ) stands for the formalization of "p is a U-proof of ϕ", Con(U) stands for the formalization of "U is a consistent theory", and so forth. It is known that for theories U with a poly-time axiom set, the formula Proof_U(p, ϕ) can be taken to be in Δ^b_1, being a poly-time decidable predicate. Again, [1] and [6] are adequate references.

For already really weak theories T we have Σ_1-completeness in the sense that T proves any true Σ_1 sentence. However, proofs of Σ_1-sentences σ are multi-exponentially big, that is, of size about 2^σ_n (an n-fold iterated exponential) for some n depending on σ. (See e.g. [6].) As such, we cannot expect to be able to formalize the Σ_1-completeness theorem in theories where exponentiation is not necessarily total.

However, for ∃Σ^b_1-formulas we do have a completeness theorem (see [1]) in bounded arithmetic. From now on, we shall often write a superscript to a quantifier to specify the domain of quantification.

Lemma 2.2. If α(x) ∈ ∃Σ^b_1, then there is some standard natural number n such that

S^1_2 ⊢ ∀x [α(x) → ∃p<ω_1^n(x) Proof_U(p, α(ẋ))].

This holds for any reasonable arithmetical theory U. Moreover, we also have a formalized version of this statement:

S^1_2 ⊢ ∀^{∃Σ^b_1}α ∃n □_{S^1_2}(∀x [α(x) → ∃p<ω_1^n(x) Proof_U(p, α(ẋ))]).


2.5 Consistency and reflexive theories

Since Gödel's second incompleteness theorem, we know that no consistent recursive theory can prove its own consistency. For a large class of natural theories we do have a good approximation to proving consistency, though. A theory is reflexive if it proves the consistency of all of its finite subtheories. Reflexivity is a natural notion, and most natural non-finitely axiomatized theories are reflexive, like, for example, primitive recursive arithmetic and PA.

Many meta-mathematical statements involve the notion of reflexivity. There exist various ways in which reflexivity can be formalized, and throughout the literature we can find many different formalizations. For stronger theories, all these formalizations coincide. But for weaker theories, the differences are essential. We give some formalizations of reflexivity.

1. ∀n U ⊢ Con(U[n]), where U[n] denotes the conjunction of the first n axioms of U.

2. ∀n U ⊢ Con(U_n), where Con(U_n) denotes that there is no proof of falsity using only axioms of U with Gödel numbers ≤ n.

3. ∀n U ⊢ Con_n(U), where Con_n(U) denotes that there is no proof of falsity with a proof p where p has the following properties: all non-logical axioms of U that occur in p have Gödel numbers ≤ n, and all formulas ϕ that occur in p have logical complexity ρ(ϕ) ≤ n.
Here ρ is some complexity measure that basically counts the number of quantifier alternations in ϕ. Important features of this ρ are that for every n, there are truth predicates for formulas with complexity n. Moreover, the ρ-measure of a formula should be more or less (modulo some poly-time difference, see Remark 3.4) preserved under translations. An example of such a ρ is given in [19].

It is clear that (2) ⇒ (3) can be proven in any weak base theory. For the corresponding provability notions, the implication reverses. In this paper, our notion of reflexivity shall be the third one.

We shall write □_{U,n}ϕ for ¬Con_n(U + ¬ϕ) or, equivalently, ∃p Proof_{U,n}(p, ϕ). Here, Proof_{U,n}(p, ϕ) denotes that p is a U-proof of ϕ where all axioms in p are ≤ n and for all formulas ψ that occur in p we have ρ(ψ) ≤ n.

Remark 2.3. An inspection of the proof of provable Σ_1-completeness (Lemma 2.2) gives us some more information. The proof p that witnesses the provability in U of some ∃Σ^b_1-sentence α can easily be taken so that all axioms occurring in p are about as big and complex as α. Thus, from α we get, for some n depending linearly on α, that Proof_{U,n}(p, α).

If we wish to emphasize the fact that our theories are not necessarily in the language of arithmetic, but can merely be numberized, our formulations of reflexivity should be slightly changed. For example, (3) will for some ⟨U, j⟩ look like j : U ▷ S^1_2 + {Con_n(U) | n ∈ ω}.


If U is a reflexive theory, we do not necessarily have any reflection principles. That is, we do not have U ⊢ □_V ϕ → ϕ for some natural V ⊂ U and some natural class of formulae ϕ. We do have, however, a weak form of ∀Π^b_1-reflection. This is expressed in the following lemma.

Lemma 2.4. Let U be a reflexive theory. Then

S^1_2 ⊢ ∀^{∀Π^b_1}π ∀n □_U ∀x (□_{U,n}π(x) → π(x)).

Proof. Reason in S^1_2 and fix π and n. Let m be such that we have (see Lemma 2.2 and Remark 2.3)

□_U ∀x (¬π(x) → □_{U,m}¬π(x)).

Furthermore, let k := max{n, m}. Now, reason in U, fix some x and assume □_{U,n}π(x). Thus, clearly also □_{U,k}π(x). If now ¬π(x), then also □_{U,k}¬π(x), whence □_{U,k}⊥. This contradicts the reflexivity, whence π(x). As x was arbitrary we get ∀x (□_{U,n}π(x) → π(x)). ⊣

We note that this lemma also holds for the other notions of restricted provability we introduced in this subsection.

3 Formalized interpretability

As we already mentioned, our notion of interpretability is the one studied by Tarski et al. in [17]. In that notion, any axiom needs to be provable after translation. Under some fairly weak conditions this implies that also theorems are translated to theorems. However, in the domain of bounded arithmetics we do not generally have this. In the realm of formalized interpretation, therefore, there has been a tendency to consider a small adaptation of the original notion of Tarski. This adaptation, as introduced by Visser, is called smooth interpretability. In this subsection we shall exactly define this notion and see how it relates to other notions of formalized interpretability. In various ways one can hold that theorems interpretability, as discussed below, is actually the more natural formalized version of interpretability.

The theories that we study in this paper are theories formulated in first order predicate logic. All theories have a finite signature that contains identity. For simplicity we shall assume that all our theories are formulated in a purely relational way. Here is the formal definition of a relative interpretation.

Definition 3.1. A translation k of the language of a theory S into the language of a theory T is a pair ⟨δ, F⟩ for which the following holds.

The first component, δ, is called the domain specifier and is a formula in the language of T with a single free variable. This formula is used to specify the domain of our interpretation.


The second component, F, is a finite map that sends relation symbols R (including identity) from the language of S to formulas F(R) in the language of T. We demand for all R that the number of free variables of F(R) equals the arity of R.² Recursively we define the translation ϕ^k of a formula ϕ in the language of S as follows.

• (R(x⃗))^k = F(R)(x⃗);

• (ϕ ∧ ψ)^k = ϕ^k ∧ ψ^k, and likewise for the other boolean connectives (in particular, this implies ⊥^k = ⊥);

• (∀x ϕ(x))^k = ∀x (δ(x) → ϕ^k) and analogously for the existential quantifier.

A relative interpretation k of a theory S into a theory T is a translation ⟨δ, F⟩ so that T ⊢ ϕ^k for all axioms ϕ of S.
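As an illustration of Definition 3.1, here is a minimal Python sketch (with a made-up tuple-based representation of formulas that is not from the paper) of the recursive translation ϕ ↦ ϕ^k: relation symbols are replaced according to F and every quantifier is relativized to the domain specifier δ.

def translate(phi, F, delta="delta"):
    """phi is a tuple-based formula: ("R", "x", "y"), ("and", a, b),
    ("not", a), ("forall", "x", a), ("exists", "x", a)."""
    op = phi[0]
    if op in ("and", "or", "imp"):
        return (op, translate(phi[1], F, delta), translate(phi[2], F, delta))
    if op == "not":
        return ("not", translate(phi[1], F, delta))
    if op == "forall":          # (forall x phi)^k = forall x (delta(x) -> phi^k)
        x, body = phi[1], phi[2]
        return ("forall", x, ("imp", (delta, x), translate(body, F, delta)))
    if op == "exists":          # (exists x phi)^k = exists x (delta(x) and phi^k)
        x, body = phi[1], phi[2]
        return ("exists", x, ("and", (delta, x), translate(body, F, delta)))
    # atomic case: (R(x1, ..., xn))^k = F(R)(x1, ..., xn)
    return (F[op],) + phi[1:]

F = {"=": "Eq", "R": "FR"}      # hypothetical relation-symbol map
phi = ("forall", "x", ("exists", "y", ("R", "x", "y")))
print(translate(phi, F))
# ('forall', 'x', ('imp', ('delta', 'x'),
#   ('exists', 'y', ('and', ('delta', 'y'), ('FR', 'x', 'y')))))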

To formalize insights about interpretability in weak meta-theories like S^1_2 we need to be very careful. Definitions of interpretability that are unproblematically equivalent in a strong theory like, say, IΣ_1 diverge in weak theories. As we shall see, the major source of problems is the absence of BΣ_1.

In this subsection we study various divergent definitions of interpretability. We start by making an elementary observation on interpretations. Basically, the next definition and lemma say that translations transform proofs into translated proofs.

Definition 3.2. Let k be a translation. By recursion on a proof p in natural deduction we define the translation of p under k, which we write p^k. For this purpose, we first define k(ϕ) for formulae ϕ to be³ ⋀_{x_i∈FV(ϕ)} δ(x_i) → ϕ^k. Here FV(ϕ) denotes the set of free variables of ϕ. Clearly, this set cannot contain more than |ϕ| elements, whence k(ϕ) will not be too big. Obviously, for sentences ϕ, we have k(ϕ) = ϕ^k.

If p is just a single assumption ϕ, then p^k is k(ϕ). The translations of the proof constructions are defined precisely in such a way that we can prove Lemma 3.3 below. For example, the translation of the ∧-introduction

ϕ    ψ
――――――
ϕ ∧ ψ

will be the derivation that, from the discharged assumption [⋀_{x_i∈FV(ϕ∧ψ)} δ(x_i)]¹, derives ⋀_{x_i∈FV(ϕ)} δ(x_i) and then, using ⋀_{x_i∈FV(ϕ)} δ(x_i) → ϕ^k, derives ϕ^k; next, by the derivation D, it derives ψ^k; an application of ∧-introduction gives ϕ^k ∧ ψ^k, and →-introduction (→I, discharging assumption 1) finally yields

⋀_{x_i∈FV(ϕ∧ψ)} δ(x_i) → ϕ^k ∧ ψ^k,

²Formally, we should be more precise and specify our variables.
³To be really precise we should say that, for example, we let smaller x_i come first in ⋀_{x_i∈FV(ϕ)} δ(x_i).


[Figure 1: Transitivity of interpretability. The diagram shows a U-axiom u (with Axiom_U(u)), a T-proof of u^k from the k-translated T-axioms t_1^k, …, t_m^k, and the j-translation of that proof into an S-proof of (u^k)^j, in which S-proofs p_1, …, p_m of the j-translated T-axioms are plugged in.]

where D is just a symmetric copy of the part above ϕ^k. We note that the translation of the proof constructions is available⁴ in S^1_2, as the number of free variables in ϕ ∧ ψ is bounded by |ϕ ∧ ψ|.

Lemma 3.3. If p is a proof of a sentence ϕ with assumptions in some set of sentences Γ, then for any translation k, p^k is a proof of ϕ^k with assumptions in Γ^k.

Proof. Note that the restriction to sentences is needed. For example, from ∀x ϕ(x) and ∀x (ϕ(x) → ψ(x)) one can derive ψ(x), but from (∀x ϕ(x))^k and (∀x (ϕ(x) → ψ(x)))^k one only derives δ(x) → ψ^k(x), and in general ⊬ (δ(x) → ψ^k) ↔ ψ^k. The lemma is proved by induction on p. To account for formulas in the induction, we use the notion k(ϕ) from Definition 3.2, which is tailored precisely to let the induction go through. ⊣

Remark 3.4. The proof translation leaves all the structure invariant. Thus, there is a provably total (in S^1_2) function f such that, if p is a U, n-proof of ϕ, then p^k is a proof of ϕ^k, where p^k has the following properties: all axioms in p^k are ≤ f(n, k) and all formulas ψ in p^k have ρ(ψ) ≤ f(n, k).

⁴More efficient translations on proofs are also available. However, they are less uniform.


There are various reasons why we want the notion of interpretability to be provably transitive, that is, provably S ▷ U whenever both S ▷ T and T ▷ U. The obvious way of proving this would be by composing (doing the one after the other) two interpretations. Thus, if we have j : S ▷ T and k : T ▷ U we would like to have j ∘ k : S ▷ U, where j ∘ k denotes a natural composition of translations.

If we try to perform a proof as depicted in Figure 1, at a certain point we would like to collect the S-proofs p_1, …, p_m of the j-translated T-axioms used in a proof of a k-translation of an axiom u of U, and take the maximum of all such proofs. But to see that such a maximum exists, we precisely need Σ_1-collection.

However, it is desirable to also reason about interpretability in the absence of BΣ_1. A trick is needed to circumvent the problem of the unprovability of transitivity (and many other elementary desiderata).

One way to solve the problem is by switching to a notion of interpretability where the needed collection has been built in. This is the notion of smooth (axioms) interpretability as in Definition 3.5. In this paper we shall mean by interpretability, unless mentioned otherwise, always smooth interpretability. In the presence of BΣ_1 this notion will coincide with the earlier defined notion of interpretability, as Theorem 3.6 tells us.

Definition 3.5. We define the notions of axioms interpretability ▷_a, theorems interpretability ▷_t, smooth axioms interpretability ▷_sa and smooth theorems interpretability ▷_st.

j : U ▷_a V := ∀v ∃p (Axiom_V(v) → Proof_U(p, v^j))

j : U ▷_t V := ∀ϕ ∀p ∃p′ (Proof_V(p, ϕ) → Proof_U(p′, ϕ^j))

j : U ▷_sa V := ∀x ∃y ∀v≤x ∃p≤y (Axiom_V(v) → Proof_U(p, v^j))

j : U ▷_st V := ∀x ∃y ∀ϕ≤x ∀p≤x ∃p′≤y (Proof_V(p, ϕ) → Proof_U(p′, ϕ^j))

It is now easy to see that ▷_sa is indeed provably transitive over very weak base theories. For ▷_st this follows almost directly from the definition.

Theorem 3.6. In S^1_2 we have all the arrows as depicted in Figure 2.

Proof. We shall only comment on the arrows that are not completely trivial.

• T ⊢ j : U ▷_a V → j : U ▷_sa V, if T ⊢ BΣ_1. So, reason in T and suppose ∀v ∃p (Axiom_V(v) → Proof_U(p, v^j)). If we fix some x, we get ∀v≤x ∃p (Axiom_V(v) → Proof_U(p, v^j)). By BΣ_1 we get the required ∃y ∀v≤x ∃p≤y (Axiom_V(v) → Proof_U(p, v^j)). It is not clear if T ⊢ BΣ⁻_1, parameter-free collection, is a necessary condition.

• S^1_2 ⊬ j : U ▷_a V → j : U ▷_t V. A counter-example is given in [18].


[Figure 2: Versions of relative interpretability (▷_a, ▷_t, ▷_sa, ▷_st) in S^1_2. The dotted arrows indicate that an additional condition (BΣ_1 or exp) is needed in our proof, the condition being written next to the arrow. The arrow with a cross through it indicates that we know that the implication fails in S^1_2.]

• T ⊢ j : U ▷_t V → j : U ▷_sa V, if T ⊢ exp. If V is reflexive, we get by Corollary 6.9 that ⊢ U ▷_t V ↔ U ▷_sa V. However, different interpretations are used to witness the different notions of interpretability in this case. If T ⊢ exp, we reason as follows. We reason in T and suppose that ∀ϕ ∀p ∃p′ (Proof_V(p, ϕ) → Proof_U(p′, ϕ^j)). We wish to see

∀x ∃y ∀v≤x ∃p≤y (Axiom_V(v) → Proof_U(p, v^j)).    (1)

So, we pick x arbitrarily and consider⁵ ν := ⋀_{Axiom_V(v_i) ∧ v_i≤x} v_i. Notice that in the worst case, for all y ≤ x, we have Axiom_V(y), whence the length of ν can be bounded by x · |x|. Thus, ν itself can be bounded by x^x, which exists whenever T ⊢ exp. Clearly, ∃p Proof_V(p, ν), whence by our assumption ∃p′ Proof_U(p′, ν^j). In a uniform way, with just a slightly larger proof p′′, every v_i^j can be extracted from the proof p′ of ν^j. We may take this p′′ ≈ y to obtain (1). Note that T ⊢ exp is not a necessary condition, since ▷_t implies ▷_a and, if we have BΣ_1, the latter implies ▷_sa.

⁵To see that ν exists, we seem to also use some collection; we collect all the v_i ≤ x for which Axiom_V(v_i). However, it is not hard to see that we can consider ν also without collection since we use a natural coding.

• S^1_2 ⊢ j : U ▷_sa V → j : U ▷_st V. So, we wish to see that

∀x ∃y ∀ϕ≤x ∀p≤x ∃p′≤y (Proof_V(p, ϕ) → Proof_U(p′, ϕ^j))

from the assumption that j : U ▷_sa V. So, we pick x arbitrarily. If now for some p ≤ x we have Proof_V(p, ϕ), then clearly ϕ ≤ x and all axioms v_i of V that occur in p are ≤ x. By our assumption j : U ▷_sa V, we can find a y_0 such that we can find proofs p_i ≤ y_0 for all the v_i^j. Now, with some sloppy notation, let p^j[v_i^j/p_i] denote the j-translation of p where each j-translated axiom v_i^j is replaced by p_i.

Clearly, p^j[v_i^j/p_i] is a proof of ϕ^j. The size of this proof can be estimated (again with sloppy notation):

p^j[v_i^j/p_i] ≤ p^j[v_i^j/y_0] ≤ (p^j)^{|y_0|} ≤ (x^j)^{|y_0|}.

The latter bound is clearly present in S^1_2. ⊣

We note that we have many admissible rules from one notion of interpretability to another. For example, by Buss's theorem on the provably total recursive functions of S^1_2, it is not hard to see that

S^1_2 ⊢ j : U ▷_a V  ⇒  S^1_2 ⊢ j : U ▷_t V.

In the rest of this paper we shall, at most places, no longer write subscripts to the ▷'s. Our reading convention is then that we take the notion of interpretability that is best suited to perform the argument. Often this is just smooth interpretability ▷_s, which from now on is the notation for ▷_sa.

Moreover, in [18] some sort of conservation result concerning ▷_a and ▷_s is proved. For a considerable class of formulas ϕ and theories T, and for a considerable class of arguments, we have that T ⊢ ϕ_{▷_a} ⇒ T ⊢ ϕ_{▷_s}. Here ϕ_{▷_a} denotes the formula ϕ using the notion ▷_a, and likewise for ϕ_{▷_s}. Thus indeed, in many cases a sharp distinction between the notions involved is not needed.

We could also consider the following notion of interpretability.

j : U ▷_{st1} V := ∀x ∃y ∀ϕ≤x ∃p′≤y (□_V ϕ → Proof_U(p′, ϕ^j))

Clearly, j : U ▷_{st1} V → U ▷_st V. However, for the reverse implication one seems to need BΠ⁻_1. Also, a straightforward proof of U ⊢ id : U ▷_{st1} U seems to need BΠ⁻_1. Thus, the notion ▷_{st1} seems to say more about the nature of a theory than about the nature of interpretability.

4 Cuts and induction

Inductive reasoning is a central feature of everyday mathematical practice. We are so used to it that it enters a proof almost unnoticed. It is when one works with weak theories and in the absence of sufficient induction that its all-pervading nature is best felt.

A main tool to compensate for the lack of induction is the so-called definable cut. Definable cuts are definable initial segments of the natural numbers of a possibly non-standard model that possess some desirable properties that we could not infer to hold for all numbers by means of induction.

The idea is really simple. If we can derive ϕ(0) ∧ ∀x (ϕ(x) → ϕ(x+1)) and do not have access to an induction axiom for ϕ, we just consider J(x) := ∀y≤x ϕ(y). Clearly J now defines an initial segment on which ϕ holds. As we shall see, for a lot of reasoning we can restrict ourselves to initial segments rather than quantifying over all numbers.

4.1 Basic properties of cuts

Throughout the literature one can find some variations on the definition of a cut. At some places, a cut is only supposed to be an initial segment of the natural numbers. At other places some additional closure properties are demanded. By a well known technique due to Solovay (see for example [6]) any definable initial segment can be shortened in a definable way, so that it has a lot of desirable closure properties. Therefore, and as we almost always need the closure properties, we include them in our definition.

Definition 4.1. A definable U-cut is a formula J(x) with only x free, for which we have the following.

1. U ⊢ J(0) ∧ ∀x (J(x)→ J(x+ 1))

2. U ⊢ J(x) ∧ y≤x→ J(y)

3. U ⊢ J(x) ∧ J(y)→ J(x+ y) ∧ J(x · y)

4. U ⊢ J(x) → J(ω_1(x))

We shall sometimes also write x ∈ J instead of J(x). A first fundamental insight about cuts is the principle of outside big, inside small: although not every number x is in J, we can find for every x a proof p_x that witnesses x ∈ J.

Lemma 4.2. Let T and U be reasonable arithmetical theories and let J be a U-cut. We have that

T ⊢ ∀x □_U J(ẋ).

Actually, we can have the quantifier over all cuts within the theory T, that is,

T ⊢ ∀^{U-Cut}J ∀x □_U J(ẋ).

Proof. Let us start by making the quantifier ∀^{U-Cut}J a bit more precise. By ∀^{U-Cut}J we shall mean ∀J (□_U Cut(J) → …). Here Cut(J) is the definable function that sends the code of a formula χ with one free variable to the code of the formula that expresses that χ defines a cut.

For a number a, we start with the standard proof of J(0). This proof is combined with a − 1 many instantiations of the standard proof of ∀x (J(x) → J(x + 1)). In the case of weaker theories, we have to switch to efficient numerals to keep the bound of the proof within range. ⊣

Remark 4.3. The proof sketch actually tells us that (provably in S^1_2) for every U-cut J, there is an n ∈ ω such that ∀x □_{U,n} J(ẋ).


Lemma 4.4. Cuts are provably closed under terms, that is,

T ⊢ ∀^{U-Cut}J ∀^{Term}t □_U ∀x⃗∈J t(x⃗) ∈ J.

Proof. By an easy induction on terms, fixing some U-cut J. Prima facie this looks like a Σ_1-induction, but it is easy to see that the proofs have poly-time (in t) bounds, whence the induction is Δ_0(ω_1). ⊣

As all U-cuts are closed under ω_1(x) and the smash function ♯, simply relativizing all quantifiers to a cut is an example of an interpretation of S^1_2 in U. We shall always denote both the cut and the interpretation that it defines by the same letter.

4.2 Cuts and the Henkin construction

It is well known that we can perform the Henkin construction in a rather weak meta-theory. As the Henkin model has a uniform description, we can link it to interpretations. The following theorem makes this precise.

Theorem 4.5. If U ⊢ Con(V), then U ▷ V.

Early treatments of this theorem were given in [20] and [7]. A first fully formalized version was given in [2]. A proof of Theorem 4.5 would closely follow the Henkin construction.

Thus, first the language of V is extended so that it contains a witness c_{∃xϕ(x)} for every existential sentence ∃x ϕ(x). Then we can extend V to a maximal consistent V′ in the enriched language, containing all sentences of the form ∃x ϕ(x) → ϕ(c_{∃xϕ(x)}). This V′ can be seen as a term model with a corresponding truth predicate. Clearly, if V ⊢ ϕ then ϕ ∈ V′. It is not hard to see that V′ is representable (close inspection yields a Δ_2-representation) in U.

At first sight the argument uses quite some induction in extending V to V′. Miraculously enough, the whole argument can be adapted to S^1_2. The trick consists in replacing the use of induction by employing definable cuts, as explained above. We get the following theorem.

Theorem 4.6. For any numberizable theories U and V, we have that

S^1_2 ⊢ □_U Con(V) → ∃k (k : U ▷ V  &  ∀ϕ □_U(□_V ϕ → ϕ^k)).

Proof. A proof can be found in [18]. Actually something stronger is proved there, namely that for some standard number m we have

∀ϕ ∃p≤ω_1^m(ϕ) Proof_U(p, □_V ϕ → ϕ^k).


As cuts have nice closure properties, many arguments can be performed within such a cut. The numbers in the cut will, so to say, play the role of the normal numbers. It turns out that the whole Henkin argument can be carried out using only consistency on a cut.

We shall write □^J_T ϕ for ∃p∈J Proof_T(p, ϕ). Thus, it is also clear what ◇^J_T ϕ and Con^J(V) mean.

Theorem 4.7. We have Theorem 4.6 also in the following form.

T ⊢ ∀^{U-Cut}I [□_U Con^I(V) → ∃k (k : U ▷ V  &  ∀ϕ □_U(□_V ϕ → ϕ^k))]

Proof. By close inspection of the proof of Theorem 4.6. All operations on hypothetical proofs p can be bounded by some ω_1^k(p), for some standard k. As I is closed under ω_1(x), all the bounds remain within I. ⊣

We conclude this subsection with two asides, closely related to the Henkin construction.

Lemma 4.8. Let U contain S^1_2. We have that U ⊢ Con(Pred). Here, Con(Pred) is a natural formalization of the statement that predicate logic is consistent.

Proof. By defining a simple (one-point) model within S^1_2. ⊣

Remark 4.9. If U proves LΔ^0_2, then it holds that U ▷ V iff V is interpretable in U by some interpretation that maps identity to identity.

Proof. Suppose j : U ▷ V with j = ⟨δ, F⟩. We can define j′ := ⟨δ′, F′⟩ with δ′(x) := δ(x) ∧ ∀y<x (δ(y) → y ≠^j x). F′ agrees with F on all symbols except that it maps identity to identity. By the minimal number principle we can prove ∀x (δ(x) → ∃x′ ((x′ =^j x) ∧ δ′(x′))), and thus ∀x⃗ (δ′(x⃗) → (ϕ^j(x⃗) ↔ ϕ^{j′}(x⃗))) for all formulae ϕ. ⊣

5 Pudlák's lemma

In this section we will state and prove what is known as Pudlák's lemma. Moreover, we shall prove a very useful consequence of this lemma. Roughly speaking, Pudlák's lemma tells us how interpretations bear on the models that they induce. Therefore, let us first see how interpretations and models are related.

5.1 Interpretations and models

We can view interpretations j : U ▷ V as a way of uniformly defining a model N of V inside a model M of U. Interpretations in foundational papers mostly bear the guise of a uniform model construction.


Definition 5.1. Let j : U ▷ V with j = ⟨δ, F⟩. If M ⊨ U, we denote by M^j the following model.

• |M^j| = {x ∈ |M| : M ⊨ δ(x)}/≡, where a ≡ b iff M ⊨ a =^j b.

• M^j ⊨ R(α_1, …, α_n) iff M ⊨ F(R)(a_1, …, a_n) for some a_1 ∈ α_1, …, a_n ∈ α_n.

The fact that j : U ▷ V is now reflected in the observation that, whenever M ⊨ U, then M^j ⊨ V.
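The following toy Python sketch (a made-up finite structure, purely illustrative and of course not a model of an arithmetical theory) mirrors Definition 5.1: the internal domain consists of the δ-elements of M modulo the interpreted equality, and an internal relation holds of two classes precisely if F(R) holds of some representatives.

M_domain = range(8)
delta = lambda x: x % 2 == 0                 # domain specifier delta(x)
eq_j   = lambda a, b: a % 4 == b % 4         # the formula F(=) evaluated in M
R_j    = lambda a, b: (a + b) % 4 == 0       # the formula F(R) evaluated in M

# |M^j| = {x in |M| : M |= delta(x)} / ==, where a == b iff M |= a =^j b.
classes = []
for x in filter(delta, M_domain):
    if not any(eq_j(x, rep) for cls in classes for rep in cls):
        classes.append([y for y in M_domain if delta(y) and eq_j(x, y)])

# M^j |= R(alpha, beta) iff M |= F(R)(a, b) for some a in alpha, b in beta.
def R_internal(alpha, beta):
    return any(R_j(a, b) for a in alpha for b in beta)

print(classes)                               # [[0, 4], [2, 6]] -- two internal elements
print(R_internal(classes[0], classes[0]))    # True:  e.g. 0 + 4 = 4, and 4 % 4 == 0
print(R_internal(classes[0], classes[1]))    # False: 2, 6, 6, 10 are never 0 mod 4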

On many occasions, viewing interpretations as uniform model constructions provides the right heuristics.

5.2 Pudlák's isomorphic cut

Pudlák's lemma is central to many arguments in the field of interpretability logics. It provides a means to compare a model M of U and its internally defined model M^j of V if j : U ▷ V. If U has full induction, this comparison is fairly easy.

Theorem 5.2. Suppose j : U ▷ V and U has full induction. Let M be a model of U. We have that M ⊆_end M^j via a definable embedding.

Proof. If U has full induction and j : U ▷ V, we may by Remark 4.9 actually assume that j maps identity in V to identity in U. Thus, we can define the following function:

f(0) := 0^j
f(x + 1) := f(x) +^j 1^j

Now, by induction, f can be proved to be total. Note that full induction is needed here, as we have a priori no bound on the complexity of 0^j and +^j. Moreover, it can be proved that f(a + b) = f(a) +^j f(b), f(a · b) = f(a) ·^j f(b), and that y ≤^j f(b) → ∃a<b f(a) = y. In other words, f is an isomorphism between its domain and its co-domain, and the co-domain is an initial segment of M^j. ⊣

If U does not have full induction, a comparison between M and M^j is given by Pudlák's lemma, first explicitly mentioned in [15]. Roughly, Pudlák's lemma says that in the general case we can find a definable U-cut I of M and a definable embedding f : I → M^j such that f[I] ⊆_end M^j.

In formulating the statement we have to be careful, as we can no longer assume that identity is mapped to identity. A precise formulation of Pudlák's lemma in terms of an isomorphism between two initial segments can for example be found in [10]. We have chosen here to formulate and prove the most general syntactic consequence of Pudlák's lemma, namely that I and f[I], as substructures of M and M^j respectively, make true the same Δ_0-formulas.

In the proof of Pudlák's lemma we shall make the quantifier ∃^{j,J-function}h explicit. It basically means that h defines a function from a cut J to the =^j-equivalence classes of the numbers defined by the interpretation j.


Lemma 5.3 (Pudlák's Lemma).

S^1_2 ⊢ j : U ▷ V → ∃^{U-Cut}J ∃^{j,J-function}h ∀^{Δ_0}ϕ □_U ∀x⃗∈J (ϕ^j(h(x⃗)) ↔ ϕ(x⃗))

Moreover, the h and J can be obtained uniformly from j by a function that is provably total in S^1_2.

Proof. Again, by ∃^{U-Cut}J ψ we shall mean ∃J (□_U Cut(J) ∧ ψ), where Cut(J) is the definable function that sends the code of a formula χ to the code of a formula that expresses that χ defines a cut. We apply a similar strategy for quantifying over j, J-functions. Given a translation j, the defining property for a relation H to be a j, J-function is

∀x⃗, y, y′∈J (H(x⃗, y) & H(x⃗, y′) → y =^j y′).

We will often consider H as a function h and write, for example, ψ(h(x⃗)) instead of

∀y (H(x⃗, y) → ψ(y)).

The idea of the proof is very easy. Just map the numbers of U via h to the numbers of V so that 0 goes to 0^j and the mapping commutes with the successor relation. If we want to prove a property of this mapping, we might run into problems, as the intuitive proof appeals to induction. And sufficient induction is precisely what we lack in weaker theories.

The way out here is to just put all the properties that we need our function h to possess into its definition. Of course, then the work is in checking that we still have a good definition. Being good means here that the set of numbers on which h is defined induces a definable U-cut.

In a sense, we want a (definable) initial part of the numbers of U to be isomorphic under h to an initial part of the numbers of V. Thus, h should definitely commute with successor, addition and multiplication. Moreover, the image of h should define an initial segment, that is, be closed under the smaller-than relation. All these requirements are reflected in the definition of Goodsequence. Let δ denote the domain specifier of the translation j. We define

Goodsequence(σ, x, y) := lh(σ) = x + 1 ∧ σ_0 =^j 0^j ∧ σ_x =^j y
    ∧ ∀i≤x δ(σ_i) ∧ ∀i<x (σ_{i+1} =^j σ_i +^j 1^j)
    ∧ ∀k+l≤x (σ_k +^j σ_l =^j σ_{k+l}) ∧ ∀k·l≤x (σ_k ·^j σ_l =^j σ_{k·l})
    ∧ ∀a (a ≤^j y → ∃i≤x σ_i =^j a).

Subsequently, we define

H(x, y) := ∃σ Goodsequence(σ, x, y) ∧ ∀σ′ ∀y′ (Goodsequence(σ′, x, y′) → y =^j y′),

and

J′(x) := ∀x′≤x ∃y H(x′, y).

Finally, we define J to be the closure of J′ under +, · and ω_1(x). Now that we have defined all the machinery we can start the real proof. The reader is encouraged to see at what place which defining property is used in the proof.

We first note that J′(x) indeed defines a U-cut. For □_U J′(0) you basically need sequentiality of U, and the translations of the identity axioms and properties of 0. To see □_U ∀x (J′(x) → J′(x + 1)) is also not hard. It follows from the translation of basic properties provable in V, like x = y → x + 1 = y + 1 and x + (y + 1) = (x + y) + 1, etc. The other properties of Definition 4.1 go similarly.

We should now see that h is a j, J-function. This is quite easy, as we have all the necessary conditions present in our definition. Actually, we have

□_U ∀x, y∈J (h(x) =^j h(y) ↔ x = y).    (2)

The ← direction reflects that h is a j, J-function. The → direction follows from elementary reasoning in U using the translation of basic arithmetical facts provable in V. So, if x ≠ y, say x < y, then x + (z + 1) = y for some z, whence h(x) +^j h(z + 1) =^j h(y), which implies h(x) ≠^j h(y).

We are now to see that for our U-cut J and for our j, J-function h we indeed have that⁶

∀^{Δ_0}ϕ □_U ∀x⃗∈J (ϕ^j(h(x⃗)) ↔ ϕ(x⃗)).

First we shall prove this using a seemingly Σ_1-induction. A closer inspection of the proof shall show that we can provide at all places sufficiently small bounds, so that actually an ω_1(x)-induction suffices. We first prove the following claim.

Claim 1. ∀^{Term}t □_U ∀x⃗, y∈J (t^j(h(x⃗)) =^j h(y) ↔ t(x⃗) = y)

Proof. The proof is by induction on t. The basis is trivial. To see for example

□_U ∀y∈J (0^j =^j h(y) ↔ 0 = y)

we reason in U as follows. By the definition of h, we have that h(0) =^j 0^j, and by (2) we moreover see that 0^j =^j h(y) ↔ 0 = y. The other base case, that is, when t is an atom, is precisely (2).

For the induction step, we shall only do +, as · goes almost completely the same. Thus, we assume that t(x⃗) = t_1(x⃗) + t_2(x⃗) and set out to prove

□_U ∀x⃗, y∈J (t_1^j(h(x⃗)) +^j t_2^j(h(x⃗)) =^j h(y) ↔ t_1(x⃗) + t_2(x⃗) = y).

Within U :

⁶We use h(x⃗) as short for h(x_0), …, h(x_n).


← If t_1(x⃗) + t_2(x⃗) = y, then by Lemma 4.4 we can find y_1 and y_2 with t_1(x⃗) = y_1 and t_2(x⃗) = y_2. The induction hypothesis tells us that t_1^j(h(x⃗)) =^j h(y_1) and t_2^j(h(x⃗)) =^j h(y_2). Now by (2), h(y_1 + y_2) =^j h(y), and by the definition of h we get that

h(y_1 + y_2) =^j h(y_1) +^j h(y_2)
            =^j t_1^j(h(x⃗)) +^j t_2^j(h(x⃗))     (i.h.)
            =^j (t_1(h(x⃗)) + t_2(h(x⃗)))^j.

→ Suppose now t_1^j(h(x⃗)) +^j t_2^j(h(x⃗)) =^j h(y). Then clearly t_1^j(h(x⃗)) ≤^j h(y), whence by the definition of h we can find some y_1 ≤ y such that t_1^j(h(x⃗)) =^j h(y_1), and likewise for t_2 (using the translation of the commutativity of addition). The induction hypothesis now yields t_1(x⃗) = y_1 and t_2(x⃗) = y_2. By the definition of h, we get h(y) =^j h(y_1) +^j h(y_2) =^j h(y_1 + y_2), whence by (2), y_1 + y_2 = y, that is, t_1(x⃗) + t_2(x⃗) = y.

We now prove by induction on ϕ ∈ Δ_0 that

□_U ∀x⃗∈J (ϕ^j(h(x⃗)) ↔ ϕ(x⃗)).    (3)

For the base case, we consider ϕ ≡ t_1(x⃗) = t_2(x⃗). We can now use Lemma 4.4 to note that

□_U ∀x⃗∈J (t_1(x⃗) = t_2(x⃗) ↔ ∃y∈J (t_1(x⃗) = y ∧ t_2(x⃗) = y))

and then use Claim 1, the transitivity of = and its translation to obtain the result.

The boolean connectives are really trivial, so we only need to consider bounded quantification. We show (still within U) that

∀y, z⃗∈J (∀x≤^j h(y) ϕ^j(x, h(z⃗)) ↔ ∀x≤y ϕ(x, z⃗)).

← Assume ∀x≤y ϕ(x, z⃗) for some y, z⃗ ∈ J. We are to show ∀x≤^j h(y) ϕ^j(x, h(z⃗)). Now, pick some x ≤^j h(y) (the translation of the universal quantifier actually gives us an additional δ(x), which we shall omit for the sake of readability). Now by the definition of h we find some y′ ≤ y such that h(y′) =^j x. As y′ ≤ y, by our assumption, ϕ(y′, z⃗), whence by the induction hypothesis ϕ^j(h(y′), h(z⃗)), that is, ϕ^j(x, h(z⃗)). As x was arbitrary ≤^j h(y), we are done.

→ Suppose ∀x≤^j h(y) ϕ^j(x, h(z⃗)). We are to see that ∀x≤y ϕ(x, z⃗). So, pick x ≤ y arbitrarily. Clearly h(x) ≤^j h(y), whence, by our assumption, ϕ^j(h(x), h(z⃗)) and by the induction hypothesis, ϕ(x, z⃗).

Note that in our proof we have used a Σ_1-induction twice: in Claim 1 and in proving (3). Let us now see that we can dispense with the Σ_1-induction.

In both cases, at every induction step, a constant piece p′ of proof is added to the total proof. This piece looks the same every time; only some parameters in it have to be replaced by subterms of t. So, the addition to the total proof can be estimated by p′_a(t), which is about O(t^k) for some standard k, and indeed, our induction was really but a bounded one. Both our inductions went over syntax and hence are available in S^1_2.

Note that in proving (3) we dealt with the bounded quantification by appealing to the induction hypothesis only once, followed by a generalization. So, fortunately we did not need to apply the induction hypothesis to all x ≤ y, which would have yielded an exponential blow-up. ⊣

Remark 5.4. Pudlák's lemma is valid already if we employ the notion of theorems interpretability rather than smooth interpretability. If we work with theories in the language of arithmetic, we can do even better. In this case, axioms interpretability can suffice. In order to get this, all arithmetical facts whose translations were used in the proof of Lemma 5.3 have to be promoted to the status of axiom. However, a close inspection of the proof shows that these facts are very basic and that there are not so many of them.

If j is an interpretation with j : α ▷ β, we shall sometimes call the corresponding isomorphic cut that is given by Lemma 5.3 the Pudlák cut of j and denote it by the corresponding upper case letter J.

5.3 A consequence of Pudlák's Lemma

The following consequence of Pudlák's Lemma is simple, yet can be very useful. For simplicity we state the consequence for sentential extensions of some base theory T extending S^1_2. Thus, α ▷ β will be short for (T + α) ▷ (T + β).

Lemma 5.5. (In S^1_2:) If j : α ▷ β then, for every T + β cut I there exists a T + α cut J such that for every γ we have that

j : α ∧ □^J γ ▷ β ∧ □^I γ.

Proof. By a minor adaptation of the standard argument. First, we define Goodsequence.

Goodsequence(σ, x, y) := lh(σ) = x + 1 ∧ σ_0 =^j 0^j ∧ σ_x =^j y
    ∧ ∀i<x (σ_{i+1} =^j σ_i +^j 1^j)
    ∧ ∀k+l≤x (σ_k +^j σ_l =^j σ_{k+l}) ∧ ∀k·l≤x (σ_k ·^j σ_l =^j σ_{k·l})
    ∧ ∀a (a ≤^j y → ∃i≤x σ_i =^j a)
    ∧ ∀i<x I^j(σ_i)

Next, we define

H(x, y) := ∃σ Goodsequence(σ, x, y) ∧ ∀σ′ ∀y′ (Goodsequence(σ′, x, y′) → y =^j y′),

and

J′(x) := ∀x′≤x ∃y H(x′, y).

Finally, we define J to be the closure of J′ under +, · and ω_1(x). As before, one can see H(x, y) as defining a function (modulo =^j), call it h, that defines an isomorphism between J and the image of J. Moreover, in the definition of Goodsequence we demanded that the image of h is a subset of I in the clause ∀i<x I^j(σ_i).

It is easy to see that J′ is closed under successor, that is, J′(x) → J′(x + 1). We only comment on the new ingredient of the image of h being a subset of I. However, T + β ⊢ I(x) → I(x + 1), as I is a definable cut. As j : T + α ▷ T + β, clearly T + α ⊢ I^j(x) → I^j(x +^j 1^j) and indeed J′ is closed under successor.

In the literature, Lemma 5.5 was known only for I being the trivial cut of all numbers, defined by x = x.

6 The Orey-Hájek characterizations

This final section contains the most substantial part of the paper. We consider the diagram from Figure 3. It is well known that all the implications hold when both U and V are reflexive. This fact is referred to as the Orey-Hájek characterizations ([2], [14], [4], [5]) for interpretability. However, for the Π_1-conservativity part, we should also mention work by Guaspari, Lindström and Pudlák ([3], [12], [13], [15]).

In this section we shall comment on all the implications in Figure 3, and study the conditions on U, V and the meta-theory that are necessary or sufficient.

Lemma 6.1. In S^1_2 we can prove ∀n □_U Con_n(V) → U ▷ V.

Proof. The only requirement for this implication to hold is that U ⊢ Con(Pred). But by our assumptions on U and by Lemma 4.8 this is automatically satisfied.

Let us first give the informal proof. Thus, let Axiom_V(x) be the formula that defines the axiom set of V.

We now apply a trick due to Feferman and consider the theory V′ that consists of those axioms of V up to which we have evidence for their consistency. Thus, Axiom_{V′}(x) := Axiom_V(x) ∧ Con_x(V).

We shall now prove that U ▷ V in two steps. First, we will see that

U ⊢ Con(V′).    (4)

Thus, by Theorem 4.5 we get that U ▷ V′. Second, we shall see that

V = V′.    (5)


[Figure 3: Characterizations of interpretability. The diagram relates the three statements U ▷ V, ∀^{∀Π^b_1}π (□_V π → □_U π) and ∀n U ⊢ Con_n(V); the labels 6.1–6.6 at the arrows are references to where in the paper each arrow is proven and what the conditions are for the arrow to hold. Moreover, we will discuss which conditions should hold for the base theory so that the implications become formalizable.]

To see (4), we reason in U and assume for a contradiction that Proof_{V′}(p, ⊥) for some proof p. We consider the largest axiom v that occurs in p. By assumption we have (in U) that Axiom_{V′}(v), whence Con_v(V). But, as clearly V′ ⊆ V, we see that p is also a V-proof. We can now obtain a cut-free proof p′ of ⊥. Clearly Proof_{V,v}(p′, ⊥) and we have our contradiction.

If V′ is empty, we cannot consider v. But in this case, Con(V′) ↔ Con(Pred), and by assumption, U ⊢ Con(Pred).

We shall now see (5). Clearly ℕ ⊨ Axiom_{V′}(v) → Axiom_V(v) for any v ∈ ℕ. To see that the converse also holds, we reason as follows.

Suppose ℕ ⊨ Axiom_V(v). By assumption U ⊢ Con_v(V), whence Con_v(V) holds in any model M of U. We now observe that ℕ is an initial segment of (the numbers of) any model M of U, that is,

ℕ ⊆_end M.    (6)

As M ⊨ Con_v(V) and as Con_v(V) is a Π_1-sentence, we see that also ℕ ⊨ Con_v(V). By assumption we had ℕ ⊨ Axiom_V(v), thus we get that ℕ ⊨ Axiom_{V′}(v). We conclude that

ℕ ⊨ Axiom_V(x) ↔ Axiom_{V′}(x)    (7)

whence that V = V′. As U ⊢ Con(V′), we get by Theorem 4.5 that U ▷ V′. We may thus infer the required U ▷ V.

It is not possible to directly formalize the informal proof. At (7) we concluded that V = V′. This actually uses some form of Π_1-reflection, which is manifested in (6). The lack of reflection in the formal environment will be compensated by another sort of reflection, as formulated in Theorem 4.6.

Moreover, to see (4), we had to use a cut elimination. To avoid this, we shall need a sharper version of Feferman's trick.

Let us now start with the formal proof sketch and refer to [18] for more details. We shall reason in U. Without any induction we conclude ∀x (Con_x(V) → Con_{x+1}(V)) or ∃x (Con_x(V) ∧ □_{V,x+1}⊥). In both cases we shall sketch a Henkin construction.

If ∀x (Con_x(V) → Con_{x+1}(V)) and also Con_0(V), we can find a cut J(x) with J(x) → Con_x(V). We now consider the following non-standard proof predicate:

□*_W ϕ := ∃x∈J □_{W,x} ϕ.

We note that we have Con*(V), where Con*(V) of course denotes ¬(∃x∈J □_{V,x}⊥). As always, we extend the language on J by adding witnesses and define a series of theories in the usual way. That is, by adding more and more sentences (in J) to our theories while staying consistent (in our non-standard sense).

24

We note that ∗Viϕ and

∗Vi¬ϕ is not possible, and that for ϕ ∈ J we can not have

Con∗(ϕ ∧ ¬ϕ). These observations seem to be too trivial to make, but actually many anon-standard proof predicate encountered in the literature does prove the consistency ofinconsistent theories.

As always, the sequence (8) defines a cut I ⊆ J , that induces a Henkin set W and wecan relate our required interpretation k to this Henkin set as was, for example, done in[18].

We now consider the case that for some fixed b we have Conb(V ) ∧ V,b+1⊥. We notethat we can see the uniqueness of this b without using any substantial induction. Basically,we shall now do the same construction as before only that we now possibly stop at b.

For example the cut J(x) will now be replaced by x ≤ b. Thus, we may end up with atruncated Henkin setW . But this set is complete with respect to relatively small formulas.Moreover, W is certainly closed under subformulas and substitution of witnesses. Thus,W is sufficiently large to define the required interpretation k.

In both cases we can perform the following reasoning:

□_V ϕ → ∃x □_{V,x}ϕ
     → ∃x □_U(Con_x(V) ∧ □_{V,x}ϕ)
     → □_U □*_V ϕ
     → □_U ϕ^k    (by Theorem 4.6).

The remarks from [18] on the bounds of our proofs are still applicable and we thus obtain a smooth interpretation. ⊣

Lemma 6.2. In the presence of exp, we can prove that for reflexive U, U ▷ V → ∀x □_U Con_x(V).

Proof. The informal argument is conceptually very clear and we have depicted it in Figure 4. The accompanying reasoning is as follows.

We assume U ▷ V, whence for some k we have k : U ▷ V. Thus, for axioms interpretability we find that ∀u ∃p (Axiom_V(u) → Proof_U(p, u^k)). We are now to see that ∀x U ⊢ Con_x(V). So, we fix some x. By our assumption we get, for some l, that

∀u≤x ∃p (Axiom_V(u) → Proof_{U,l}(p, u^k)).    (9)

This formula is actually equivalent to the Σ_1-formula

∃n ∀u≤x ∃p≤n (Axiom_V(u) → Proof_{U,l}(p, u^k))    (10)

from which we may conclude, by provable Σ_1-completeness,

U ⊢ ∃n ∀u≤x ∃p≤n (Axiom_V(u) → Proof_{U,l}(p, u^k)).    (11)


[Figure 4: Transformations on proofs. Working in U, a V,x-proof of ⊥ from V-axioms v_1, …, v_m is k-translated into a proof of ⊥ from v_1^k, …, v_m^k; each v_i^k is then replaced by its U,l-proof p_i from U-axioms u_1, …, yielding a restricted U-proof of ⊥.]

We now reason in U and suppose that there is some V,x-proof p of ⊥. The assumptions in p are axioms v_1 … v_m of V, with each v_i ≤ x. Moreover, all the formulas ψ in p have ρ(ψ) ≤ x. By Lemma 3.3, this p transforms to a proof p^k of ⊥^k which is again ⊥.

The assumptions in p^k are now among the v_1^k … v_m^k. By Remark 3.4 we get that for some n′ depending on x and k, we have that all the axioms in p^k are ≤ n′ and all the ψ occurring in p^k have ρ(ψ) ≤ n′.

Now by (11), we have U,l-proofs p_i ≤ n of the v_i^k. The assumptions in the p_i are axioms of U. Clearly all of these axioms are ≤ l. We can now form a U,l+n′-proof p′ of ⊥ by substituting all the p_i for the (v_i)^k. Thus we have shown Proof_{U,l+n′}(p′, ⊥). But this clearly contradicts the reflexivity of U.
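Schematically, and only as a restatement of the transformation just described (the final step, turning the contradiction into the desired conclusion, is our own spelling out):

    Proof_{V,x}(p, ⊥), axioms v_1,…,v_m ≤ x, ρ ≤ x on all formulas in p
      ⟹  p^k with conclusion ⊥, assumptions among v_1^k,…,v_m^k, bounds ≤ n′     (Lemma 3.3, Remark 3.4)
      ⟹  Proof_{U,l+n′}(p′, ⊥), by plugging in the U,l-proofs p_i ≤ n of the v_i^k from (11)
      ⟹  ¬Con_{l+n′}(U), contradicting Con_{l+n′}(U), which is available since U is reflexive;
    hence U ⊢ Con_x(V), as required.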

The informal argument is readily formalized to obtain T ⊢ U ▷ V → ∀x □_U Con_x(V). However, there are some subtleties.

First of all, to conclude that (9) is equivalent to (10), a genuine application of BΣ1 is needed. If U lacks BΣ1, we have to switch to smooth interpretability to still have the implication valid. Smoothness then automatically also provides the l that we used in (9).

In addition we need that T proves the totality of exponentiation. For weaker theories, we only have provable ∃Σ^b_1-completeness. But if Axiom_V(u) is ∆^b_1, we can only guarantee that ∀u≤m ∃p≤n (Axiom_V(u) → Proof_U(p, u^k)) is Π^b_2. As far as we know, exponentiation is needed to prove ∃Π^b_2-completeness. All other transformations of objects in our proof only require the totality of ω_1(x). ⊣

The assumption that U is reflexive can in a sense not be dispensed with. That is, if

∀V (U ▷ V → ∀x □_U Con_x(V)), (12)


then U is reflexive, as clearly U ▷ U. In a similar way we see that if

∀U (U ▷ V → ∀x □_U Con_x(V)), (13)

then V is reflexive. However, V being reflexive could never be a sufficient condition for (13) to hold, as we know from [16] that interpreting reflexive theories in finitely many axioms is Σ3-complete.

Lemma 6.3. In S^1_2 we can prove ∀x □_U Con_x(V) → ∀π∈∀Π^b_1 (□_V π → □_U π).

Proof. There are no conditions on U and V for this implication to hold. We shall directly give the formal proof as the informal proof does not give a clearer picture.

Thus, we reason in S^1_2 and assume ∀x □_U Con_x(V). Now we consider any π ∈ ∀Π^b_1 such that □_V π. Thus, for some x we have □_{V,x}π. We choose x large enough, so that we also have (see Remark 2.3)

□_U(¬π → □_{V,x}¬π). (14)

As □_{V,x}π → □_U□_{V,x}π, we also have that

□_U□_{V,x}π. (15)

Combining (14), (15) and the assumption that ∀x □_U Con_x(V), we see that indeed □_U π. ⊣
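Since this last step is taken rather quickly, we spell out one way to read it; the passage to a slightly larger bound x′ is our own hedge and is not made explicit in the text.

    Reason inside U and suppose ¬π. Then:
      □_{V,x}¬π                             by (14),
      □_{V,x}π                              by (15),
      so □_{V,x′}⊥, i.e. ¬Con_{x′}(V), for a suitable x′ ≥ x obtained by combining the two proofs.
    This contradicts Con_{x′}(V), which is available inside U since ∀x □_U Con_x(V).
    Hence π and, as the reasoning took place inside U, indeed □_U π.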

Lemma 6.4. In S^1_2 we can prove that for reflexive V we have

∀π∈∀Π^b_1 (□_V π → □_U π) → ∀x □_U Con_x(V).

Proof. Suppose V is reflexive, so that □_V Con_x(V) for every x. As each Con_x(V) is a ∀Π^b_1-formula, the assumption ∀π∈∀Π^b_1 (□_V π → □_U π) yields ∀x □_U Con_x(V). ⊣

It is obvious that

∀U [∀π∈∀Π^b_1 (□_V π → □_U π) → ∀x □_U Con_x(V)] (16)

implies that V is reflexive. Likewise,

∀V [∀π∈∀Π^b_1 (□_V π → □_U π) → ∀x □_U Con_x(V)] (17)

implies that U is reflexive. However, U being reflexive can never be a sufficient condition for (17) to hold. An easy counterexample is obtained by taking U to be PRA and V to be IΣ1, as it is well known that IΣ1 is provably Π2-conservative over PRA and that IΣ1 is finitely axiomatized.
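For completeness, the instantiations behind the two reflexivity claims are just the trivial conservativity of a theory over itself:

    Taking U := V in (16), the antecedent ∀π∈∀Π^b_1 (□_V π → □_V π) holds trivially,
    so (16) yields ∀x □_V Con_x(V), i.e. V is reflexive.
    Taking V := U in (17), the antecedent again holds trivially,
    so (17) yields ∀x □_U Con_x(U), i.e. U is reflexive.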

Lemma 6.5. (In S^1_2:) For reflexive V we have ∀π∈∀Π^b_1 (□_V π → □_U π) → U ▷ V.


Proof. We know of no direct proof of this implication. Also, all proofs in the literature go via Lemmata 6.4 and 6.1, and hence use reflexivity of V. ⊣

In our context, the reflexivity of V is not necessary, as ∀U U ▷ S^1_2 and S^1_2 is not reflexive.

Lemma 6.6. Let U be a reflexive and sequential theory. We have in S^1_2 that U ▷ V → ∀π∈∀Π^b_1 (□_V π → □_U π).
If moreover U ⊢ exp we also get U ▷ V → ∀π∈Π1 (□_V π → □_U π). If U is not reflexive, we still have that U ▷ V → ∃ U-cut J ∀π∈Π1 (□_V π → □_U π^J).

For these implications, it is actually sufficient to work with the notion of theorems interpretability.

Proof. The intuition for the formal proof comes from Pudlák's lemma, which in turn is tailored to compensate for a lack of induction. We shall first give an informal proof sketch if U has full induction. Then we shall give the formal proof using Pudlák's lemma.

If U has full induction and j : U ▷ V, we may by Remark 4.9 assume that j maps identity to identity. Let M be an arbitrary model of U. By Theorem 5.2 we now see that M ⊆_end M^j. If for some π ∈ Π1 we have □_V π, then by soundness M^j |= π, whence M |= π. As M was an arbitrary model of U, we get by the completeness theorem that □_U π.
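Diagrammatically, this is our rendering of the informal argument:

    M |= U   ⟹  M^j |= V          (j : U ▷ V)
             ⟹  M^j |= π          (soundness, using □_V π)
             ⟹  M |= π            (π ∈ Π1 and M ⊆_end M^j)
    and, since M |= U was arbitrary, □_U π by the completeness theorem.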

To transform this argument into a formal one, valid for weak theories, there are two major adaptations to be made. First, the use of the soundness and completeness theorems has to be avoided. This can be done by simply staying in the realm of provability. Secondly, we should get rid of the use of full induction. This is done by switching to a cut in Pudlák's lemma.

Thus, the formal argument runs as follows. Reason in S^1_2 and assume U ▷ V.

We fix some j : U ▷ V. By Pudlák's lemma, Lemma 5.3, we now find⁷ a definable U-cut J and a j,J-function h such that

∀∆0 ϕ  □_U ∀~x∈J (ϕ^j(h(~x)) ↔ ϕ(~x)).

We shall see that for this cut J we have that

∀π∈Π1 (□_V π → □_U π^J). (18)

Therefore, we fix some π ∈ Π1 and assume □_V π. Let ϕ(x) ∈ ∆0 be such that π = ∀x ϕ(x). Thus we have □_V ∀x ϕ(x), hence by theorems interpretability

□_U ∀x (δ(x) → ϕ^j(x)). (19)

We are to see

□_U ∀x (J(x) → ϕ(x)). (20)

⁷ Remark 5.4 ensures that we can find them also in the case of theorems interpretability.


To see this, we reason in U and fix x such that J(x). By definition of J, h(x) is defined. By the definition of h, we have δ(h(x)), whence by (19), ϕ^j(h(x)). Pudlák's lemma now yields the desired ϕ(x). As x was arbitrary, we have proved (20).
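In one chain, the reasoning inside U for an x with J(x) is as follows; the bracketed justifications merely repeat the citations above.

    J(x)  ⟹  h(x) is defined        [definition of J]
          ⟹  δ(h(x))                [definition of h]
          ⟹  ϕ^j(h(x))              [by (19)]
          ⟹  ϕ(x)                   [Pudlák's lemma: ϕ^j(h(x)) ↔ ϕ(x) for ∆0 ϕ]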

So far, we have not used the reflexivity of U . We shall now see that

∀π∈∀Π^b_1 (□_U π^J → □_U π)

holds for any U-cut J whenever U is reflexive. For this purpose, we fix some π ∈ ∀Π^b_1, some U-cut J and assume □_U π^J. Thus, ∃n □_{U,n}π^J and also ∃n □_U□_{U,n}π^J. If π = ∀x ϕ(x) with ϕ(x) ∈ Π^b_1, we get ∃n □_U□_{U,n}∀x (x ∈ J → ϕ(x)), whence also

∃n □_U ∀x □_{U,n}(x ∈ J → ϕ(x)).

By Lemma 4.2 and Remark 4.3, for large enough n, this implies

∃n □_U ∀x □_{U,n}ϕ(x)

and by Lemma 2.4 (only here we use that π ∈ ∀Π^b_1) we obtain the required □_U ∀x ϕ(x). ⊣
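Summarizing the second half of the proof as a single chain; the bracketed justifications repeat those given above, and the unnamed second and third steps are labeled with our own shorthand.

    □_U π^J  ⟹  ∃n □_{U,n}π^J                            [every proof has a bound]
             ⟹  ∃n □_U□_{U,n}∀x (x ∈ J → ϕ(x))           [provable Σ1-completeness; π = ∀x ϕ(x)]
             ⟹  ∃n □_U ∀x □_{U,n}(x ∈ J → ϕ(x))          [pulling ∀x outside the inner box]
             ⟹  ∃n □_U ∀x □_{U,n}ϕ(x)                    [Lemma 4.2 and Remark 4.3, n large enough]
             ⟹  □_U ∀x ϕ(x)                              [Lemma 2.4, using π ∈ ∀Π^b_1]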

Again, by [16] we note that V being reflexive can never be a sufficient condition for

∀U [U ▷ V → ∀π∈∀Π^b_1 (□_V π → □_U π)].

The main work on the Orey-Hájek characterization has now been done. We can easily extract some useful, mostly well-known corollaries.

Corollary 6.7. If U is a reflexive theory, then

T ⊢ U ▷ V ↔ ∀x □_U Con_x(V).

Here T contains exp and ▷ denotes smooth interpretability.

Corollary 6.8. (In S^1_2:) If V is a reflexive theory, then the following are equivalent.

1. U ▷ V

2. ∃ U-cut J ∀π∈Π1 (□_V π → □_U π^J)

3. ∃ U-cut J ∀x □_U Con^J_x(V)

Proof. This is part of Theorem 2.3 from [16]. (1) ⇒ (2) is already proved in Lemma 6.6, (2) ⇒ (3) follows from the reflexivity of V, and (3) ⇒ (1) is a sharpening of Lemma 6.1 which closely follows Theorem 4.7. Note that ▷ may denote smooth or theorems interpretability. ⊣

Corollary 6.9. If V is reflexive, then

S^1_2 ⊢ U ▷_t V ↔ U ▷_s V.


Proof. By Remark 5.4 and Corollary 6.8. ⊣

Corollary 6.10. If U and V are both reflexive theories we have that the following are provably equivalent in S^1_2.

1. U ▷ V

2. ∀π∈∀Π^b_1 (□_V π → □_U π)

3. ∀x □_U Con_x(V)

Proof. If we go (1) ⇒ (2) ⇒ (3) ⇒ (1) we do not need the totality of exp that was needed for (1) ⇒ (3). ⊣
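The round trip can be pictured as follows; the attribution of the arrows to the lemmas above is our reading of the preceding sections.

    (1) U ▷ V   ⟹  (2) ∀π∈∀Π^b_1 (□_V π → □_U π)    [Lemma 6.6, using that U is reflexive]
    (2)         ⟹  (3) ∀x □_U Con_x(V)              [Lemma 6.4, using that V is reflexive]
    (3)         ⟹  (1) U ▷ V                        [Lemma 6.1]
    whereas the direct implication (1) ⟹ (3) of Lemma 6.2 needs exp.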

As an application we can, for example, see that PA ▷ PA + InCon(PA). It is well known that PA is essentially reflexive, which means that any finite extension of it is reflexive. So, by Corollary 6.10, it is sufficient to show that PA + InCon(PA) is Π1-conservative over PA.

So, suppose that PA + InCon(PA) ⊢ π for some Π1-sentence π. In other words, PA ⊢ □⊥ → π. We shall now see that PA ⊢ □π → π, which by Löb's Theorem gives us PA ⊢ π.

Thus, in PA, assume □π. Suppose for a contradiction that ¬π. By Σ1-completeness we also get □¬π, which yields □⊥ with the assumption □π. But we have □⊥ → π and we conclude π. A contradiction, so that indeed PA ▷ PA + InCon(PA).
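The argument can also be recorded as a short derivation in the style of the provability logic GL; this presentation is ours, □ abbreviates provability in PA, and we use the assumption PA ⊢ □⊥ → π obtained above.

    1. ¬π → □¬π           formalized Σ1-completeness; ¬π is Σ1 since π ∈ Π1
    2. □π ∧ □¬π → □⊥       the box distributes over π → (¬π → ⊥)
    3. □⊥ → π              the assumption
    4. □π → π              propositional logic from 1, 2, 3: □π ∧ ¬π leads to π
    5. π                   Löb's rule applied to 4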

Acknowledgements

I am grateful to Lev Beklemishev, Felix Lara and Albert Visser for pointers to the literature and helpful discussions.

This research has been funded by Grant 2014 SGR 437 from the Catalan government and by Grant MTM2014-59178-P from the Spanish government.

References

[1] S.R. Buss. First-order proof theory of arithmetic. In S.R. Buss, editor, Handbook of Proof Theory, pages 79–148, Amsterdam, 1998. Elsevier, North-Holland.

[2] S. Feferman. Arithmetization of metamathematics in a general setting. Fundamenta Mathematicae, 49:35–92, 1960.

[3] D. Guaspari. Partially conservative sentences and interpretability. Transactions of the AMS, 254:47–68, 1979.

[4] P. Hájek. On interpretability in set theories I. Comm. Math. Univ. Carolinae, 12:73–79, 1971.


[5] P. Hájek. On interpretability in set theories II. Comm. Math. Univ. Carolinae, 13:445–455, 1972.

[6] P. Hájek and P. Pudlák. Metamathematics of First Order Arithmetic. Springer-Verlag, Berlin, Heidelberg, New York, 1993.

[7] D. Hilbert and P. Bernays. Grundlagen der Mathematik, Vols. I and II, 2nd ed. Springer-Verlag, Berlin, 1968.

[8] J. J. Joosten. Interpretability Formalized. PhD thesis, Utrecht University, 2004.

[9] J. J. Joosten. Two series of formalized interpretability principles for weak systems of arithmetic. arXiv:1503.09130 [math.LO], 2015.

[10] J.J. Joosten and A. Visser. The interpretability logic of all reasonable arithmetical theories. Erkenntnis, 53(1–2):3–26, 2000.

[11] J. Krajíček. Bounded Arithmetic, Propositional Logic, and Complexity Theory. Cambridge University Press, 1995.

[12] P. Lindström. Some results on interpretability. In Proceedings of the 5th Scandinavian Logic Symposium, pages 329–361. Aalborg University Press, 1979.

[13] P. Lindström. On partially conservative sentences and interpretability. Proceedings of the AMS, 91(3):436–443, 1984.

[14] S. Orey. Relative interpretations. Zeitschrift f. math. Logik und Grundlagen d. Math., 7:146–153, 1961.

[15] P. Pudlák. Cuts, consistency statements and interpretations. Journal of Symbolic Logic, 50:423–441, 1985.

[16] V.Yu. Shavrukov. Interpreting reflexive theories in finitely many axioms. Fundamenta Mathematicae, 152:99–116, 1997.

[17] A. Tarski, A. Mostowski, and R. Robinson. Undecidable theories. North-Holland, Amsterdam, 1953.

[18] A. Visser. The formalization of interpretability. Studia Logica, 50(1):81–106, 1991.

[19] A. Visser. The unprovability of small inconsistency. Archive for Mathematical Logic,32:275–298, 1993.

[20] H. Wang. Arithmetical models of formal systems. Methodos, 3:217–232, 1951.
