Lecture 4: Sorites 2


Page 1: Lecture 4: Sorites 2

Lecture 4: Sorites 2

Finish Kamp 1981

Explain other context-based approaches (Kyburg & Morreau: fitting words)

Page 2: Lecture 4: Sorites 2

Context for sorites: Kamp 1981

• Context C is a deductively closed set of formulas, e.g.,

a. {small(1), small(2), ¬small(3)}
b. {small(1), small(2), ¬small(4)}
c. {small(1), small(6), ¬small(4)}

• C is incoherent ⇔ ∃x∃y (small(x) ∈ C & ¬small(y) ∈ C & (x ~ y ∨ y << x))

• Suppose JND = 2. Then
a is incoherent (2 ~ 3)
b is coherent (not 2 ~ 4)
c is incoherent (4 << 6)
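
The coherence test can be tried out on contexts a, b and c. This is only a sketch under assumptions that are not on the slide: objects are plain numbers, x ~ y means the difference is below the JND, x << y means y exceeds x by at least the JND, and a context is represented by its small and ¬small literals only (deductive closure is ignored).

```python
JND = 2  # just-noticeable difference, as on the slide

def similar(x, y):
    """x ~ y (assumed reading): the difference is below the JND."""
    return abs(x - y) < JND

def much_smaller(x, y):
    """x << y (assumed reading): y exceeds x by at least the JND."""
    return y - x >= JND

def is_coherent(small, not_small):
    """Incoherent iff some small(x) and some ¬small(y) have x ~ y or y << x."""
    return not any(similar(x, y) or much_smaller(y, x)
                   for x in small for y in not_small)

# Contexts a, b, c, each given as (small literals, ¬small literals).
contexts = {
    "a": ({1, 2}, {3}),
    "b": ({1, 2}, {4}),
    "c": ({1, 6}, {4}),
}
verdicts = {name: is_coherent(s, ns) for name, (s, ns) in contexts.items()}
print(verdicts)
```

Under these assumptions only b comes out coherent, matching the slide.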

Page 3: Lecture 4: Sorites 2

3-valued truth definition: (Here only truth)

• [small(a)]_C = 1 iff ∃b (small(b) ∈ C & (b ~ a or a << b))

• [p → q]_C = 1 iff [p]_C = 0 or [q]_(C ∪ {p}) = 1

• [p & q]_C = 1 iff [p]_C = 1 and [q]_C = 1 and C ∪ {p, q} is coherent

Let C contain small(0) and ¬small(1000). Then

• For every i, we have [small(i) → small(i+1)]_C = 1 because [small(i+1)]_(C ∪ {small(i)}) = 1

• But let p1 = small(0) → small(1), p2 = small(1) → small(2), etc. Then [p1 & … & p1000]_C ≠ 1, since C ∪ {p1, …, p1000} is not coherent
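
A rough way to see both facts at once is sketched below. The assumptions are mine, not Kamp's: objects are numbers, JND = 2, ~ and << are numeric threshold relations, and "deductive closure" is approximated by closing the small-literals under modus ponens.

```python
JND = 2  # assumed threshold; this slide does not fix one

def similar(x, y):
    return abs(x - y) < JND

def much_smaller(x, y):
    return y - x >= JND

def is_coherent(small, not_small):
    return not any(similar(x, y) or much_smaller(y, x)
                   for x in small for y in not_small)

def small_is_true(a, small):
    """[small(a)]_C = 1 iff some b with small(b) in C has b ~ a or a << b."""
    return any(similar(b, a) or much_smaller(a, b) for b in small)

small, not_small = {0}, {1000}  # C contains small(0) and ¬small(1000)

# Each conditional small(i) -> small(i+1) is judged in C ∪ {small(i)}.
each_premiss_true = all(small_is_true(i + 1, small | {i}) for i in range(1000))

# Collect all conditionals and close under modus ponens: small(i) for every i.
closed_small = set(small)
for i in range(1000):
    if i in closed_small:
        closed_small.add(i + 1)
conjunction_context_coherent = is_coherent(closed_small, not_small)

print(each_premiss_true, conjunction_context_coherent)
```

Every single premiss passes, but the context that collects all of them contains both small(1000) and ¬small(1000) and so fails the coherence test.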

Page 4: Lecture 4: Sorites 2

Kamp 1981

universal quantification: like conjunction

[∀x φ(x)]_C = 1 iff

[φ(a)]_C = 1 for all constants a &

C ∪ {φ(a) : a a constant} is coherent

Example: [∀i (small(i) → small(i+1))]_C = 1 iff

[small(1) → small(2)]_C = 1 & … &

the set of all the instantiations is coherent

Page 5: Lecture 4: Sorites 2

Analysis of paradox (Kamp 81)

Every instance of the crucial premiss is true, but
– Not their conjunction
– Not the quantified version

Is the paradox valid?

Page 6: Lecture 4: Sorites 2

The argument (n=1000)

small(0)

small(0) → small(1), so small(1)

small(1) → small(2), so small(2)

…

small(999) → small(1000), so small(1000)

Page 7: Lecture 4: Sorites 2

Contextualised version of validity: If premisses are true then conclusion is true in the context that collects all premisses.

Argument is valid. Key: If [small(i)]_C = 1 and

[small(i) → small(i+1)]_C = 1 then

[small(i+1)]_(C ∪ {small(i)}) = 1

Page 8: Lecture 4: Sorites 2

Kamp 1981: consequences

• The argument is deemed valid

• Each instantiation of the crucial premiss is true, but their conjunction is not

• Likewise, quantified version is false

• Drawback: in many ways, this is as nonclassical as earlier accounts. (Compare Kamp’s critique of these)

Page 9: Lecture 4: Sorites 2

Veltman 1987

• Variant of Kamp’s approach, based on an idea of Nelson Goodman (further developed by Michael Dummett)

• Key idea: whether two objects can be distinguished depends on other objects in the context

a ~ b ~ c, but a << c

If c is absent then a and b are indistinguishable. But with c present, a difference between a and b can be inferred: c is called a help element.

Page 10: Lecture 4: Sorites 2

Notations

(x << y)_A ⇔

x << y ∨ ∃h∈A (h << y & ¬ h << x) ∨ ∃h∈A (h >> x & ¬ h >> y)

(x ~ y)_A ⇔ ¬(x << y)_A & ¬(y << x)_A
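
The two contextualised relations translate almost directly into code. The sketch below assumes numeric objects and a threshold reading of << (with JND = 2); the function names are illustrative only.

```python
JND = 2  # assumed threshold

def much_smaller(x, y):
    """Plain x << y (assumed reading): y exceeds x by at least the JND."""
    return y - x >= JND

def ctx_much_smaller(x, y, A):
    """(x << y)_A: directly, or witnessed by a help element h in A."""
    return (much_smaller(x, y)
            or any(much_smaller(h, y) and not much_smaller(h, x) for h in A)
            or any(much_smaller(x, h) and not much_smaller(y, h) for h in A))

def ctx_similar(x, y, A):
    """(x ~ y)_A: neither is contextually much smaller than the other."""
    return not ctx_much_smaller(x, y, A) and not ctx_much_smaller(y, x, A)

a, b, c = 0, 1, 2  # a ~ b ~ c, but a << c
print(ctx_similar(a, b, set()))  # no help element present
print(ctx_similar(a, b, {c}))    # c acts as a help element
```

With c absent, a and b count as indistinguishable; with c in the context they do not, because c is noticeably larger than a but not than b.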

Page 11: Lecture 4: Sorites 2

Veltman 1987 (sketch)

• Proposed to judge the truth of a discourse relative to a context determined by the discourse

• A context is coherent iff it satisfies this version of EOI: small(i)_C & (i ~ i+1)_C → small(i+1)_C

• When premisses small(i) → small(i+1) are judged on their own, they must be true in any coherent context because the “context determined” is {i, i+1}: There is no help element, so (i ~ i+1)_C, so EOI “fires”.

Page 12: Lecture 4: Sorites 2

Veltman 1987 (sketch)

• But when taken together, the premisses “determine” a larger context.

• For example, (99 ~ 100)_C is judged in C = {0,1,2,…,100}. Plenty of help elements, so (99 ~ 100)_C does not hold, and EOI does not fire, so coherence does not imply that small(100).

Page 13: Lecture 4: Sorites 2

Veltman 1987 (sketch)

• In fact, it is not possible that all premisses of the form (i ~ i+1)_C are true in the context jointly determined by them

(Suppose small(k) & ¬small(k+1). Then C must contain h such that (k << h) & ¬(k+1 << h). Consequently, (k << k+1)_C and therefore C is incoherent.)

Page 14: Lecture 4: Sorites 2

Veltman 1987 (sketch)

Similar reasoning makes the quantified version of the crucial premiss false in any coherent model

Comments:

– A flaw in an otherwise appealing approach? (Similar to Kamp 81)

– The idea of using context may be taken further. For instance, contexts may be built up from left to right

– More importantly, linguistic approaches to context show that where there is context dependence there is ambiguity (e.g., Graeme Hirst, Computational Linguistics 1996)

Page 15: Lecture 4: Sorites 2

Van Deemter 1992,1996

• In NPs like `a small elephant’, the predicate small is known to be context dependent: context is provided by the noun elephant.

• Discourse Representation Theory (DRT) suggests: context can also be built up through discourse:

x1 . dinosaur(x1)

x2 . whale(x2)

x3 . x3 = Jumbo, small(Jumbo)

• For concreteness, focus on one way in which small(x) might depend on context. Notation: small(x)_A, S(x)_A

Page 16: Lecture 4: Sorites 2

Some definitions

small(x)_A ⇔def |{y∈A: y << x}| < |{y∈A: y >> x}|

K_A[x] =def {y∈A: y << x}

G_A[x] =def {y∈A: y >> x}

. . . . . . . . . . x . . . . . . . . . .

K_A[x]                  G_A[x]

(Dots represent the elements of the context A)
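
These definitions can be sketched as set comprehensions. The numeric objects, the threshold reading of << and the sample context are assumptions made for the illustration; K, G and small_in are hypothetical names for K_A[x], G_A[x] and small(x)_A.

```python
JND = 2  # assumed threshold

def much_smaller(x, y):
    return y - x >= JND

def K(x, A):
    """K_A[x]: elements of A that are noticeably smaller than x."""
    return {y for y in A if much_smaller(y, x)}

def G(x, A):
    """G_A[x]: elements of A that are noticeably larger than x."""
    return {y for y in A if much_smaller(x, y)}

def small_in(x, A):
    """small(x)_A: fewer noticeably smaller than noticeably larger elements."""
    return len(K(x, A)) < len(G(x, A))

A = {0, 3, 5, 8, 10}  # an arbitrary sample context
print(K(3, A), G(3, A), small_in(3, A))
print(K(8, A), G(8, A), small_in(8, A))
```

In this sample context 3 counts as small (one element below it, three above) and 8 does not.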

Page 17: Lecture 4: Sorites 2

We have contextualised both ~ and small

This leads to many possible versions of the crucial premiss. In particular

• Plain version: S(x)_A & x ~ y → S(y)_(A ∪ {x})

• Sophisticated version: S(x)_A & (x ~ y)_A → S(y)_(A ∪ {x})

Page 18: Lecture 4: Sorites 2

What other contextualisations can you think of? Are they really different? Recall:

• Plain version: S(x)_A & x ~ y → S(y)_(A ∪ {x})

• Sophisticated version: S(x)_A & (x ~ y)_A → S(y)_(A ∪ {x})

Page 19: Lecture 4: Sorites 2

Plain: S(x)_A & x ~ y → S(y)_(A ∪ {x})

Sophisticated: S(x)_A & (x ~ y)_A → S(y)_(A ∪ {x})

• S(x) may be relativised to A or to A ∪ {x}

• x ~ y may be relativised to nothing, to A, to A ∪ {x} or to A ∪ {x,y}

• S(y) may be relativised to A or to A ∪ {x}, or to A ∪ {y} or to A ∪ {x,y}

Each of these options is equivalent to either Plain or Sophisticated. For instance, (x ~ y)_A ⇔ (x ~ y)_(A ∪ {x,y}). S(y)_A ⇔ S(y)_(A ∪ {x,y}) because x ~ y (in both versions)

Page 20: Lecture 4: Sorites 2

First claim

Plain version is invalid: ⊭ S(x)_A & x ~ y → S(y)_(A ∪ {x})

This is easy to see: let context A = {x,y,z}

x ~ y ~ z, but x << z

K_A[x] = {}, G_A[x] = {z}, therefore S(x)_A

K_A[y] = {}, G_A[y] = {}, therefore ¬S(y)_(A ∪ {x})
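
The counterexample can be checked on concrete numbers. As an assumption for the illustration, take x = 0, y = 1, z = 2 with JND = 2, so that x ~ y ~ z and x << z under a threshold reading of the relations.

```python
JND = 2  # assumed threshold

def much_smaller(x, y):
    return y - x >= JND

def similar(x, y):
    return not much_smaller(x, y) and not much_smaller(y, x)

def K(x, A):
    return {h for h in A if much_smaller(h, x)}

def G(x, A):
    return {h for h in A if much_smaller(x, h)}

def S(x, A):
    """S(x)_A: |K_A[x]| < |G_A[x]|."""
    return len(K(x, A)) < len(G(x, A))

x, y, z = 0, 1, 2
A = {x, y, z}

antecedent = S(x, A) and similar(x, y)  # S(x)_A & x ~ y
consequent = S(y, A | {x})              # S(y)_(A ∪ {x})
print(antecedent, consequent)
```

The antecedent holds while the consequent fails, so this instance falsifies the plain version.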

Page 21: Lecture 4: Sorites 2

How about the sophisticated version?

Is it valid?

⊨ S(x)_A & (x ~ y)_A → S(y)_(A ∪ {x})

Page 22: Lecture 4: Sorites 2

Second claim

Sophisticated version is valid: ⊨ S(x)_A & (x ~ y)_A → S(y)_(A ∪ {x})

Proof: suppose S(a)_A & (a ~ b)_A. Now suppose ¬S(b)_(A ∪ {a}). This is equivalent to ¬S(b)_A, because (a ~ b)_A implies a ~ b, so adding a to the context changes neither K[b] nor G[b]. We therefore have S(a)_A and ¬S(b)_A, so either ∃x∈A (x << b & ¬ x << a) or ∃x∈A (x >> a & ¬ x >> b). But then x is a help element for distinguishing a and b, contradicting (a ~ b)_A.
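
The two claims can also be sanity-checked by brute force on a tiny domain. This is no substitute for the proof; the numeric domain, JND = 2 and the threshold reading of << are assumptions of the sketch.

```python
from itertools import combinations

JND = 2
DOMAIN = range(6)  # small assumed domain of numeric objects

def much_smaller(x, y):
    return y - x >= JND

def similar(x, y):
    return not much_smaller(x, y) and not much_smaller(y, x)

def ctx_much_smaller(x, y, A):
    return (much_smaller(x, y)
            or any(much_smaller(h, y) and not much_smaller(h, x) for h in A)
            or any(much_smaller(x, h) and not much_smaller(y, h) for h in A))

def ctx_similar(x, y, A):
    return not ctx_much_smaller(x, y, A) and not ctx_much_smaller(y, x, A)

def S(x, A):
    smaller = sum(1 for h in A if much_smaller(h, x))
    larger = sum(1 for h in A if much_smaller(x, h))
    return smaller < larger

def subsets(items):
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)

def counterexamples(sophisticated):
    """All (a, b, A) with a true antecedent and a false consequent."""
    found = []
    for A in subsets(DOMAIN):
        for a in DOMAIN:
            for b in DOMAIN:
                indist = ctx_similar(a, b, A) if sophisticated else similar(a, b)
                if S(a, A) and indist and not S(b, A | {a}):
                    found.append((a, b, frozenset(A)))
    return found

print(len(counterexamples(sophisticated=False)))  # plain version
print(len(counterexamples(sophisticated=True)))   # sophisticated version
```

On this domain the plain version has counterexamples and the sophisticated version has none, in line with the first and second claims.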

Page 23: Lecture 4: Sorites 2

• So, the premiss is ambiguous between a plain version that is invalid, and a sophisticated version that is valid

• Since there exists a valid version of the premiss, does this mean that the paradox follows?

• No, for the sophisticated version is not strong enough to support the (paradoxical) conclusion

Page 24: Lecture 4: Sorites 2

Two further claims

1. Plain version supports sorites.

2. Sophisticated version does not support sorites.

1. Plain version supports sorites.

Suppose S(x)_A & x ~ y → S(y)_(A ∪ {x})

then sorites goes

S(0)_A so S(1)_(A ∪ {0}) so S(2)_(A ∪ {0,1}) so S(3)_(A ∪ {0,1,2}), etc.

Page 25: Lecture 4: Sorites 2

2. Sophisticated version does not support sorites.

Suppose we use S(x)_A & (x ~ y)_A → S(y)_(A ∪ {x})

then the sorites chain breaks off:

S(0)_A , so [since (0 ~ 1)_A]

S(1)_(A ∪ {0}) , so [since (1 ~ 2)_(A ∪ {0})]

S(2)_(A ∪ {0,1}) , so [since (2 ~ 3)_(A ∪ {0,1})]

S(3)_(A ∪ {0,1,2}) , so [since (3 ~ 4)_(A ∪ {0,1,2})] , etc.

Context for judging (i ~ i+1) grows, creating more and more help elements.

Beyond some point, no suitable y can be found.

Page 26: Lecture 4: Sorites 2

Proof

Suppose ∀i: (0 ~ i)_(A ∪ {0,1,2,…,i-1})

Then certainly ∀i: (0 ~ i).

If that is the case, all objects in the domain are indistinguishable from each other.

Page 27: Lecture 4: Sorites 2

Illustration

• Suppose the argument concerns the heights of people. (S = short)

• Suppose A={ I }={speaker}

• Let size(x_(i+1)) − size(x_i) = 1mm

• Let JND=10mm

Three types of situations can arise:

Page 28: Lecture 4: Sorites 2

1. We reach x10, which is the first element to be distinguishable from x0

x0 ~ x9

x0  x1  …  x9  x10  …  I

x0 << x10 << I = x400

Page 29: Lecture 4: Sorites 2

2. We reach x10, which is the first element to be indistinguishable from the speaker

x9 << I

x1  …  x9  x10  …  I

x1 ~ x10 ~ I = x19

Page 30: Lecture 4: Sorites 2

3. Upon reaching x10, both changes coincide

x0 ~ x9 << I

x0  x1  …  x9  x10  …  I

x0 << x10 ~ I = x19

Page 31: Lecture 4: Sorites 2

• In all three situations, two things happen at the same time:

– While x9 was indistinguishable from its predecessor, x10 was not

– While x9 was short with respect to its context, x10 was not
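
The three situations can be replayed with a short simulation. It is a sketch under assumptions: x_i is represented by its index i (its size in mm relative to x0), the speaker I by the index given on the slides (400 or 19), and << by a threshold reading with JND = 10. The name first_break is hypothetical.

```python
JND = 10  # mm, as in the illustration

def much_smaller(x, y):
    return y - x >= JND

def ctx_much_smaller(x, y, A):
    return (much_smaller(x, y)
            or any(much_smaller(h, y) and not much_smaller(h, x) for h in A)
            or any(much_smaller(x, h) and not much_smaller(y, h) for h in A))

def ctx_similar(x, y, A):
    return not ctx_much_smaller(x, y, A) and not ctx_much_smaller(y, x, A)

def short(x, A):
    smaller = sum(1 for h in A if much_smaller(h, x))
    larger = sum(1 for h in A if much_smaller(x, h))
    return smaller < larger

def first_break(start, speaker):
    """Walk the chain from x_start with A = {speaker}.

    Returns (k, context): x_k is the first element that is contextually
    distinguishable from its predecessor, and context is the set in which
    S(x_k) would have to be judged.
    """
    context = {speaker}
    i = start
    while ctx_similar(i, i + 1, context):
        context = context | {i}
        i += 1
    return i + 1, context | {i}

for start, speaker in [(0, 400), (1, 19), (0, 19)]:
    k, ctx = first_break(start, speaker)
    print(start, speaker, k, short(k, ctx))
```

Under these assumptions the chain breaks at x10 in all three situations, and x10 is not short in the context reached at that point.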

Page 32: Lecture 4: Sorites 2

DRT-ish perspective on sorites:

• I

• x0 S(x0)_{I}

• x1 S(x1)_({I} ∪ {x0})

• x2 S(x2)_({I} ∪ {x0, x1})

• …

Page 33: Lecture 4: Sorites 2

Properties of this solution

• In terms of Hyde’s classification: this approach depicts sorites as a fallacious argument. (Fallacy of Ambiguity)

• The pattern is as follows
1. Some readings of sorites have premisses all of which are true
2. Some readings of sorites are valid
3. There are no readings for which both (1) and (2) are true

• See paper in syllabus (section 4)

Page 34: Lecture 4: Sorites 2

Properties of this solution

• Proposal is logically conservative: 2-valued, standard semantics for connectives

• Some empirical support: By working with formulas like Short(x)_A, the context-dependence of `Short’ has been taken into account, in the spirit of context-change semantics

Page 35: Lecture 4: Sorites 2

Approach to meaning of gradable adjectives

• One could argue that the vagueness of gradable adjectives was denied
– Fuzzy logic really models vagueness
– n-valued logic models it to some approximation
– Kamp 1981 and Veltman 1987 are similar to our proposal in this respect

• Maybe vagueness cannot be modelled classically?

Page 36: Lecture 4: Sorites 2

Approach to vagueness

• EOI is done justice: “Suppose the objects a and b are observationally indistinguishable in the respects relevant to P; then either a and b both satisfy P or else neither of them does.”

but observationally indistinguishable has been contextualised, using Goodman/Dummett’s trick

Page 37: Lecture 4: Sorites 2

Some nagging doubts

• Some recent proposals bank on the Goodman/Dummett notion of a help element

• Read Dummett 1975: `Wang’s Paradox’. Synthese 30

• Strange predictions. Imagine a clock whose hands can be manually adjusted. Following Goodman any difference between two hands can be made `distinguishable’ by positioning the third hand.

• If experiments were done, one would find that indistinguishability was not always judged in the same way. (JND’s were defined with this in mind!)

• How do these considerations affect our proposals?

Page 38: Lecture 4: Sorites 2

What can we learn from these proposals?

• A computational viewpoint: what should an NLP system understand about vagueness? What pitfalls should be avoided and how?

• Where the system employs vague relations like `similar’, `equivalent’, it should be aware of their non-transitivity. (unlike real equivalence, which is transitive.)

• The system should be able to reason with vagueness, at least in the style of Goyal & Shoham.

• It should not implement EOI, since this leads to paradox: fixed standards would be preferable!

Page 39: Lecture 4: Sorites 2

• When interpreting vague statements, the system should understand that speakers may be using different standards, and that these standards are dependent on context, which tends to lead to ambiguities

• When producing vague statements, it should be aware of the same unclarities, and preferably mirror the way people (e.g., the system’s user) speak.

• Sometimes it may be better to use numbers. In this case, the system has to understand the notion of an approximation. This implies understanding
– that a measured difference can be arbitrarily small
– that no difference may be measured even though one exists
– that `nice’ numbers have a vague reading

• More about these issues in the next lecture!

Page 40: Lecture 4: Sorites 2

Semantics of vague adjectives

• Speakers can vary their standards `by fiat’ (Kennedy 1999, van Deemter 2000)

m1: 2cm, m2: 5cm, m3: 7cm, m4: 10cm
– `The large mouse’ = m4
– `The two large mice’ = {m3, m4}

• Kyburg and Morreau (2000): `Just as a home handyman can fit an adjustable wrench to a nut, we think, a speaker can adjust the extension of a vague expression to suit his needs’
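
The mouse example can be mimicked by letting the number in the description fix the standard. This is only one simple reading, assumed for illustration (the n largest individuals count as `large’); the_n_large is a hypothetical helper, not taken from any of the cited papers.

```python
def the_n_large(sizes, n):
    """Return the n largest individuals: the standard is set 'by fiat'
    so that exactly n of them count as large."""
    ranked = sorted(sizes, key=sizes.get, reverse=True)
    return set(ranked[:n])

mice = {"m1": 2, "m2": 5, "m3": 7, "m4": 10}  # sizes in cm, from the slide

print(the_n_large(mice, 1))  # `the large mouse'
print(the_n_large(mice, 2))  # `the two large mice'
```

The same mouse m3 is large in one description and not in the other, which is the adjustable-wrench effect.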

Page 41: Lecture 4: Sorites 2

• How does this wrench work? A stab
– at formalisation by Kyburg and Morreau
– at implementation by DeVault & Stone (NLU) and Van Deemter (NLG)
– Here: Kyburg and Morreau

• Their example: pigs in a pen. Language:
– fat-pig(x)
– at-least-as-fat-as(x,y)
– fatter(x,y)
– individual constants: Arnold, Babe
– D(p) (“p is Definitely the case”)

Page 42: Lecture 4: Sorites 2

• One way of thinking about D: D records the things that the wrench cannot change

• D is similar to using three truth values: if Arnold is borderline-fat then

¬D(fat-pig(Arnold))

¬D¬(fat-pig(Arnold))

• Abbreviation: I(fat-pig(Arnold)) (Arnold’s being fat is Indeterminate)

Page 43: Lecture 4: Sorites 2

Sketch of model theory

Model M = <Dom, P, ≤, L>

Dom = Domain

P = set of evaluation points

≤ = tree ordering of P

@ = `appropriate point’ = bottom of tree

L = interpretation function

L+(R,p) = objects having property R at p

L-(R,p) = objects lacking property R at p

Page 44: Lecture 4: Sorites 2

• Constraint on interpretation L: If p ≤ q then

L+(R,p) ⊆ L+(R,q)

L-(R,p) ⊆ L-(R,q)

(moving up the tree, more predications get resolved)

• A point p is complete iff, for all R, L+(R,p) ∪ L-(R,p) = Dom
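
Both conditions are easy to check mechanically on a toy model. The three-point tree and the extensions below are invented for the illustration and are not taken from Kyburg and Morreau.

```python
DOM = {"Arnold", "Babe"}
PREDICATES = {"fat-pig"}

# Hypothetical tree: @ at the bottom, q1 and q2 directly above it.
PARENT = {"@": None, "q1": "@", "q2": "@"}

L_PLUS = {("fat-pig", "@"): set(),
          ("fat-pig", "q1"): {"Arnold"},
          ("fat-pig", "q2"): set()}
L_MINUS = {("fat-pig", "@"): {"Babe"},
           ("fat-pig", "q1"): {"Babe"},
           ("fat-pig", "q2"): {"Arnold", "Babe"}}

def leq(p, q):
    """p ≤ q: p is q itself or lies below q in the tree."""
    while q is not None:
        if q == p:
            return True
        q = PARENT[q]
    return False

def is_monotone():
    """Moving up the tree never removes anything from L+ or L-."""
    points = list(PARENT)
    return all(L_PLUS[(R, p)] <= L_PLUS[(R, q)]
               and L_MINUS[(R, p)] <= L_MINUS[(R, q)]
               for R in PREDICATES
               for p in points for q in points if leq(p, q))

def is_complete(p):
    """Every object is settled one way or the other for every predicate."""
    return all(L_PLUS[(R, p)] | L_MINUS[(R, p)] == DOM for R in PREDICATES)

print(is_monotone(), {p: is_complete(p) for p in PARENT})
```

Here @ leaves Arnold unresolved, while q1 and q2 resolve him in opposite directions.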

Page 45: Lecture 4: Sorites 2

• Truth at point p is roughly supervaluational:

[R(a)]_(M,p) = true/false iff for all complete q such that p ≤ q, [a] ∈ L+(R,q) / [a] ∈ L-(R,q)

[D(φ)]_(M,p) = true iff [φ]_(M,@) = true

(NB This makes [D(φ)] independent of p)

φ is true/false/undefined in the model M iff φ is true/false/undefined at @

Consequence: (at complete and incomplete points) it may be that

[R(a)]_(M,p) = true while [D(R(a))]_(M,p) = false
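
The consequence can be reproduced on a toy model. The tree and extensions are invented for the illustration, only atomic formulas R(a) are handled, and D(φ) is assumed to be false whenever φ is not true at @ (the slide only states when it is true).

```python
DOM = {"Arnold", "Babe"}
PARENT = {"@": None, "q1": "@", "q2": "@"}  # hypothetical tree, @ at the bottom

L_PLUS = {("fat-pig", "@"): set(),
          ("fat-pig", "q1"): {"Arnold"},
          ("fat-pig", "q2"): set()}
L_MINUS = {("fat-pig", "@"): {"Babe"},
           ("fat-pig", "q1"): {"Babe"},
           ("fat-pig", "q2"): {"Arnold", "Babe"}}

def leq(p, q):
    while q is not None:
        if q == p:
            return True
        q = PARENT[q]
    return False

def is_complete(R, p):
    return L_PLUS[(R, p)] | L_MINUS[(R, p)] == DOM

def truth(R, a, p):
    """True / False / None (undefined), quantifying over complete q with p ≤ q."""
    above = [q for q in PARENT if leq(p, q) and is_complete(R, q)]
    if all(a in L_PLUS[(R, q)] for q in above):
        return True
    if all(a in L_MINUS[(R, q)] for q in above):
        return False
    return None

def definitely(R, a):
    """D(R(a)): evaluated at @, hence independent of the current point."""
    return truth(R, a, "@") is True

print(truth("fat-pig", "Arnold", "@"))   # undefined at the bottom
print(truth("fat-pig", "Arnold", "q1"))  # true at q1
print(definitely("fat-pig", "Arnold"))   # but not definitely true
```

At q1 the atom fat-pig(Arnold) is true although D(fat-pig(Arnold)) is not, which is the situation described in the consequence.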

Page 46: Lecture 4: Sorites 2

Example 1

• Common ground: I(fat-pig(Arnold))

D(skinny-pig(Babe))

∀x(pig(x) ↔ x = Arnold ∨ x = Babe)

fatter(Arnold,Babe)

• Utterance: s = `the fat pig won the prize’

• Entailed: fat-pig(Arnold), ¬fat-pig(Babe)
so, `the fat pig’ = Arnold

Page 47: Lecture 4: Sorites 2

Reasoning:

• s implies ∃!x(fat-pig(x))

• Common Ground says D(skinny-pig(Babe)), so ¬fat-pig(Babe)

• Since Arnold is the only other pig, it follows that fat-pig(Arnold)

Page 48: Lecture 4: Sorites 2

Example 2

• Common ground: I(fat-pig(Arnold))

I(fat-pig(Babe))

∀x(pig(x) ↔ x = Arnold ∨ x = Babe)

fatter(Arnold,Babe)

• Utterance: s = `the fat pig won the prize’

• Entailed: fat-pig(Arnold), ¬fat-pig(Babe)
so, the fat pig = Arnold
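
Examples 1 and 2 can be replayed by enumerating the ways of resolving fat-pig and keeping those that fit the utterance. The sketch rests on assumptions that are not on the slides: a skinny pig is treated as definitely not fat, and a law "if y is fat and x is fatter than y, then x is fat" is imposed on every resolution. The names admissible and the_fat_pig are hypothetical.

```python
from itertools import product

PIGS = ["Arnold", "Babe"]

def admissible(status, fatter_pairs):
    """Yield resolutions pig -> bool for fat-pig.

    status[p] is True/False for definite facts and None if indeterminate.
    """
    for values in product([True, False], repeat=len(PIGS)):
        fat = dict(zip(PIGS, values))
        if any(status[p] is not None and fat[p] != status[p] for p in PIGS):
            continue  # contradicts a definite fact
        if any(fat[y] and not fat[x] for x, y in fatter_pairs):
            continue  # violates the assumed law linking fatter and fat-pig
        yield fat

def the_fat_pig(status, fatter_pairs):
    """`the fat pig' presupposes exactly one fat pig; return it if unique."""
    referents = {next(p for p in PIGS if fat[p])
                 for fat in admissible(status, fatter_pairs)
                 if sum(fat.values()) == 1}
    return referents.pop() if len(referents) == 1 else None

fatter = [("Arnold", "Babe")]
example1 = the_fat_pig({"Arnold": None, "Babe": False}, fatter)
example2 = the_fat_pig({"Arnold": None, "Babe": None}, fatter)
print(example1, example2)
```

In both cases the only resolution with exactly one fat pig makes Arnold fat and Babe not fat.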

Page 49: Lecture 4: Sorites 2

Example 3: revising the common ground

fat-pig(Arnold)

fat-pig(Babe)

∀x(pig(x) ↔ x = Arnold ∨ x = Babe)

fatter(Arnold,Babe)

s = `the pig that is not fat won the prize’

Entailed: fat-pig(Arnold),¬fat-pig(Babe)

so, the pig that is not fat = Babe

Page 50: Lecture 4: Sorites 2

Kyburg & Morreau (summing up)

• Intuitions about accommodation of word meaning seem spot on. Some aspects of `dynamics’ are captured

• Formal apparatus: some questions, e.g., how does the tree arise? E.g., does it incorporate laws saying large-pig(x) & larger(y,x) → large-pig(y) ?

• Maybe the wrench is too adaptable.
– After calling {Arnold,Babe} `the large pigs’, can we call {Arnold} `the large pig’ ?
– Can statements like fatter(Arnold,Babe) also be revised?

• Might this be embedded into a general theory of information update?

Page 51: Lecture 4: Sorites 2

• Ideas inspired by Kennedy / Kyburg & Morreau have been implemented in NLU by DeVault & Stone (syllabus)
– They built a drawing application, where NLU system interprets instructions in context
– `(fill) the large circle’ = (fill) the largest of the visible circles
– Result (e.g.) [large(a)] = 1; all others are small
– until a new description forces a revision (e.g., `delete this circle’ (?))

• Applications to NLG (to follow)