Lecture 4: Sorites 2

Post on 12-Jan-2016


Lecture 4: Sorites 2

Finish Kamp 1981. Explain other context-based approaches

(Kyburg & Morreau: fitting words)

Context for sorites: Kamp 1981

• Context C is a deductively closed set of formulas, e.g.,

a. {small(1), small(2), ¬small(3)}
b. {small(1), small(2), ¬small(4)}
c. {small(1), small(6), ¬small(4)}

• C is incoherent iff ∃x∃y(small(x) ∈ C & ¬small(y) ∈ C & (x~y ∨ y<<x))

• Suppose JND=2. Then
a is incoherent (2~3)
b is coherent (not 2~4)
c is incoherent (4 << 6)
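The coherence test can be sketched in a few lines of Python. The integer encoding, the function names, and the fixed JND are our assumptions, not Kamp's; contexts are given as a set of objects asserted small and a set asserted not small.

```python
# A minimal sketch of Kamp-style coherence, assuming objects are integers
# ordered by size and a fixed JND (just-noticeable difference).

JND = 2

def much_smaller(x, y):
    """x << y: y exceeds x by at least one JND."""
    return y - x >= JND

def indistinguishable(x, y):
    """x ~ y: neither object is much smaller than the other."""
    return not much_smaller(x, y) and not much_smaller(y, x)

def coherent(small, not_small):
    """Incoherent iff some x is small, some y is not small,
    while x ~ y or y << x."""
    return not any(
        indistinguishable(x, y) or much_smaller(y, x)
        for x in small for y in not_small
    )

print(coherent({1, 2}, {3}))   # context a: False (incoherent)
print(coherent({1, 2}, {4}))   # context b: True  (coherent)
print(coherent({1, 6}, {4}))   # context c: False (incoherent)
```

This reproduces the three verdicts above: only context b survives the test.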

3-valued truth definition (here only truth):

• [small(a)]C=1 iff ∃b(small(b) ∈ C & (b~a or a<<b))
• [p→q]C=1 iff [p]C=0 or [q]C∪{p}=1
• [p&q]C=1 iff [p]C=1 and [q]C=1 and C ∪ {p,q} is coherent

Let C contain small(0) and ¬small(1000). Then

• For every i, we have [small(i) → small(i+1)]C=1, because [small(i+1)]C∪{small(i)}=1
• But let p1 = small(0) → small(1), p2 = small(1) → small(2), etc.

Then [p1 & … & p1000]C ≠ 1, since C ∪ {p1,…,p1000} is not coherent
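The atomic and conditional clauses can be simulated directly. This is a sketch under our own simplifications: JND = 2, C's small-facts represented as the bare set {0} (deductive closure is ignored), and all function names are ours.

```python
# Kamp's truth clauses for small() and the conditional, assuming C contains
# small(0) and ¬small(1000), with JND = 2 (our choice of parameters).

JND = 2
small_in_C = {0}   # the small(b) facts of C (closure ignored in this sketch)

def much_smaller(x, y): return y - x >= JND
def similar(x, y): return abs(x - y) < JND

def true_small(a, small_facts):
    # [small(a)]_C = 1 iff some b with small(b) in C and (b ~ a or a << b)
    return any(similar(b, a) or much_smaller(a, b) for b in small_facts)

def true_conditional(i, small_facts):
    # [small(i) -> small(i+1)]_C = 1 iff [small(i)]_C = 0
    # or [small(i+1)] holds in C extended with small(i)
    if not true_small(i, small_facts):
        return True
    return true_small(i + 1, small_facts | {i})

# Every single premiss small(i) -> small(i+1) comes out true,
# even though (per the slide) their conjunction is not:
print(all(true_conditional(i, small_in_C) for i in range(1000)))  # True
```

Each conditional is true either because its antecedent fails in C or because adding small(i) makes small(i+1) true; the conjunction still fails, since C plus all premisses is incoherent.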

Kamp 1981

universal quantification: like conjunction

[∀x φ(x)]C=1 iff

[φ(a)]C=1 for all constants a &

C ∪ {φ(a) : a constant} is coherent

Example: [∀i(small(i) → small(i+1))]C=1 iff

[small(1) → small(2)]C=1 & … &

the set of all the instantiations is coherent

Analysis of paradox (Kamp 81)

Every instance of the crucial premiss is true, but
– not their conjunction
– not the quantified version

Is the paradox valid?

The argument (n=1000)

small(0)

small(0) → small(1), so small(1)

small(1) → small(2), so small(2)

…

small(999) → small(1000), so small(1000)

Contextualised version of validity: If premisses are true then conclusion is true in the context that collects all premisses.

Argument is valid. Key: If [small(i)]C=1 and [small(i) → small(i+1)]C=1 then [small(i+1)]C∪{small(i)}=1

Kamp 1981: consequences

• The argument is deemed valid

• Each instantiation of the crucial premiss is true, but their conjunction is not

• Likewise, quantified version is false
• Drawback: in many ways, this is as nonclassical as earlier accounts. (Compare Kamp's critique of these.)

Veltman 1987

• Variant of Kamp’s approach, based on an idea of Nelson Goodman (further developed by Michael Dummett)

• Key idea: whether two objects can be distinguished depends on other objects in the context

a ~ b ~ c
|       |
a  <<   c

If c is absent then a and b are indistinguishable. But with c present, a difference between a and b can be inferred: c is called a help element.

Notations

(x << y)A =def x << y ∨ ∃h∈A(h << y & ¬h << x) ∨ ∃h∈A(h >> x & ¬h >> y)

(x ~ y)A =def ¬(x << y)A & ¬(y << x)A
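The help-element idea is easy to implement. A sketch, assuming integer sizes and JND = 10; `ctx_ll` and `ctx_sim` are our names for (x << y)A and (x ~ y)A.

```python
# Veltman-style contextualised distinguishability, assuming integer sizes
# and JND = 10 (our choice of parameters).

JND = 10
def ll(x, y): return y - x >= JND          # plain x << y

def ctx_ll(x, y, A):
    # (x << y)_A: plainly distinguishable, or some help element h in A
    # separates them: h << y but not h << x, or h >> x but not h >> y
    return (ll(x, y)
            or any(ll(h, y) and not ll(h, x) for h in A)
            or any(ll(x, h) and not ll(y, h) for h in A))

def ctx_sim(x, y, A):
    # (x ~ y)_A: neither contextually much smaller than the other
    return not ctx_ll(x, y, A) and not ctx_ll(y, x, A)

# a=0 and b=1 are indistinguishable in an empty context...
print(ctx_sim(0, 1, set()))    # True
# ...but the help element c=10 (c >> a yet not c >> b) separates them:
print(ctx_sim(0, 1, {10}))     # False
```

This mirrors the a ~ b ~ c picture above: the third object turns an unnoticeable difference into an inferable one.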

Veltman 1987 (sketch)

• Proposed to judge the truth of a discourse relative to a context determined by the discourse

• A context is coherent iff it satisfies this version of EOI: small(i) ∈ C & (i~i+1)C → small(i+1) ∈ C

• When premisses small(i) → small(i+1) are judged on their own, they must be true in any coherent context, because the “context determined” is {i,i+1}: there is no help element, so (i~i+1)C, so EOI “fires”.

Veltman 1987 (sketch)

• But when taken together, the premisses “determine” a larger context.

• For example, (99~100)C is judged in C = {0,1,2,…,100}. Plenty of help elements, so (99~100)C does not hold, and EOI does not fire, so coherence does not imply small(100).

Veltman 1987 (sketch)

• In fact, it is not possible that all premisses of the form (i~i+1)C are true in the context jointly determined by them

(Suppose small(k) & ¬small(k+1). Then C must contain h such that (k << h) & ¬(k+1 << h). Consequently, (k << k+1)C, and therefore C is incoherent.)

Veltman 1987 (sketch)

Similar reasoning makes the quantified version of the crucial premiss false in any coherent model

Comments:
– A flaw in an otherwise appealing approach? (Similar to Kamp 81)
– The idea of using context may be taken further. For instance, contexts may be built up from left to right
– More importantly, linguistic approaches to context show that where there is context dependence there is ambiguity (e.g., Graeme Hirst, Computational Linguistics 1996)

Van Deemter 1992,1996

• In NPs like `a small elephant’, the predicate small is known to be context dependent: context is provided by the noun elephant.

• Discourse Representation Theory (DRT) suggests: context can also be built up through discourse:

x1 . dinosaur(x1)

x2 . whale(x2)

x3 . x3=Jumbo small(Jumbo)

• For concreteness, focus on one way in which small(x) might depend on context. Notation: small(x)A , S(x)A

Some definitions

small(x)A =def |{y∈A: y<<x}| < |{y∈A: y>>x}|

KA[x] =def {y∈A: y<<x}

GA[x] =def {y∈A: y>>x}

. . . . . . . . . . x . . . . . . . . . .

KA[x] GA[x]

(Dots represent the elements of the context A)
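These definitions translate directly into code. A sketch with integer sizes and JND = 10 as our assumptions (the example context is ours too):

```python
# van Deemter's contextualised small, via the counting definition above,
# assuming integer sizes and JND = 10 (our parameters).

JND = 10
def ll(x, y): return y - x >= JND            # x << y

def K(x, A): return {y for y in A if ll(y, x)}   # K_A[x]: much smaller than x
def G(x, A): return {y for y in A if ll(x, y)}   # G_A[x]: much greater than x

def small(x, A):
    # small(x)_A iff fewer context elements lie far below x than far above it
    return len(K(x, A)) < len(G(x, A))

A = {0, 5, 50}
print(small(5, A))    # True:  K = {}, G = {50}
print(small(50, A))   # False: K = {0, 5}, G = {}
```

So an object counts as small exactly when the dots to its far right outnumber those to its far left, as in the picture above.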

We have contextualised both ~ and small. This leads to many possible versions of the crucial premiss. In particular:

• Plain version: S(x)A & x~y → S(y)A∪{x}

• Sophisticated version: S(x)A & (x~y)A → S(y)A∪{x}

What other contextualisations can you think of? Are they really different? Recall:

• Plain version: S(x)A & x~y → S(y)A∪{x}

• Sophisticated version: S(x)A & (x~y)A → S(y)A∪{x}

Plain: S(x)A & x~y → S(y)A∪{x}

Sophisticated: S(x)A & (x~y)A → S(y)A∪{x}

• S(x) may be relativised to A or to A∪{x}
• x~y may be relativised to nothing, to A, to A∪{x}, or to A∪{x,y}
• S(y) may be relativised to A, to A∪{x}, to A∪{y}, or to A∪{x,y}

Each of these options is equivalent to either Plain or Sophisticated. For instance, (x~y)A ⇔ (x~y)A∪{x,y} and S(y)A ⇔ S(y)A∪{x,y}, because x~y holds (in both versions).

First claim

Plain version is invalid: ¬╞ S(x)A & x~y → S(y)A∪{x}

This is easy to see: let context A={x,y,z}

x << z
x ~ y ~ z

KA[x] = {}, GA[x] = {z}, therefore S(x)A

KA∪{x}[y] = {}, GA∪{x}[y] = {}, therefore ¬S(y)A∪{x}

How about the sophisticated version?

Is it valid?

╞ S(x)A & (x~y)A → S(y)A∪{x}

Second claim

Sophisticated version is valid: ╞ S(x)A & (x~y)A → S(y)A∪{x}

Proof: suppose S(a)A & (a~b)A. Now suppose ¬S(b)A∪{a}. This is equivalent to ¬S(b)A. We therefore have S(a)A and ¬S(b)A, so either ∃x∈A(x<<b & ¬x<<a) or ∃x∈A(x>>a & ¬x>>b). But then x is a help element for distinguishing a and b, contradicting (a~b)A.
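Both claims can be brute-force checked by machine. A sketch under our own assumptions (JND = 10, sizes drawn from a small grid, all names ours): the plain premiss should have counterexamples, the sophisticated one none.

```python
# Brute-force check of both claims over all small integer contexts,
# assuming JND = 10 and candidate sizes range(0, 30, 3) (our choices).

from itertools import combinations

JND = 10
def ll(x, y): return y - x >= JND

def small(x, A):
    return sum(ll(y, x) for y in A) < sum(ll(x, y) for y in A)

def ctx_sim(x, y, A):
    def ctx_ll(a, b):
        return (ll(a, b)
                or any(ll(h, b) and not ll(h, a) for h in A)
                or any(ll(a, h) and not ll(b, h) for h in A))
    return not ctx_ll(x, y) and not ctx_ll(y, x)

sizes = range(0, 30, 3)
plain_fails = soph_fails = 0
for A in (set(c) for n in (2, 3) for c in combinations(sizes, n)):
    for x in A:
        for y in sizes:
            # plain: S(x)_A and plain x ~ y, yet not S(y)_{A∪{x}}
            if small(x, A) and abs(x - y) < JND and not small(y, A | {x}):
                plain_fails += 1
            # sophisticated: S(x)_A and (x ~ y)_A, yet not S(y)_{A∪{x}}
            if small(x, A) and ctx_sim(x, y, A) and not small(y, A | {x}):
                soph_fails += 1

print(plain_fails > 0)   # True: the plain premiss has counterexamples
print(soph_fails)        # 0: no counterexample to the sophisticated premiss
```

The exhaustive search agrees with the proof: whenever S shifts truth value, a help element is present, so (x~y)A fails first.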

• So, the premiss is ambiguous between a plain version that is invalid, and a sophisticated version that is valid

• Since there exists a valid version of the premiss, does this mean that the paradox follows?

• No, for the sophisticated version is not strong enough to support the (paradoxical) conclusion

Two further claims

1. Plain version supports sorites.

2. Sophisticated version does not support sorites.

1. Plain version supports sorites.

Suppose S(x)A & x~y → S(y)A∪{x};

then sorites goes

S(0)A so S(1)A∪{0} so S(2)A∪{0,1} so S(3)A∪{0,1,2}, etc.

2. Sophisticated version does not support sorites.

Suppose we use S(x)A & (x~y)A → S(y)A∪{x};

then the sorites chain breaks off:

S(0)A, so [since (0~1)A]
S(1)A∪{0}, so [since (1~2)A∪{0}]
S(2)A∪{0,1}, so [since (2~3)A∪{0,1}]
S(3)A∪{0,1,2}, so [since (3~4)A∪{0,1,2}], etc.

Context for judging (i ~ i+1) grows, creating more and more help elements.

Beyond some point, no suitable y can be found.
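The breaking-off can be simulated. A sketch under our own encoding: JND = 10, unit steps, and an initial context {0, 1000}, where 1000 plays the role of a clearly non-small object.

```python
# Simulating the sophisticated sorites chain, assuming JND = 10 and an
# initial context A = {0, 1000} (our parameters; 1000 is clearly not small).

JND = 10
def ll(x, y): return y - x >= JND

def small(x, A):
    return sum(ll(y, x) for y in A) < sum(ll(x, y) for y in A)

def ctx_sim(x, y, A):
    def ctx_ll(a, b):
        return (ll(a, b)
                or any(ll(h, b) and not ll(h, a) for h in A)
                or any(ll(a, h) and not ll(b, h) for h in A))
    return not ctx_ll(x, y) and not ctx_ll(y, x)

A = {0, 1000}
i = 0
# Step from i to i+1 only while small(i) holds and (i ~ i+1) holds
# relative to the context accumulated so far.
while small(i, A) and ctx_sim(i, i + 1, A):
    A = A | {i}
    i += 1
print(i)   # 9: the chain breaks off after finitely many steps
```

At i = 9 the object is still small, but 0 has become a help element (0 << 10 yet not 0 << 9), so (9~10) fails relative to the accumulated context and the chain stops.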

Proof

Suppose ∀i: (0~i)A∪{0,1,2,…,i-1}

Then certainly ∀i: (0~i).

If that is the case, all objects in the domain are indistinguishable from each other.

Illustration

• Suppose the argument concerns the heights of people. (S = short)

• Suppose A={ I }={speaker}

• Let size(xi+1)-size(xi)=1mm

• Let JND=10mm

Three types of situations can arise:

1. We reach x10, which is the first element to be distinguishable from x0

x0 ~ x9

.  .  …  .   .              .
x0 x1 … x9  x10             I

x0 << x10 << I = x400

2. We reach x10, which is the first element to be indistinguishable from the speaker

x9 << I

. . . .

x1 x9 x10 I

. . . .

x1 ~ x10 ~ I = x19

3. Upon reaching x10 , both changes coincide

x0 ~ x9 << I

.  .  …  .   .   .
x0 x1 … x9  x10  I

x0 << x10 ~ I = x19

• In all three situations, two things happen at the same time:
– While x9 was indistinguishable from its predecessor, x10 was not
– While x9 was short with respect to its context, x10 was not
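The coincidence of the two changes can be checked numerically. A sketch under the illustration's parameters as we encode them: 1mm steps, JND = 10mm, speaker at 400mm (situation 1), context when judging x_i being the speaker plus x0…x_{i-1}.

```python
# Numeric check of the illustration, assuming 1mm steps, JND = 10mm,
# and speaker I at 400mm (situation 1; the encoding is ours).

JND = 10
def ll(x, y): return y - x >= JND

def small(x, A):
    return sum(ll(y, x) for y in A) < sum(ll(x, y) for y in A)

def ctx_sim(x, y, A):
    def ctx_ll(a, b):
        return (ll(a, b)
                or any(ll(h, b) and not ll(h, a) for h in A)
                or any(ll(a, h) and not ll(b, h) for h in A))
    return not ctx_ll(x, y) and not ctx_ll(y, x)

I = 400                                  # speaker's size in mm
def ctx(i): return {I} | set(range(i))   # context for judging x_i: {I, x0..x_{i-1}}

# Both changes happen exactly at x10:
print(ctx_sim(8, 9, ctx(8)), small(9, ctx(9)))      # True True
print(ctx_sim(9, 10, ctx(9)), small(10, ctx(10)))   # False False
```

x9 is still contextually indistinguishable from x8 and still short; at x10, x0 acts as a help element and x0 enters K[x10], so both properties flip together.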

DRT-ish perspective on sorites:

• I
• x0   S(x0){I}
• x1   S(x1){I}∪{x0}
• x2   S(x2){I}∪{x0,x1}
• …

Properties of this solution

• In terms of Hyde’s classification: this approach depicts sorites as a fallacious argument. (Fallacy of Ambiguity)

• The pattern is as follows:
1. Some readings of sorites have premisses all of which are true
2. Some readings of sorites are valid
3. There are no readings for which both 1 and 2 hold
• See paper in syllabus (section 4)

Properties of this solution

• Proposal is logically conservative: 2-valued standard semantics for connectives
• Some empirical support: by working with formulas like Short(x)A, the context-dependence of `Short' has been taken into account, in the spirit of context-change semantics

Approach to meaning of gradable adjectives

• One could argue that the vagueness of gradable adjectives was denied
– Fuzzy logic really models vagueness
– n-valued logic models it to some approximation
– Kamp 1981 and Veltman 1987 are similar to our proposal in this respect

• Maybe vagueness cannot be modelled classically?

Approach to vagueness

• EOI is done justice: “Suppose the objects a and b are observationally indistinguishable in the respects relevant to P; then either a and b both satisfy P or else neither of them does.”

but observationally indistinguishable has been contextualised, using Goodman/Dummett's trick

Some nagging doubts

• Some recent proposals bank on the Goodman/Dummett notion of a help element
• Read Dummett 1975: `Wang's Paradox'. Synthese 30
• Strange predictions. Imagine a clock whose hands can be manually adjusted. Following Goodman, any difference between two hands can be made `distinguishable' by positioning the third hand.

• If experiments were done, one would find that indistinguishability was not always judged in the same way. (JND’s were defined with this in mind!)

• How do these considerations affect our proposals?

What can we learn from these proposals?

• A computational viewpoint: what should an NLP system understand about vagueness? What pitfalls should be avoided and how?

• Where the system employs vague relations like `similar', `equivalent', it should be aware of their non-transitivity (unlike real equivalence, which is transitive).

• The system should be able to reason with vagueness, at least in the style of Goyal & Shoham.

• It should not implement EOI, since this leads to paradox: fixed standards would be preferable!

• When interpreting vague statements, the system should understand that speakers may be using different standards, and that these standards are dependent on context, which tends to lead to ambiguities

• When producing vague statements, it should be aware of the same unclarities, and preferably mirror the way people (e.g., the system’s user) speak.

• Sometimes it may be better to use numbers. In this case, the system has to understand the notion of an approximation. This implies understanding– that a measured difference can be arbitrarily small– that no difference may be measured even though one exists– that `nice’ numbers have a vague reading

• More about these issues in the next lecture!

Semantics of vague adjectives

• Speakers can vary their standards `by fiat' (Kennedy 1999, van Deemter 2000)

m1: 2cm, m2: 5cm, m3: 7cm, m4: 10cm
– `The large mouse' = m4
– `The two large mice' = {m3, m4}

• Kyburg and Morreau (2000): `Just as a home handyman can fit an adjustable wrench to a nut, we think, a speaker can adjust the extension of a vague expression to suit his needs’

• How does this wrench work? A stab
– at formalisation by Kyburg and Morreau
– at implementation by DeVault & Stone (NLU) and Van Deemter (NLG)
– Here: Kyburg and Morreau

• Their example: pigs in a pen. Language:– fat-pig(x)– at-least-as-fat-as(x,y)– fatter(x,y)– individual constants: Arnold, Babe– D(p) (“p is Definitely the case”)

• One way of thinking about D: D records the things that the wrench cannot change

• D is similar to using three truth values: if Arnold is borderline-fat then

¬D(fat-pig(Arnold))

¬D¬(fat-pig(Arnold))
• Abbreviation: I(fat-pig(Arnold))

(Arnold’s being fat is Indeterminate)

Sketch of model theory

Model M = <Dom, P, ≤, L>
Dom = Domain
P = set of evaluation points
≤ = tree ordering of P
@ = `appropriate point' = bottom of tree
L = interpretation function

L+(R,p) = objects having property R at p
L−(R,p) = objects not having property R at p

• Constraint on interpretation L: If p≤q then L+(R,p) ⊆ L+(R,q) and L−(R,p) ⊆ L−(R,q)

(moving up the tree, more predications get resolved)

• A point p is complete iff, for all R, L+(R,p) ∪ L−(R,p) = Dom

• Truth at point p is roughly supervaluational:

[R(a)]M,p = true iff for all complete q such that p≤q, [a] ∈ L+(R,q) (and false iff, for all such q, [a] ∈ L−(R,q))

[D(φ)]M,p = true iff [φ]M,@ = true

(NB This makes [D(φ)] independent of p)

φ is true/false/undefined in the model M iff φ is true/false/undefined at @

Consequence: (at complete and incomplete points) it may be that [R(a)]M,p = true while [D(R(a))]M,p = false
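The point semantics can be illustrated with a toy model. This is our own minimal encoding, not Kyburg & Morreau's: two pigs, one predicate "fat", and a three-point tree whose root @ leaves Arnold unsettled while its two leaves q1 and q2 are the complete extensions.

```python
# A toy model of the point semantics, assuming Dom = {Arnold, Babe}, one
# predicate fat, and a three-point tree (names @, q1, q2 are ours).

DOM = {"Arnold", "Babe"}

# point -> (positive extension L+ of fat, negative extension L- of fat)
L = {
    "@":  (set(),      {"Babe"}),            # Arnold unsettled at @
    "q1": ({"Arnold"}, {"Babe"}),            # complete: Arnold fat
    "q2": (set(),      {"Arnold", "Babe"}),  # complete: Arnold not fat
}
children = {"@": ["q1", "q2"], "q1": [], "q2": []}

def complete(p):
    pos, neg = L[p]
    return pos | neg == DOM

def completions(p):
    # all complete points q with p <= q in the tree
    out = [p] if complete(p) else []
    for c in children[p]:
        out += completions(c)
    return out

def truth(a, p):
    # [fat(a)]_p: True if a is in L+ at every complete q >= p,
    # False if in L- at every such q, otherwise undefined (None)
    qs = completions(p)
    if all(a in L[q][0] for q in qs): return True
    if all(a in L[q][1] for q in qs): return False
    return None

def D(a):
    # D(fat(a)) is always evaluated at the appropriate point @
    return truth(a, "@") is True

print(truth("Babe", "@"))      # False: Babe is definitely not fat
print(truth("Arnold", "@"))    # None:  Arnold is borderline
print(D("Arnold"), D("Babe"))  # False False
```

The borderline case comes out undefined at @ because the complete extensions disagree, matching the supervaluational clause above.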

Example 1

• Common ground:
I(fat-pig(Arnold))
D(skinny-pig(Babe))
∀x(pig(x) → x=Arnold ∨ x=Babe)
fatter(Arnold,Babe)

• Utterance: s = `the fat pig won the prize’

• Entailed: fat-pig(Arnold), ¬fat-pig(Babe); so `the fat pig' = Arnold

Reasoning:

• s implies ∃!x(fat-pig(x))
• Common Ground says D(skinny-pig(Babe)), so ¬fat-pig(Babe)

• Since Arnold is the only other pig, it follows that fat-pig(Arnold)

Example 2

• Common ground:
I(fat-pig(Arnold))
I(fat-pig(Babe))
∀x(pig(x) → x=Arnold ∨ x=Babe)

fatter(Arnold,Babe)

• Utterance: s = `the fat pig won the prize’

• Entailed: fat-pig(Arnold), ¬fat-pig(Babe); so `the fat pig' = Arnold

Example 3: revising the common ground

fat-pig(Arnold)

fat-pig(Babe)

∀x(pig(x) → x = Arnold ∨ x = Babe)

fatter(Arnold,Babe)

s = `the pig that is not fat won the prize’

Entailed: fat-pig(Arnold),¬fat-pig(Babe)

so, the pig that is not fat = Babe

Kyburg & Morreau (summing up)

• Intuitions about accommodation of word meaning seem spot on. Some aspects of `dynamics’ are captured

• Formal apparatus: some questions, e.g., how does the tree arise? E.g., does it incorporate laws saying large-pig(x) & larger(y,x) → large-pig(y)?

• Maybe the wrench is too adaptable.
– After calling {Arnold,Babe} `the large pigs', can we call {Arnold} `the large pig'?
– Can statements like fatter(Arnold,Babe) also be revised?

• Might this be embedded into a general theory of information update?

• Ideas inspired by Kennedy / Kyburg & Morreau have been implemented in NLU by DeVault & Stone (syllabus)
– They built a drawing application, where the NLU system interprets instructions in context
– `(fill) the large circle' = (fill) the largest of the visible circles
– Result (e.g.): [large(a)]=1; all others are small
– until a new description forces a revision (e.g., `delete this circle' (?))
• Applications to NLG (to follow)