Computing Word-Pair Antonymy *Saif Mohammad *Bonnie Dorr φ Graeme Hirst *Univ. of Maryland φ Univ. of Toronto EMNLP 2008

Computing Word-Pair Antonymy

*Saif Mohammad, *Bonnie Dorr

φ Graeme Hirst

*Univ. of Maryland, φ Univ. of Toronto

EMNLP 2008

Introduction

• Antonymy: a relation between a pair of semantically contrasting words.

• Examples:
– Strongly antonymous: hot / cold
– Semantically contrasting: enemy / fan
– Not antonymous: penguin / clown

Usage

• Detecting contradictions
• Detecting humor
• Automatic creation of thesauri

Problem Definition

• Given a thesaurus, identify the pairs of antonymous categories.

• Assign a degree of antonymy to each pair of antonymous categories.

Hypothesis (1)

• The Co-occurrence Hypothesis of Antonyms
– Antonymous word pairs occur together much more often than other word pairs.

Hypothesis (1)

• Empirical evidence:
– 1,000 antonymous pairs from WordNet
– 1,000 randomly generated word pairs
– Use the British National Corpus (BNC) as the corpus, with a window size of 5.
– Calculate the mutual information (MI) for each word pair and average it.

                  Average   Standard deviation
Antonymous pair   0.94      2.27
Random pair       0.01      0.37
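
The MI computation above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `pmi_table`, the choice to count only the tokens that follow a word inside the window, and the probability estimates are all assumptions made for the example.

```python
import math
from collections import Counter

def pmi_table(tokens, window=5):
    """Build a pmi(x, y) function from windowed co-occurrence counts.

    Assumption: a pair co-occurs when the second word appears within
    `window` tokens after the first. Pairs are treated as unordered.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        # Pair each token with the tokens that follow it inside the window.
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                pair_counts[frozenset((w, v))] += 1
    n_words = len(tokens)
    n_pairs = sum(pair_counts.values())

    def pmi(x, y):
        joint = pair_counts[frozenset((x, y))]
        if joint == 0:
            return float("-inf")  # the pair never co-occurs in this corpus
        p_xy = joint / n_pairs
        p_x = word_counts[x] / n_words
        p_y = word_counts[y] / n_words
        return math.log2(p_xy / (p_x * p_y))

    return pmi
```

Averaging `pmi(x, y)` over a list of antonym pairs and over a list of random pairs would give the two rows of the table, given a real corpus in place of a toy token list.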

Hypothesis (2)

• The Distributional Hypothesis of Antonyms
– Antonyms occur in similar contexts more often than non-antonymous words.
– Ex: work (the activity of doing a job) vs. play (the activity of relaxation)

Hypothesis (2)

• Empirical evidence:
– Use the same sets of word pairs as for Hypothesis (1).
– Calculate the distributional distance between their categories.

                  Average   Standard deviation
Antonymous pair   0.30      0.23
Random pair       0.23      0.11

Distributional Distance between Two Thesaurus Categories

c1, c2: thesaurus categories
I(x, y): pointwise mutual information between x and y
T(c): the set of all words w such that I(c, w) > 0
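
The formula on this slide did not survive extraction. The symbol definitions are consistent with a Lin-style measure over PMI values, so the sketch below assumes that form: the PMI mass of the context words shared by T(c1) and T(c2), divided by the total PMI mass of both sets. Under that assumption the score is a closeness (higher means more similar contexts). The function name and the dictionary inputs are illustrative.

```python
def distributional_closeness(pmi_c1, pmi_c2):
    """Lin-style closeness between two thesaurus categories (assumed form).

    pmi_c1 and pmi_c2 map each context word w to I(c1, w) and I(c2, w).
    """
    # T(c): the context words whose PMI with the category is positive.
    t1 = {w: v for w, v in pmi_c1.items() if v > 0}
    t2 = {w: v for w, v in pmi_c2.items() if v > 0}
    shared = t1.keys() & t2.keys()
    numerator = sum(t1[w] + t2[w] for w in shared)
    denominator = sum(t1.values()) + sum(t2.values())
    # Categories with no positive-PMI context words get a score of 0.
    return numerator / denominator if denominator else 0.0
```

Two categories with identical positive-PMI profiles score 1.0, and categories with no shared context words score 0.0.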

Method

• Determine pairs of thesaurus categories that are contrasting in meaning

• Use the co-occurrence and distributional hypotheses to determine the degree of antonymy of word pairs

Method

• 16 affix rules were applied to the Macquarie Thesaurus.
• 2,734 word pairs were generated as a seed set.
• Exceptions, such as sect / insect (which are not antonyms), are relatively few.
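
Seed generation from affix rules can be sketched roughly as below. The slide does not list the 16 rules, so the prefixes here are an assumed, illustrative subset, and `affix_seed_pairs` is a hypothetical helper name. Note that a purely string-based rule also produces false pairs such as sect / insect.

```python
# Illustrative subset of prefix rules of the form X / prefix+X.
# The actual 16 rules are not given on the slide; these are assumptions.
PREFIXES = ("un", "in", "dis", "non", "im")

def affix_seed_pairs(vocabulary, prefixes=PREFIXES):
    """Return sorted (word, prefixed_word) pairs where both are in the vocabulary."""
    vocab = set(vocabulary)
    pairs = set()
    for word in vocab:
        for prefix in prefixes:
            candidate = prefix + word
            if candidate in vocab:
                pairs.add((word, candidate))
    return sorted(pairs)
```

Applied to a vocabulary containing happy, unhappy, sect, and insect, the rule yields both the genuine pair and the exception, which is why such exceptions have to be tolerated or filtered.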

Method

• 10,807 semantically contrasting word pairs from WordNet, used as a second seed set

Method

• If any word in thesaurus category C1 is antonymous to any word in category C2 as per a seed antonym pair, then the two categories are marked as contrasting.

• If no word in C1 is antonymous to any word in C2, then the categories are considered not contrasting
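
The two rules above amount to a lookup from seed pairs to category pairs. The sketch below assumes the thesaurus is available as a dictionary from category name to a set of words; that representation, the function name, and the choice to ignore seed pairs that fall inside a single category are assumptions for illustration.

```python
def contrasting_categories(categories, seed_pairs):
    """Return the set of category pairs linked by at least one seed antonym pair.

    categories: dict mapping a category name to the set of words it lists.
    seed_pairs: iterable of (word, antonym) tuples.
    """
    # Index every word by the categories it appears in (a word may be polysemous).
    word_to_cats = {}
    for cat, words in categories.items():
        for w in words:
            word_to_cats.setdefault(w, set()).add(cat)

    contrasting = set()
    for a, b in seed_pairs:
        for ca in word_to_cats.get(a, ()):
            for cb in word_to_cats.get(b, ()):
                if ca != cb:  # assumption: a category does not contrast with itself
                    contrasting.add(frozenset((ca, cb)))
    return contrasting
```

Any category pair absent from the returned set is treated as not contrasting.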

Method

• Degree of antonymy: category level
– By the distributional hypothesis of antonyms, we claim that the degree of antonymy between two contrasting thesaurus categories is directly proportional to the distributional closeness of the two concepts.

Method

• Degree of antonymy: word level
– Target words that belong to the same thesaurus paragraphs as any of the seed antonyms linking the two contrasting categories: high antonymy
– Target words that do not both belong to the same paragraphs as a seed antonym pair, but occur in contrasting categories: medium antonymy
– Target words with a low tendency to co-occur: low antonymy
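
One way to read the three word-level rules is as a cascade, sketched below. Several things are assumptions here: each word is mapped to a single (category, paragraph) location, the co-occurrence test is passed in as a callable, and the low level is applied only to pairs that did not qualify as high. The slide does not specify the order in which the rules are checked.

```python
def antonymy_level(w1, w2, location, seed_pairs, contrasting, cooccurs):
    """Classify a target pair as 'high', 'medium', 'low', or 'none'.

    location: dict word -> (category, paragraph); one sense per word (simplification).
    seed_pairs: iterable of (word, antonym) seed tuples.
    contrasting: set of frozenset({cat_a, cat_b}) marked as contrasting.
    cooccurs: callable (w1, w2) -> bool, True if the pair tends to co-occur.
    """
    if w1 not in location or w2 not in location:
        return "none"
    loc1, loc2 = location[w1], location[w2]
    if frozenset((loc1[0], loc2[0])) not in contrasting:
        return "none"
    # High: both targets share their paragraphs with a seed pair (either order).
    for a, b in seed_pairs:
        for s1, s2 in ((a, b), (b, a)):
            if location.get(s1) == loc1 and location.get(s2) == loc2:
                return "high"
    # Otherwise the co-occurrence tendency separates medium from low.
    return "medium" if cooccurs(w1, w2) else "low"
```

Within each level, pairs could then be ranked further, for example by the category-level closeness score.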

Method

• Adjacency Heuristic
– Most thesauri are ordered such that contrasting categories tend to be adjacent.

Evaluation

• 1,112 closest-opposite questions designed to prepare students for the GRE (Graduate Record Examination)
– 162 questions as the development set
– 950 questions as the test set

Evaluation

• Closest-opposite questions
– Ex: adulterate: a. renounce b. forbid c. purify d. criticize e. correct

Evaluation

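• Closest-opposite questions
– Ex: adulterate: a. renounce b. forbid c. purify d. criticize e. correct
– Glosses: adulterate (to mix with impurities), renounce (to declare that one gives something up), forbid (to prohibit), purify (to make pure), criticize (to find fault with), correct (right)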
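
Answering a closest-opposite question then reduces to picking the option with the highest degree of antonymy to the target word. The sketch below assumes some scoring function is available; `closest_opposite` and the `antonymy_score` callable are hypothetical names, and the scores in the usage lines are made up purely to exercise the function, not results from the system.

```python
def closest_opposite(target, options, antonymy_score):
    """Return the option with the highest antonymy score against the target.

    antonymy_score: callable (target, option) -> float, higher = more antonymous.
    Ties are resolved in favor of the earliest option.
    """
    return max(options, key=lambda option: antonymy_score(target, option))


# Usage with placeholder scores (illustrative values only).
toy_scores = {"purify": 0.9, "correct": 0.3}
options = ["renounce", "forbid", "purify", "criticize", "correct"]
answer = closest_opposite(
    "adulterate", options, lambda target, option: toy_scores.get(option, 0.0)
)
```

A real system would also need a policy for questions where no option receives a non-zero score, such as abstaining.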

Evaluation

Discussion

• The automatic approach does indeed mimic human intuitions of antonymy.

• Substantial accuracy may be achievable even for languages that lack a wordnet.

• The WordNet and affix-generated seed sets are complementary.

Conclusion

• Proposed an empirical approach to antonymy that combines corpus co-occurrence statistics with the structure of a thesaurus.

• The system can identify the degree of antonymy between word pairs.

• Provided empirical evidence that antonym pairs tend to be used in similar contexts.

Thanks