
+

CS 136a Lecture 8 Yet More Language Modeling

September 16, 2016, Professor Meteer

Thanks to Dan Jurafsky & Josh Goodman for many of these slides

+ Overview (from Microsoft Tutorial)

n Caching

n Skipping

n Clustering

n Sentence-mixture models

n Structured language models

n Tools

n More on the author, Josh Goodman: http://research.microsoft.com/en-us/um/people/joshuago/icmldescription.htm


+ Caching

n  If you say something, you are likely to say it again later.

P(z | history) = λPsmooth(z | xy) + (1 − λ)Pcache(z | history)

n  Interpolate trigram with cache

n Trigram caches get almost twice the improvement of unigram caches

Pcache(z | history) = C(z ∈ history) / length(history)
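To make the interpolation concrete, here is a minimal Python sketch. `p_smooth` stands in for any smoothed trigram model, and λ is a tuned weight; both names and the weight value are illustrative assumptions, not from the slides.

```python
def p_cache_unigram(z, history):
    """Unigram cache: P_cache(z | history) = C(z in history) / length(history)."""
    return history.count(z) / len(history) if history else 0.0

def p_with_cache(z, x, y, history, p_smooth, lam=0.9):
    """Interpolate a smoothed trigram with the cache:
    P(z | history) = lam * P_smooth(z | x y) + (1 - lam) * P_cache(z | history)."""
    return lam * p_smooth(z, x, y) + (1.0 - lam) * p_cache_unigram(z, history)
```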

+ Caching: Variations

n N-gram caches:

n Conditional n-gram cache: use n-gram cache only if xy ∈ history

n Remove function-words from cache, like “the”, “to”

Pcache(z | history) = C(xyz ∈ history) / C(xy ∈ history)
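A sketch of the conditional trigram cache above, where `history` is a list of the preceding words: the trigram estimate is used only when the bigram xy has occurred in the history, as the slide specifies; the `fallback` model (e.g. the unigram cache from the previous sketch) is an assumed stand-in.

```python
def p_cache_conditional(z, x, y, history, fallback):
    """Conditional trigram cache: use C(xyz in history) / C(xy in history)
    only when the bigram (x, y) occurs in the history; otherwise defer to
    fallback(z, history). Per the slide, function words like "the" or "to"
    can be removed from `history` before calling this."""
    c_xy = sum(1 for i in range(len(history) - 1) if history[i:i + 2] == [x, y])
    if c_xy == 0:
        return fallback(z, history)
    c_xyz = sum(1 for i in range(len(history) - 2) if history[i:i + 3] == [x, y, z])
    return c_xyz / c_xy
```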

+ Skipping

n Capturing phrasal elements

n  Show John a good time → Show XXX a good time

n Standard 5-gram: P(z|…rstuvwxy) ≈ P(z|vwxy)

n Why not P(z|v_xy) – “skipping” n-gram – skips value of 3-back word

n Example: P(time | show John a good) → P(time | show ____ a good)

n P(z | …rstuvwxy) ≈ λP(z|vwxy) + µP(z|vw_y) + (1 − λ − µ)P(z|v_xy)
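A sketch of this three-way interpolation; `p4`, `p_skip_x`, and `p_skip_w` are assumed component models (e.g. separately smoothed n-grams), and the weights are illustrative values that would be tuned on held-out data.

```python
def p_skipping(z, v, w, x, y, p4, p_skip_x, p_skip_w, lam=0.5, mu=0.3):
    """lam * P(z|v w x y) + mu * P(z|v w _ y) + (1 - lam - mu) * P(z|v _ x y).
    p_skip_x ignores the 2-back word x; p_skip_w ignores the 3-back word w."""
    return (lam * p4(z, v, w, x, y)
            + mu * p_skip_x(z, v, w, y)
            + (1.0 - lam - mu) * p_skip_w(z, v, x, y))
```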


+ Clustering

n CLUSTERING = CLASSES (same thing)

n What is P(Tuesday | party on)?

n Similar to P(Monday | party on)

n Similar to P(Tuesday | celebration on)

n Put words in clusters:

n  WEEKDAY = Sunday, Monday, Tuesday, …

n  EVENT = party, celebration, birthday, …


+ Clustering overview

n Major topic, useful in many fields

n Kinds of clustering

n  Predictive clustering

n  Conditional clustering

n  IBM-style clustering

n How to get clusters

n  Be clever or it takes forever!


+ Predictive clustering

n Let “z” be a word, “Z” be its cluster

n One cluster per word: hard clustering

n  WEEKDAY = Sunday, Monday, Tuesday, …

n  MONTH = January, February, April, May, June, …

n P(z|xy) = P(Z|xy) × P(z|xyZ)

n P(Tuesday | party on) = P(WEEKDAY | party on) × P(Tuesday | party on WEEKDAY)

n Psmooth(z|xy) ≈ Psmooth (Z|xy) × Psmooth (z|xyZ)
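In code, the decomposition is a single product. A minimal sketch, assuming a hard word-to-cluster map and two separately smoothed component models (`p_class` and `p_word` are illustrative names, not from the slides):

```python
def p_predictive(z, x, y, word2class, p_class, p_word):
    """Predictive clustering: P(z | x y) ~ P(Z | x y) * P(z | x y Z).
    E.g. P(Tuesday | party on) ~ P(WEEKDAY | party on)
                               * P(Tuesday | party on WEEKDAY)."""
    Z = word2class[z]  # hard clustering: one cluster per word
    return p_class(Z, x, y) * p_word(z, x, y, Z)
```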


+ Predictive clustering example

n Find P(Tuesday | party on) ≈ Psmooth(WEEKDAY | party on) × Psmooth(Tuesday | party on WEEKDAY)

n Counts: C(party on Tuesday) = 0, C(party on Wednesday) = 10, C(arriving on Tuesday) = 10, C(on Tuesday) = 100

Psmooth (WEEKDAY | party on) is high

Psmooth (Tuesday | party on WEEKDAY) backs off to Psmooth (Tuesday | on WEEKDAY)


+ Conditional clustering

P(z|xy) = P(z|xXyY)

P(Tuesday | party on) =

P(Tuesday | party EVENT on PREPOSITION)

Psmooth(z|xy) ≈ Psmooth(z|xXyY), e.g.:

λPML(Tuesday | party EVENT on PREPOSITION) + µPML(Tuesday | EVENT on PREPOSITION) + δPML(Tuesday | on PREPOSITION) + γPML(Tuesday | PREPOSITION) + (1 − λ − µ − δ − γ) PML(Tuesday)


Condition on classes in the context
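A sketch of the conditional-clustering interpolation above. `p_ml` is an assumed maximum-likelihood estimator taking a word and a context tuple, and `lams` holds the five interpolation weights, which should sum to one:

```python
def p_conditional(z, x, y, word2class, p_ml, lams):
    """Interpolate ML estimates over progressively shorter class-augmented
    contexts: (x X y Y), (X y Y), (y Y), (Y), ()."""
    X, Y = word2class[x], word2class[y]
    contexts = [(x, X, y, Y), (X, y, Y), (y, Y), (Y,), ()]
    return sum(l * p_ml(z, ctx) for l, ctx in zip(lams, contexts))
```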

+ Conditional clustering example

λP (Tuesday | party EVENT on PREPOSITION) +

µ P(Tuesday | EVENT on PREPOSITION) +

δP(Tuesday | on PREPOSITION) +

γP(Tuesday | PREPOSITION) +

(1 − λ − µ − δ − γ) P(Tuesday)


Eliminating redundancy

λP(Tuesday | party on) + µP(Tuesday | EVENT on) + δP(Tuesday | on) + γP(Tuesday | PREPOSITION) + (1 − λ − µ − δ − γ) P(Tuesday)

+ Combined clustering

n P(z|xy) ≈ Psmooth(Z|xXyY) × Psmooth(z|xXyYZ)

P(Tuesday | party on) ≈ Psmooth(WEEKDAY | party EVENT on PREPOSITION) × Psmooth(Tuesday | party EVENT on PREPOSITION WEEKDAY)

n Much larger than unclustered, somewhat lower perplexity.


+ IBM Clustering

n P(z|xy) ≈ Psmooth(Z|XY) × P(z|Z)

P(Tuesday | party on) ≈ P(WEEKDAY | EVENT PREPOSITION) × P(Tuesday | WEEKDAY)

n Small, very smooth, mediocre perplexity

P(z|xy) ≈ λPsmooth(z|xy) + (1 − λ)Psmooth(Z|XY) × P(z|Z)

n Bigger, better than no clusters, better than combined clustering.

n  Improvement: use P(z|XYZ) instead of P(z|Z)
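A sketch of the interpolated form; all component models, the cluster map, and λ are assumed stand-ins for separately trained and tuned pieces:

```python
def p_ibm(z, x, y, word2class, p_word_trigram, p_class_trigram, p_emit, lam=0.7):
    """IBM clustering interpolated with a word trigram:
    P(z | x y) ~ lam * P_smooth(z | x y)
               + (1 - lam) * P_smooth(Z | X Y) * P(z | Z)."""
    X, Y, Z = word2class[x], word2class[y], word2class[z]
    return (lam * p_word_trigram(z, x, y)
            + (1.0 - lam) * p_class_trigram(Z, X, Y) * p_emit(z, Z))
```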


+ Clustering by Position

n “A” and “AN”: same cluster or different cluster?

n Same cluster for predictive clustering

n Different clusters for conditional clustering

n Small improvement by using different clusters for conditional and predictive


+ Clustering: how to get them

n Build them by hand

n  Works ok when almost no data

n Part of Speech (POS) tags

n  Tends not to work as well as automatic

n Automatic Clustering

n  Swap words between clusters to minimize perplexity


+ Clustering: automatic

n Minimize perplexity of P(z|Y)

n  Mathematical tricks speed it up

n  Use top-down splitting, not bottom-up merging!
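To show what "minimize perplexity of P(z|Y)" means operationally, here is a deliberately naive sketch of the "swap words between clusters" idea from the previous slide. It recomputes the class-bigram log-likelihood from scratch for every tentative move, which is exactly the "takes forever" version; real implementations use incremental count updates (and, per this slide, top-down splitting).

```python
import math
from collections import Counter

def class_ll(bigrams, w2c):
    """Log-likelihood of bigram counts under P(z | Y) = C(Y, z) / C(Y),
    where Y is the class of the preceding word y."""
    c_yz, c_y = Counter(), Counter()
    for (y, z), n in bigrams.items():
        c_yz[(w2c[y], z)] += n
        c_y[w2c[y]] += n
    return sum(n * math.log(c_yz[(w2c[y], z)] / c_y[w2c[y]])
               for (y, z), n in bigrams.items())

def exchange_cluster(bigrams, w2c, classes, iters=3):
    """Greedily move each word to the class that maximizes the likelihood,
    re-scoring naively on every candidate move (toy version only)."""
    for _ in range(iters):
        for w in list(w2c):
            w2c[w] = max(classes, key=lambda c: class_ll(bigrams, {**w2c, w: c}))
    return w2c
```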


+ Two actual WSJ classes

n  MONDAYS, FRIDAYS, THURSDAY, MONDAY, EURODOLLARS, SATURDAY, WEDNESDAY, FRIDAY, TENTERHOOKS, TUESDAY, SUNDAY, CONDITION

n  PARTY, FESCO, CULT, NILSON, PETA, CAMPAIGN, WESTPAC, FORCE, CONRAN, DEPARTMENT, PENH, GUILD


+ Sentence Mixture Models

n Lots of different sentence types:

n  Numbers (The Dow rose one hundred seventy three points)

n  Quotations (Officials said “quote we deny all wrong doing ”quote)

n  Mergers (AOL and Time Warner, in an attempt to control the media and the internet, will merge)

n Model each sentence type separately


+ Sentence Mixture Models

n Roll a die to pick sentence type sk, with probability λk

n Probability of the sentence given sk:

∏i=1..n P(wi | wi−2 wi−1 sk)

n Probability of the sentence across types:

∑k=1..m λk ∏i=1..n P(wi | wi−2 wi−1 sk)

+ Sentence Model Smoothing

n Each topic model is smoothed with overall model.

n Sentence mixture model is smoothed with overall model (sentence type 0).


∑k=0..m λk ∏i=1..n [ µP(wi | wi−2 wi−1 sk) + (1 − µ) P(wi | wi−2 wi−1) ]
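A sketch of this smoothed mixture computation. `p_topic(z, x, y, k)` and `p_global(z, x, y)` are assumed component trigrams (with k = 0 playing the role of the overall model, per the slide), and the log-sum-exp at the end is added for numerical stability; it is not on the slide.

```python
import math

def sentence_mixture_logprob(words, lams, p_topic, p_global, mu=0.8):
    """log P(w_1..w_n) under the smoothed sentence mixture model:
    sum over types k of lam_k * prod_i [ mu * P(w_i | w_{i-2} w_{i-1} s_k)
                                       + (1 - mu) * P(w_i | w_{i-2} w_{i-1}) ].
    lams must be positive and sum to one."""
    per_type = []
    for k, lam in enumerate(lams):
        lp = math.log(lam)
        for i, z in enumerate(words):
            x = words[i - 2] if i >= 2 else "<s>"
            y = words[i - 1] if i >= 1 else "<s>"
            lp += math.log(mu * p_topic(z, x, y, k) + (1.0 - mu) * p_global(z, x, y))
        per_type.append(lp)
    top = max(per_type)  # log-sum-exp over sentence types
    return top + math.log(sum(math.exp(v - top) for v in per_type))
```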

+ Sentence Mixture Results

[Figure: perplexity vs. log2 number of sentence types (0–7), sentence mixture model vs. baseline, 10,000,000 words of training data; annotated with a 13% perplexity reduction.]


+ Sentence Clustering

n Same algorithm as word clustering

n Assign each sentence to a type, sk

n Minimize perplexity of P(z|sk ) instead of P(z|Y)


+ Topic Examples - 0 (Mergers and acquisitions)

n  JOHN BLAIR & COMPANY IS CLOSE TO AN AGREEMENT TO SELL ITS T. V. STATION ADVERTISING REPRESENTATION OPERATION AND PROGRAM PRODUCTION UNIT TO AN INVESTOR GROUP LED BY JAMES H. ROSENFIELD ,COMMA A FORMER C. B. S. INCORPORATED EXECUTIVE ,COMMA INDUSTRY SOURCES SAID .PERIOD

n  INDUSTRY SOURCES PUT THE VALUE OF THE PROPOSED ACQUISITION AT MORE THAN ONE HUNDRED MILLION DOLLARS .PERIOD

n  JOHN BLAIR WAS ACQUIRED LAST YEAR BY RELIANCE CAPITAL GROUP INCORPORATED ,COMMA WHICH HAS BEEN DIVESTING ITSELF OF JOHN BLAIR'S MAJOR ASSETS .PERIOD

n  JOHN BLAIR REPRESENTS ABOUT ONE HUNDRED THIRTY LOCAL TELEVISION STATIONS IN THE PLACEMENT OF NATIONAL AND OTHER ADVERTISING .PERIOD

n  MR. ROSENFIELD STEPPED DOWN AS A SENIOR EXECUTIVE VICE PRESIDENT OF C. B. S. BROADCASTING IN DECEMBER NINETEEN EIGHTY FIVE UNDER A C. B. S. EARLY RETIREMENT PROGRAM .PERIOD


+ Topic Examples - 1 (production, promotions, commas)

n  MR. DION ,COMMA EXPLAINING THE RECENT INCREASE IN THE STOCK PRICE ,COMMA SAID ,COMMA "DOUBLE-QUOTE OBVIOUSLY ,COMMA IT WOULD BE VERY ATTRACTIVE TO OUR COMPANY TO WORK WITH THESE PEOPLE .PERIOD

n  BOTH MR. BRONFMAN AND MR. SIMON WILL REPORT TO DAVID G. SACKS ,COMMA PRESIDENT AND CHIEF OPERATING OFFICER OF SEAGRAM .PERIOD

n  JOHN A. KROL WAS NAMED GROUP VICE PRESIDENT ,COMMA AGRICULTURE PRODUCTS DEPARTMENT ,COMMA OF THIS DIVERSIFIED CHEMICALS COMPANY ,COMMA SUCCEEDING DALE E. WOLF ,COMMA WHO WILL RETIRE MAY FIRST .PERIOD

n  MR. KROL WAS FORMERLY VICE PRESIDENT IN THE AGRICULTURE PRODUCTS DEPARTMENT .PERIOD


+ Topic Examples - 2 (Numbers)

n  SOUTH KOREA POSTED A SURPLUS ON ITS CURRENT ACCOUNT OF FOUR HUNDRED NINETEEN MILLION DOLLARS IN FEBRUARY ,COMMA IN CONTRAST TO A DEFICIT OF ONE HUNDRED TWELVE MILLION DOLLARS A YEAR EARLIER ,COMMA THE GOVERNMENT SAID .PERIOD

n  THE CURRENT ACCOUNT COMPRISES TRADE IN GOODS AND SERVICES AND SOME UNILATERAL TRANSFERS .PERIOD

n  COMMERCIAL -HYPHEN VEHICLE SALES IN ITALY ROSE ELEVEN .POINT FOUR %PERCENT IN FEBRUARY FROM A YEAR EARLIER ,COMMA TO EIGHT THOUSAND ,COMMA EIGHT HUNDRED FORTY EIGHT UNITS ,COMMA ACCORDING TO PROVISIONAL FIGURES FROM THE ITALIAN ASSOCIATION OF AUTO MAKERS .PERIOD

n  INDUSTRIAL PRODUCTION IN ITALY DECLINED THREE .POINT FOUR %PERCENT IN JANUARY FROM A YEAR EARLIER ,COMMA THE GOVERNMENT SAID .PERIOD


+ Topic Examples – 3 (quotations)

n  NEITHER MR. ROSENFIELD NOR OFFICIALS OF JOHN BLAIR COULD BE REACHED FOR COMMENT .PERIOD

n  THE AGENCY SAID THERE IS "DOUBLE-QUOTE SOME INDICATION OF AN UPTURN "DOUBLE-QUOTE IN THE RECENT IRREGULAR PATTERN OF SHIPMENTS ,COMMA FOLLOWING THE GENERALLY DOWNWARD TREND RECORDED DURING THE FIRST HALF OF NINETEEN EIGHTY SIX .PERIOD

n  THE COMPANY SAID IT ISN'T AWARE OF ANY TAKEOVER INTEREST .PERIOD

n  THE SALE INCLUDES THE RIGHTS TO GERMAINE MONTEIL IN NORTH AND SOUTH AMERICA AND IN THE FAR EAST ,COMMA AS WELL AS THE WORLDWIDE RIGHTS TO THE DIANE VON FURSTENBERG COSMETICS AND FRAGRANCE LINES AND U. S. DISTRIBUTION RIGHTS TO LANCASTER BEAUTY PRODUCTS .PERIOD

n  BUT THE COMPANY WOULDN'T ELABORATE .PERIOD


+ Structured Language Model

[Figure: parse structure over the sentence “The contract ended with a loss of 7 cents after”]


+ How to get structured data?

n Use a Treebank (a collection of sentences with hand-annotated structure), like the Wall Street Journal portion of the Penn Treebank.

n Problem: need a treebank.

n Or – use a treebank (WSJ) to train a parser; then parse new training data (e.g. Broadcast News)

n Re-estimate parameters to get lower perplexity models.


+ Structured Language Models

n Use structure of language to detect long distance information

n Promising results

n But: time-consuming; language is mostly right-branching; 5-grams and skipping capture similar information.


+ Some Experiments

n Goodman re-implemented all techniques

n Trained on 260,000,000 words of WSJ

n Optimize parameters on heldout set

n Test on separate test section

n Some combinations extremely time-consuming (days of CPU time)

n  Don’t try this at home, or in anything you want to ship

n Rescored N-best lists to get results

n  Maximum possible improvement: from 10% absolute word error rate to 5%


+ Overall Results: Perplexity

+ Overall Results: Word Accuracy


+ Conclusions

n Use trigram models

n Use any reasonable smoothing algorithm (Katz, Kneser-Ney)

n Use caching if you have correction information.

n Clustering, sentence mixtures, and skipping are not usually worth the effort.


+ Tools: CMU Language Modeling Toolkit

n Can handle bigrams, trigrams, and more

n Can handle different smoothing schemes

n Many separate tools – output of one tool is input to next: easy to use

n Free for research purposes

n http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html


+ Using the CMU LM Tools

+ Tools: SRI Language Modeling Toolkit

n More powerful than CMU toolkit

n Can handle clusters, lattices, n-best lists, hidden tags

n Free for research use

n http://www.speech.sri.com/projects/srilm


+ IRSTLM

n (put in the link)

n Looks like it’s mostly addressing the problem of really huge LMs

Thanks to Dan Jurafsky for these slides

+ Reality: The LM is only as good as the data

n Text normalization

n  What about “$3,100,000”? Convert to “Three million one hundred thousand dollars”, etc.

n  Need to do this for dates, numbers, maybe abbreviations.

n Some text-normalization tools come with the Wall Street Journal corpus from the LDC (Linguistic Data Consortium)

n Not much available

n Write your own (use Perl!)
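The slide recommends Perl; purely for consistency with the other sketches in these notes, here is a minimal Python version of the dollar-amount case. The regex and the integers-only coverage are illustrative assumptions; real text normalization also needs dates, ordinals, abbreviations, and much more.

```python
import re

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def num_to_words(n):
    """Spell out a non-negative integer,
    e.g. 3100000 -> 'three million one hundred thousand'."""
    if n == 0:
        return "zero"
    parts = []
    for value, name in ((10**9, "billion"), (10**6, "million"), (10**3, "thousand")):
        if n >= value:
            parts.append(num_to_words(n // value) + " " + name)
            n %= value
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        parts.append(TENS[n // 10] + ("" if n % 10 == 0 else " " + ONES[n % 10]))
    elif n > 0:
        parts.append(ONES[n])
    return " ".join(parts)

def normalize_dollars(text):
    """Rewrite '$3,100,000' as 'three million one hundred thousand dollars'."""
    return re.sub(r"\$(\d[\d,]*)",
                  lambda m: num_to_words(int(m.group(1).replace(",", ""))) + " dollars",
                  text)
```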
