Latent Models: Sequence Models Beyond HMMs and
Machine Translation Alignment
CMSC 473/673
UMBC
Outline
Review: EM for HMMs
Machine Translation Alignment
Limited Sequence Models: Maximum Entropy Markov Models, Conditional Random Fields
Recurrent Neural Networks: Basic Definitions, Example in PyTorch
Why Do We Need Both the Forward and Backward Algorithms? To compute posteriors.
α(i, s) · p(s′ | s) · p(obsi+1 | s′) · β(i+1, s′) = total probability of paths through the s→s′ arc (at time i)
α(i, s) · β(i, s) = total probability of paths through state s at step i
p(zi = s | w1, …, wN) = α(i, s) · β(i, s) / α(N+1, END)
p(zi = s, zi+1 = s′ | w1, …, wN) = α(i, s) · p(s′ | s) · p(obsi+1 | s′) · β(i+1, s′) / α(N+1, END)
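To make the formulas above concrete, here is a minimal pure-Python sketch: a toy two-state HMM (all probabilities and names invented for illustration) where we compute α and β and check that the state posteriors sum to 1 at every position.

```python
# Toy two-state HMM (all probabilities invented for illustration).
states = ["N", "V"]
p_init = {"N": 0.8, "V": 0.2}                      # p(z1 = s | START)
p_trans = {("N", "N"): 0.6, ("N", "V"): 0.4,
           ("V", "N"): 0.7, ("V", "V"): 0.3}       # p(s' | s)
p_obs = {("fish", "N"): 0.7, ("run", "N"): 0.3,
         ("fish", "V"): 0.4, ("run", "V"): 0.6}    # p(w | s)

words = ["fish", "run", "fish"]
N = len(words)

# Forward: alpha[i][s] = p(w1..w(i+1), z(i+1) = s), 0-indexed positions
alpha = [{s: p_init[s] * p_obs[(words[0], s)] for s in states}]
for i in range(1, N):
    alpha.append({s: p_obs[(words[i], s)]
                     * sum(alpha[i - 1][r] * p_trans[(r, s)] for r in states)
                  for s in states})

# Backward: beta[i][s] = p(w(i+2)..wN | z(i+1) = s)
beta = [None] * N
beta[N - 1] = {s: 1.0 for s in states}
for i in range(N - 2, -1, -1):
    beta[i] = {s: sum(p_trans[(s, r)] * p_obs[(words[i + 1], r)] * beta[i + 1][r]
                      for r in states)
               for s in states}

L = sum(alpha[N - 1][s] for s in states)   # plays the role of alpha(N+1, END)

# Posterior p(zi = s | w1..wN) = alpha(i, s) * beta(i, s) / L
posteriors = [{s: alpha[i][s] * beta[i][s] / L for s in states} for i in range(N)]
for p in posteriors:
    assert abs(sum(p.values()) - 1.0) < 1e-12
```

Because sum over s of α(i, s)·β(i, s) equals the total likelihood L at every position i, the division by L yields a proper distribution at each step.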
EM for HMMs
0. Assume some value for your parameters
Two-step, iterative algorithm:
1. E-step: count under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood, assuming these uncertain counts
The estimated counts for pobs(w | s) and ptrans(s′ | s) come from the posteriors:
p*(zi = s | w1, …, wN) = α(i, s) · β(i, s) / α(N+1, END)
p*(zi = s, zi+1 = s′ | w1, …, wN) = α(i, s) · p(s′ | s) · p(obsi+1 | s′) · β(i+1, s′) / α(N+1, END)
EM For HMMs (Baum-Welch Algorithm)

α = computeForwards()
β = computeBackwards()
L = α[N+1][END]
for (i = N; i ≥ 0; --i) {
    for (next = 0; next < K; ++next) {
        cobs(obsi+1 | next) += α[i+1][next] * β[i+1][next] / L
        for (state = 0; state < K; ++state) {
            u = pobs(obsi+1 | next) * ptrans(next | state)
            ctrans(next | state) += α[i][state] * u * β[i+1][next] / L
        }
    }
}
update pobs, ptrans using cobs, ctrans
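A runnable sketch of one Baum-Welch iteration, mirroring the pseudocode above with 0-indexed positions; the toy HMM and its starting numbers are invented for illustration.

```python
from collections import defaultdict

# Toy HMM with deliberately uninformative starting parameters (all invented).
states = ["A", "B"]
p_init = {"A": 0.5, "B": 0.5}
p_trans = {("A", "A"): 0.5, ("A", "B"): 0.5, ("B", "A"): 0.5, ("B", "B"): 0.5}
p_obs = {("x", "A"): 0.6, ("y", "A"): 0.4, ("x", "B"): 0.3, ("y", "B"): 0.7}
words = ["x", "y", "y", "x"]
N = len(words)

# computeForwards / computeBackwards
alpha = [{s: p_init[s] * p_obs[(words[0], s)] for s in states}]
for i in range(1, N):
    alpha.append({s: p_obs[(words[i], s)]
                     * sum(alpha[i - 1][r] * p_trans[(r, s)] for r in states)
                  for s in states})
beta = [None] * N
beta[N - 1] = {s: 1.0 for s in states}
for i in range(N - 2, -1, -1):
    beta[i] = {s: sum(p_trans[(s, r)] * p_obs[(words[i + 1], r)] * beta[i + 1][r]
                      for r in states) for s in states}
L = sum(alpha[N - 1][s] for s in states)

# E-step: fractional counts, mirroring the nested loop on the slide
c_obs, c_trans = defaultdict(float), defaultdict(float)
for i in range(N):
    for s in states:
        c_obs[(words[i], s)] += alpha[i][s] * beta[i][s] / L
for i in range(N - 1):
    for state in states:
        for nxt in states:
            u = p_obs[(words[i + 1], nxt)] * p_trans[(state, nxt)]
            c_trans[(nxt, state)] += alpha[i][state] * u * beta[i + 1][nxt] / L

# M-step: renormalize the counts into updated parameters
vocab = set(words)
new_obs = {(w, s): c_obs[(w, s)] / sum(c_obs[(v, s)] for v in vocab)
           for s in states for w in vocab}
new_trans = {(r, s): c_trans[(r, s)] / sum(c_trans[(q, s)] for q in states)
             for s in states for r in states}
```

After the M-step, new_obs(· | s) and new_trans(· | s) are proper distributions; repeating E and M steps is the full Baum-Welch algorithm.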
Semi-Supervised Learning
labeled data: human annotated; relatively small/few examples
unlabeled data: raw; not annotated; plentiful
EM combines both
Outline
Review: EM for HMMs
Machine Translation Alignment
Limited Sequence Models: Maximum Entropy Markov Models, Conditional Random Fields
Recurrent Neural Networks: Basic Definitions, Example in PyTorch
Warren Weaver’s Note
When I look at an article in Russian, I say “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.”
(Warren Weaver, 1947)
http://www.mt-archive.info/Weaver-1949.pdf
Slides courtesy Rebecca Knowles
Noisy Channel Model
[diagram: the observed (noisy) Russian язы́к is passed to Decode, which proposes English candidates (language, speak, text, word); Rerank then scores them, yielding language]
written in (clean) English; observed Russian (noisy) text
translation/decode model; (clean) English language model
Slides courtesy Rebecca Knowles
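The noisy channel intuition fits in a few lines: rank candidate English decodings e of an observed foreign word f by p(f | e) · p(e). All candidates and probabilities below are made-up toy numbers.

```python
# Toy noisy-channel decode: argmax over e of p(f | e) * p(e). Numbers invented.
f = "язы́к"
p_f_given_e = {"language": 0.5, "tongue": 0.4, "speak": 0.1}  # translation/decode model
p_e = {"language": 0.6, "tongue": 0.1, "speak": 0.3}          # (clean) language model

# Decode proposes candidates; Rerank scores them with the language model.
best = max(p_f_given_e, key=lambda e: p_f_given_e[e] * p_e[e])
```

With these numbers, "language" wins (0.5 · 0.6 = 0.30) even though "tongue" has a similar translation probability, because the language model downweights it.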
Translation
Translate French (observed) into English:
Le chat est sur la chaise.
The cat is on the chair.
Slides courtesy Rebecca Knowles
Alignment
Le chat est sur la chaise.
The cat is on the chair.
[diagram: lines link each French word to its English counterpart]
Slides courtesy Rebecca Knowles
Parallel Texts
Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world,
Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people,
Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law,
Whereas it is essential to promote the development of friendly relations between nations,…
http://www.un.org/en/universal-declaration-human-rights/
Yolki, pampa ni tlatepanitalotl, ni tlasenkauajkayotl iuan ni kuali nemilistli ipan ni tlalpan, yaya ni moneki moixmatis uan monemilis, ijkinoj nochi kuali tiitstosej ika touampoyouaj.
Pampa tlaj amo tikixmatij tlatepanitalistli uan tlen kuali nemilistli ipan ni tlalpan, yeka onkatok kualantli, onkatok tlateuilistli, onkatok majmajtli uan sekinok tlamantli teixpanolistli; yeka moneki ma kuali timouikakaj ika nochi touampoyouaj, ma amo onkaj majmajyotl uan teixpanolistli; moneki ma onkaj yejyektlalistli, ma titlajtlajtokaj uan ma tijneltokakaj tlen tojuantij tijnekij tijneltokasej uan amo tlen ma topanti, kenke, pampa tijnekij ma onkaj tlatepanitalistli.
Pampa ni tlatepanitalotl moneki ma tiyejyekokaj, ma tijchiuakaj uan ma tijmanauikaj; ma nojkia kiixmatikaj tekiuajtinij, uejueyij tekiuajtinij, ijkinoj amo onkas nopeka se akajya touampoj san tlen ueli kinekis techchiuilis, technauatis, kinekis technauatis ma tijchiuakaj se tlamantli tlen amo kuali; yeka ni tlatepanitalotl tlauel moneki ipan tonemilis ni tlalpan.
Pampa nojkia tlauel moneki ma kuali timouikakaj, ma tielikaj keuak tiiknimej, nochi tlen tlakamej uan siuamej tlen tiitstokej ni tlalpan.…
http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nhn
Slides courtesy Rebecca Knowles
Preprocessing
(applied to the parallel texts above)
• Sentence align
• Clean corpus
• Tokenize
• Handle case
• Word segmentation (morphological, BPE, etc.)
• Language-specific preprocessing (example: pre-reordering)
• ...
Slides courtesy Rebecca Knowles
Alignments
If we had word-aligned text, we could easily estimate P(f|e).
But we don’t usually have word alignments, and they are expensive to produce by hand…
If we had P(f|e) we could produce alignments automatically.
Slides courtesy Rebecca Knowles
IBM Model 1 (1993)
• Lexical Translation Model
• Word Alignment Model
• The simplest of the original IBM models
• For all IBM models, see the original paper (Brown et al., 1993): http://www.aclweb.org/anthology/J93-2003
Slides courtesy Rebecca Knowles
Simplified IBM 1
• We’ll work through an example with a simplified version of IBM Model 1
• Figures and examples are drawn from A Statistical MT Tutorial Workbook, Section 27, (Knight, 1999)
• Simplifying assumption: each source word must translate to exactly one target word and vice versa
Slides courtesy Rebecca Knowles
IBM Model 1 (1993)
f: vector of French words
e: vector of English words
a: vector of alignment indices
t(fj | ei): translation probability of the word fj given the word ei
Example (visualization of alignment):
Le chat est sur la chaise verte
The cat is on the green chair
a = (0, 1, 2, 3, 4, 6, 5)
Slides courtesy Rebecca Knowles
Model and Parameters
Want: P(f|e). But we don’t know how to train this directly…
Solution: Use P(a, f|e), where a is an alignment.
Remember: P(f|e) = Σa P(a, f|e)
Slides courtesy Rebecca Knowles
Model and Parameters: Intuition
Translation prob.: t(fj | ei)
Interpretation: how probable is it that we see fj given ei
Slides courtesy Rebecca Knowles
Model and Parameters: Intuition
Alignment/translation prob.: P(a, f | e)
Example (visual representation of a): P(le↔cat, chat↔the | “the cat”) < P(le↔the, chat↔cat | “the cat”)
Interpretation: how probable are the alignment a and the translation f (given e)
Slides courtesy Rebecca Knowles
Model and Parameters: Intuition
Alignment prob.: P(a | e, f)
Example: P(le↔cat, chat↔the | “le chat”, “the cat”) < P(le↔the, chat↔cat | “le chat”, “the cat”)
Interpretation: how probable is alignment a (given e and f)
Slides courtesy Rebecca Knowles
Model and Parameters
How to compute: P(f | e) = Σa P(a, f | e), and P(a | e, f) = P(a, f | e) / P(f | e)
Slides courtesy Rebecca Knowles
Parameters
For IBM model 1, we can compute all parameters given the translation parameters t(f | e).
How many of these are there? |French vocabulary| × |English vocabulary|
Slides courtesy Rebecca Knowles
Data
Two sentence pairs:
English | French
b c     | x y
b       | y
Slides courtesy Rebecca Knowles
All Possible Alignments
(French: x, y; English: b, c)
For “b c” ↔ “x y”, there are two: b↔x, c↔y and b↔y, c↔x.
For “b” ↔ “y”, there is only one: b↔y.
Remember: simplifying assumption that each word must be aligned exactly once
Slides courtesy Rebecca Knowles
Expectation Maximization (EM)
0. Assume some value for t(f | e) and compute other parameter values
Two-step, iterative algorithm:
1. E-step: count alignments and translations under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood (update parameters), using these uncertain counts
The estimated counts weigh each alignment by its probability, e.g. comparing P(le↔the, chat↔cat | “le chat”, “the cat”) against P(le↔cat, chat↔the | “le chat”, “the cat”)
Slides courtesy Rebecca Knowles
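The whole simplified-IBM-1 EM procedure fits in a short script. This sketch runs on the toy corpus above (English “b c” ↔ French “x y”, English “b” ↔ French “y”) under the one-to-one alignment assumption: it enumerates alignments as permutations, collects fractional counts in the E-step, and renormalizes them in the M-step.

```python
import math
from itertools import permutations
from collections import defaultdict

# Toy corpus from the slides, under the simplifying assumption
# that each word must be aligned exactly once.
corpus = [(["b", "c"], ["x", "y"]), (["b"], ["y"])]

t = defaultdict(lambda: 0.5)  # t(f | e), initialized uniformly

for _ in range(10):
    # E-step: fractional counts from all alignments, weighted by P(a, f | e)
    counts = defaultdict(float)
    totals = defaultdict(float)
    for e_sent, f_sent in corpus:
        aligns = list(permutations(range(len(f_sent))))
        weights = [math.prod(t[(f_sent[a[j]], e_sent[j])]
                             for j in range(len(e_sent)))
                   for a in aligns]
        Z = sum(weights)  # normalizer: P(a | e, f) = P(a, f | e) / sum over a
        for a, w in zip(aligns, weights):
            for j, e in enumerate(e_sent):
                counts[(f_sent[a[j]], e)] += w / Z
                totals[e] += w / Z
    # M-step: renormalize counts into new translation probabilities
    t = defaultdict(float, {(f, e): c / totals[e] for (f, e), c in counts.items()})
```

The second sentence pair forces b↔y, which then disambiguates the first pair: after a few iterations t(y | b) and t(x | c) both approach 1, exactly the behavior worked through in Knight's tutorial workbook.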
Review of IBM Model 1 & EM
Iteratively learned an alignment/translation model from sentence-aligned text (without “gold standard” alignments)
Model can now be used for alignment and/or word-level translation
We explored a simplified version of this; IBM Model 1 allows more types of alignments
Slides courtesy Rebecca Knowles
Why is Model 1 insufficient?
Why won’t this produce great translations?
• Indifferent to order (language model may help?)
• Translates one word at a time
• Translates each word in isolation
• ...
Slides courtesy Rebecca Knowles
Uses for Alignments
Component of machine translation systems
Produce a translation lexicon automatically
Cross-lingual projection/extraction of information
Supervision for training other models (for example, neural MT systems)
Slides courtesy Rebecca Knowles
Evaluating Machine Translation
Human evaluations: test set (source, human reference translations, MT output)
Humans judge the quality of MT output (in one of several possible ways)
Koehn (2017), http://mt-class.org/jhu/slides/lecture-evaluation.pdf
Slides courtesy Rebecca Knowles
Evaluating Machine Translation
Automatic evaluations: test set (source, human reference translations, MT output)
Aim to mimic (correlate with) human evaluations
Many metrics:
TER (Translation Error/Edit Rate)
HTER (Human-Targeted Translation Edit Rate)
BLEU (Bilingual Evaluation Understudy)
METEOR (Metric for Evaluation of Translation with Explicit Ordering)
Slides courtesy Rebecca Knowles
Machine Translation Alignment Now
Explicitly with fancier IBM models
Implicitly/learned jointly with attention in recurrent neural networks (RNNs)
Outline
Review: EM for HMMs
Machine Translation Alignment
Limited Sequence Models: Maximum Entropy Markov Models, Conditional Random Fields
Recurrent Neural Networks: Basic Definitions, Example in PyTorch
Recall: N-gram to Maxent to Neural Language Models
predict the next word
given some context…
p(wi | wi−3, wi−2, wi−1) ∝ count(wi−3, wi−2, wi−1, wi)
wi-3 wi-2
wi
wi-1
compute beliefs about what is likely…
Recall: N-gram to Maxent to Neural Language Models
predict the next word
given some context…wi-3 wi-2
wi
wi-1
compute beliefs about what is likely…
p(wi | wi−3, wi−2, wi−1) = softmax(θ ⋅ f(wi−3, wi−2, wi−1, wi))
Hidden Markov Model Representation
p(z1, w1, z2, w2, …, zN, wN) = p(z1 | z0) p(w1 | z1) ⋯ p(zN | zN−1) p(wN | zN) = ∏i p(wi | zi) p(zi | zi−1)
emission probabilities/parameters: p(wi | zi); transition probabilities/parameters: p(zi | zi−1)
z1
w1
…
w2 w3 w4
z2 z3 z4
represent the probabilities and independence assumptions in a graph
A Different Model’s Representation: the Maximum Entropy Markov Model (MEMM)
represent the probabilities and independence assumptions in a graph: z1 → z2 → z3 → z4 → …, with each word wi feeding into its tag zi
p(z1, z2, …, zN | w1, w2, …, wN) = p(z1 | z0, w1) ⋯ p(zN | zN−1, wN) = ∏i p(zi | zi−1, wi)
p(zi | zi−1, wi) ∝ exp(θT f(wi, zi−1, zi))
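A minimal sketch of the MEMM's local distribution p(zi | zi−1, wi) ∝ exp(θ · f): the tag set, feature names, and weights below are invented purely for illustration.

```python
import math

# Hypothetical MEMM local classifier: score each next tag with indicator
# features of (current word, previous tag, candidate tag), then normalize.
TAGS = ["Noun", "Verb"]

def features(w, z_prev, z):
    # feature names are made up for this example
    return {
        ("suffix=-s", z): float(w.endswith("s")),
        ("prev=" + z_prev, z): 1.0,
    }

theta = {("suffix=-s", "Noun"): 1.2,
         ("prev=Det", "Noun"): 2.0,
         ("prev=Det", "Verb"): -1.0}  # unlisted features default to weight 0

def p_next_tag(w, z_prev):
    scores = {z: math.exp(sum(theta.get(k, 0.0) * v
                              for k, v in features(w, z_prev, z).items()))
              for z in TAGS}
    Z = sum(scores.values())          # local normalization, per step
    return {z: s / Z for z, s in scores.items()}

dist = p_next_tag("dogs", "Det")
```

Note the normalization is local to each step; that per-step renormalization is exactly what gives rise to the label-bias problem discussed next.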
MEMMs
Discriminative: don’t care about generating observed sequence at all
Maxent: use features
Problem: Label-Bias problem
z1
w1
…
w2 w3 w4
z2 z3 z4
Label-Bias Problem
At each tag zi, the incoming probability mass must sum to 1 and the outgoing mass must sum to 1; the model observes, but does not generate (explain), the observation wi.
Take-aways:
• the model can learn to ignore observations
• the model can get itself stuck on “bad” paths
Outline
Review: EM for HMMs
Machine Translation Alignment
Limited Sequence Models: Maximum Entropy Markov Models, Conditional Random Fields
Recurrent Neural Networks: Basic Definitions, Example in PyTorch
(Linear Chain) Conditional Random Fields
Discriminative: don’t care about generating observed sequence at all
Condition on the entire observed word sequence w1…wN
Maxent: use features
Solves the label-bias problem
z1 …
w1 w2 w3 w4 …
z2 z3 z4
(Linear Chain) Conditional Random Fields
z1 z2 z3 z4 … (tags) over w1 w2 w3 w4 … (words)
p(z1, …, zN | w1, …, wN) ∝ ∏i exp(θT f(zi−1, zi, w1, …, wN))
condition on the entire sequence w1, …, wN
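The proportionality above hides a global normalizer Z(w) that sums over all tag sequences; a forward recursion computes it without enumerating them. A sketch with an invented stand-in for θᵀf (note the potential is free to look at the whole word sequence):

```python
import math
from itertools import product

# Minimal linear-chain CRF sketch; the potential below is a made-up
# stand-in for theta^T f(z_{i-1}, z_i, w1..wN).
TAGS = ["A", "B"]
words = ["w1", "w2", "w3"]

def log_potential(z_prev, z, i, words):
    # any function of the entire word sequence is allowed
    return (0.5 if z_prev == z else -0.2) + (0.3 if (z == "A") == (i % 2 == 0) else 0.0)

def score(tags):
    # unnormalized log-score of a full tag sequence
    prev, total = "<s>", 0.0
    for i, z in enumerate(tags):
        total += log_potential(prev, z, i, words)
        prev = z
    return total

# Partition function by brute force over all |TAGS|^N sequences
Z_brute = sum(math.exp(score(t)) for t in product(TAGS, repeat=len(words)))

# Partition function by the forward recursion (linear in N)
fwd = {z: math.exp(log_potential("<s>", z, 0, words)) for z in TAGS}
for i in range(1, len(words)):
    fwd = {z: sum(fwd[r] * math.exp(log_potential(r, z, i, words)) for r in TAGS)
           for z in TAGS}
Z_forward = sum(fwd.values())
```

The two computations agree, which is the point: normalizing once, globally, over whole sequences (rather than per step, as a MEMM does) is what avoids the label-bias problem.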
CRFs are Very Popular for {POS, NER, other sequence tasks}
• POS: f(zi−1, zi, w) = 1(zi−1 == Noun & zi == Verb & wi−2 in list of adjectives or determiners)
• NER: fpath p(zi−1, zi, w) = 1(zi−1 == Per & zi == Per & syntactic path p involving wi exists)
p(z1, …, zN | w1, …, wN) ∝ ∏i exp(θT f(zi−1, zi, w1, …, wN))
Can’t easily do these with an HMM ➔ conditional models can allow richer features
We’ll cover syntactic paths next class
CRFs can be used in neural networks too:
https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/contrib/crf/CrfForwardRnnCell
https://pytorch-crf.readthedocs.io/en/stable/
Conditional vs. Sequence
CRF Tutorial, Fig 1.2, Sutton & McCallum (2012)
We’ll cover these in 691: Graphical and Statistical Models of Learning
Outline
Review: EM for HMMs
Machine Translation Alignment
Limited Sequence Models: Maximum Entropy Markov Models, Conditional Random Fields
Recurrent Neural Networks: Basic Definitions, Example in PyTorch
Recall: N-gram to Maxent to Neural Language Models
predict the next word wi given some context wi−3, wi−2, wi−1…
compute beliefs about what is likely…
p(wi | wi−3, wi−2, wi−1) = softmax(θwi ⋅ f(wi−3, wi−2, wi−1))
create/use “distributed representations” ei−3, ei−2, ei−1 (embedding vectors ew); combine these representations (a matrix-vector product) into C = f; score each candidate word wi with its parameters θwi
A More Typical View of Recurrent Neural Language Modeling
hidden states: hi−3, hi−2, hi−1, hi (each repeated unit is a “cell”)
observe these words one at a time: wi−3, wi−2, wi−1, wi
predict the next word from these hidden states: wi−2, wi−1, wi, wi+1
A Simple Recurrent Neural Network Cell
encoding: the previous hidden state hi−1 enters through matrix W, the current word wi through matrix U
decoding: the prediction leaves through matrix S
hi = σ(W hi−1 + U wi), where σ(x) = 1 / (1 + exp(−x))
ŵi+1 = softmax(S hi)
must learn matrices U, S, W
suggested solution: gradient descent on prediction ability
problem: they’re tied across inputs/timesteps
good news for you: many toolkits do this automatically
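The two equations above as a pure-Python sketch, with toy sizes and made-up weights (a real implementation would use a library such as PyTorch and learn W, U, S by gradient descent):

```python
import math

# One step of the simple RNN cell: h_i = sigmoid(W h_{i-1} + U w_i),
# w_hat = softmax(S h_i). All matrices here are invented toy values.
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def softmax(v):
    exps = [math.exp(x - max(v)) for x in v]   # subtract max for stability
    Z = sum(exps)
    return [e / Z for e in exps]

W = [[0.1, -0.2], [0.0, 0.3]]                 # hidden -> hidden
U = [[0.5, 0.0, -0.1], [0.2, 0.4, 0.0]]       # input (one-hot, vocab of 3) -> hidden
S = [[1.0, -1.0], [0.0, 0.5], [-0.5, 0.2]]    # hidden -> vocab scores

h_prev = [0.0, 0.0]
w_i = [0.0, 1.0, 0.0]                         # one-hot vector for the current word

h_i = sigmoid([a + b for a, b in zip(matvec(W, h_prev), matvec(U, w_i))])
w_hat = softmax(matvec(S, h_i))               # distribution over the next word
```

Running the same step again with h_prev = h_i and the next word's one-hot vector is exactly the "tied across timesteps" recurrence the slide describes.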
Why Is Training RNNs Hard?
Conceptually, it can get strange, but really getting the gradient just requires many applications of the chain rule for derivatives.
Vanishing (and exploding) gradients: we multiply the same matrices at each timestep ➔ we multiply many matrices in the gradients.
One solution: clip the gradients to a max value
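Clipping rescales the gradient when its norm exceeds a threshold; this is the idea behind utilities like torch.nn.utils.clip_grad_norm_. A stand-alone sketch on plain Python lists:

```python
import math

# Clip a gradient vector to a maximum L2 norm; gradients already within
# the threshold pass through unchanged.
def clip_by_global_norm(grads, max_norm):
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]
    return grads

g = clip_by_global_norm([3.0, 4.0], max_norm=1.0)  # original norm is 5.0
```

After clipping, the direction of the gradient is preserved but its norm is capped, so a single exploding step cannot wreck the parameters.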
Outline
Review: EM for HMMs
Machine Translation Alignment
Limited Sequence Models: Maximum Entropy Markov Models, Conditional Random Fields
Recurrent Neural Networks: Basic Definitions, Example in PyTorch
Natural Language Processing
from torch import *
from keras import *
Pick Your Toolkit
PyTorch
Deeplearning4j
TensorFlow
DyNet
Caffe
Keras
MxNet
Gluon
CNTK
…
Comparisons:
https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
https://deeplearning4j.org/compare-dl4j-tensorflow-pytorch
https://github.com/zer0n/deepframeworks (older, from 2015)
Defining A Simple RNN in Python (Modified Very Slightly)
http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
encode: the cell combines the current word wi with the previous hidden state hi−1 into the new hidden state hi
decode: the cell maps hi to a prediction for wi+1
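The module from the linked tutorial looks roughly like this (reconstructed from the tutorial rather than copied from the slides): one linear layer produces the next hidden state and another produces the output, both from the concatenated [input; hidden] vector.

```python
import torch
import torch.nn as nn

# Simple RNN cell in the style of the char_rnn_classification tutorial.
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)  # encode
        self.i2o = nn.Linear(input_size + hidden_size, output_size)  # decode
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)              # next hidden state
        output = self.softmax(self.i2o(combined))  # log-distribution over outputs
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

rnn = RNN(input_size=57, hidden_size=128, output_size=18)
out, h = rnn(torch.zeros(1, 57), rnn.initHidden())
```

The sizes 57, 128, and 18 mirror the tutorial's character-vocabulary, hidden, and category sizes; any sizes work.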
Training A Simple RNN in Python (Modified Very Slightly)
http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
Loss: negative log-likelihood
Steps: get predictions → eval predictions → compute gradient → perform SGD
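A sketch of the training step those labels describe, with illustrative sizes and random data: run the cell over the sequence (get predictions), score with negative log-likelihood (eval predictions), backpropagate (compute gradient), and take a plain SGD step.

```python
import torch
import torch.nn as nn

# Illustrative sizes; the cell is the same i2h/i2o structure as before.
input_size, hidden_size, output_size = 10, 16, 10
i2h = nn.Linear(input_size + hidden_size, hidden_size)
i2o = nn.Linear(input_size + hidden_size, output_size)
criterion = nn.NLLLoss()                       # negative log-likelihood
learning_rate = 0.005
params = list(i2h.parameters()) + list(i2o.parameters())

# Random one-hot inputs and a random target, for illustration only.
seq = [torch.zeros(1, input_size) for _ in range(4)]
for x in seq:
    x[0, int(torch.randint(0, input_size, (1,)))] = 1.0
target = torch.randint(0, output_size, (1,))

hidden = torch.zeros(1, hidden_size)
for x in seq:                                  # get predictions
    combined = torch.cat((x, hidden), 1)
    hidden = i2h(combined)
    output = nn.functional.log_softmax(i2o(combined), dim=1)

loss = criterion(output, target)               # eval predictions
loss.backward()                                # compute gradient
with torch.no_grad():
    for p in params:                           # perform SGD
        p -= learning_rate * p.grad
```

In practice one would zero the gradients between examples and loop over many training pairs; this shows a single update.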
Another Solution: LSTMs/GRUs
LSTM: Long Short-Term Memory (Hochreiter & Schmidhuber, 1997)
GRU: Gated Recurrent Unit (Cho et al., 2014)
Basic idea: learn to forget
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
[diagram labels: a “forget” line and a representation line run through the cell]
Outline
Review: EM for HMMs
Machine Translation Alignment
Limited Sequence Models: Maximum Entropy Markov Models, Conditional Random Fields
Recurrent Neural Networks: Basic Definitions, Example in PyTorch