
Logistic Regression Optimization

Natural Language Processing: Jordan Boyd-Graber, University of Maryland (DERIVATION)

Slides adapted from Emily Fox


Reminder: Logistic Regression

P(Y = 0 \mid X) = \frac{1}{1 + \exp\left(\beta_0 + \sum_i \beta_i X_i\right)} \qquad (1)

P(Y = 1 \mid X) = \frac{\exp\left(\beta_0 + \sum_i \beta_i X_i\right)}{1 + \exp\left(\beta_0 + \sum_i \beta_i X_i\right)} \qquad (2)

• Discriminative prediction: p(y | x)
• Classification uses: ad placement, spam detection
• What we didn’t talk about is how to learn β from data
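A minimal sketch of Eqs. (1)-(2) in Python (NumPy assumed; the weights and feature values below are made-up illustrative numbers, not from the slides):

```python
import numpy as np

def predict_proba(beta0, beta, x):
    """P(Y=0|X) and P(Y=1|X) for logistic regression, following Eqs. (1)-(2)."""
    score = beta0 + np.dot(beta, x)              # beta_0 + sum_i beta_i * X_i
    p1 = np.exp(score) / (1.0 + np.exp(score))   # Eq. (2); equivalently 1 / (1 + exp(-score))
    return 1.0 - p1, p1                          # Eq. (1) is the complement

# Hypothetical two-feature example
beta0, beta = -1.0, np.array([2.0, -0.5])
x = np.array([1.0, 3.0])
print(predict_proba(beta0, beta, x))             # (P(Y=0|x), P(Y=1|x)), summing to 1
```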


Logistic Regression: Objective Function

\mathcal{L} \equiv \ln p(Y \mid X, \beta) = \sum_j \ln p(y^{(j)} \mid x^{(j)}, \beta) \qquad (3)

= \sum_j \left[ y^{(j)} \left( \beta_0 + \sum_i \beta_i x_i^{(j)} \right) - \ln\left( 1 + \exp\left( \beta_0 + \sum_i \beta_i x_i^{(j)} \right) \right) \right] \qquad (4)

Training data (y, x) are fixed. The objective function is a function of β: what values of β give a good value?
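A small sketch of evaluating Eq. (4) on a dataset (NumPy; the toy data are invented for illustration):

```python
import numpy as np

def log_likelihood(beta0, beta, X, y):
    """Conditional log likelihood, Eq. (4): sum_j [ y_j * score_j - ln(1 + exp(score_j)) ]."""
    scores = beta0 + X @ beta                    # one linear score per training example
    return np.sum(y * scores - np.log1p(np.exp(scores)))

# Hypothetical toy data: three examples, two features
X = np.array([[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]])
y = np.array([1, 0, 1])
print(log_likelihood(0.0, np.array([0.1, -0.2]), X, y))   # higher is better
```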


Convexity

• Convex function
• Doesn’t matter where you start, if you “walk up” the objective
• Gradient!


Gradient Descent (non-convex)

Goal

Optimize log likelihood with respect to variables β

[Figure: objective (vertical axis) vs. parameter (horizontal axis), showing gradient steps from a start point to a stopping point, with the true optimum, the "undiscovered country", elsewhere on the curve]

Luckily, (vanilla) logistic regression is convex


Gradient for Logistic Regression

To ease notation, let’s define

\pi_i = \frac{\exp\left(\beta^T x_i\right)}{1 + \exp\left(\beta^T x_i\right)} \qquad (5)

Our objective function is

\mathcal{L} = \sum_i \log p(y_i \mid x_i) = \sum_i \mathcal{L}_i = \sum_i \begin{cases} \log \pi_i & \text{if } y_i = 1 \\ \log(1 - \pi_i) & \text{if } y_i = 0 \end{cases} \qquad (6)


Taking the Derivative

Apply chain rule:

\frac{\partial \mathcal{L}}{\partial \beta_j} = \sum_i \frac{\partial \mathcal{L}_i(\vec{\beta})}{\partial \beta_j} = \sum_i \begin{cases} \frac{1}{\pi_i} \frac{\partial \pi_i}{\partial \beta_j} & \text{if } y_i = 1 \\ -\frac{1}{1 - \pi_i} \frac{\partial \pi_i}{\partial \beta_j} & \text{if } y_i = 0 \end{cases} \qquad (7)

If we plug in the derivative,

\frac{\partial \pi_i}{\partial \beta_j} = \pi_i (1 - \pi_i) x_j, \qquad (8)

we can merge these two cases

\frac{\partial \mathcal{L}_i}{\partial \beta_j} = (y_i - \pi_i) x_j. \qquad (9)
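A sketch of Eq. (9) summed over a whole dataset (NumPy; folding the bias into β via a leading column of ones in X is an implementation choice, and the data are made up):

```python
import numpy as np

def gradient(beta, X, y):
    """Gradient of the log likelihood: dL/dbeta_j = sum_i (y_i - pi_i) * x_{ij}, per Eq. (9)."""
    pis = 1.0 / (1.0 + np.exp(-(X @ beta)))   # pi_i from Eq. (5), computed for every example
    return X.T @ (y - pis)                    # one partial derivative per feature j

# Toy check: first column of X is the bias feature
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1, 0, 1])
print(gradient(np.zeros(2), X, y))
```

The result can be sanity-checked against a finite-difference approximation of Eq. (4).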


Gradient for Logistic Regression

Gradient

\nabla_\beta \mathcal{L}(\vec{\beta}) = \left[ \frac{\partial \mathcal{L}(\vec{\beta})}{\partial \beta_0}, \ldots, \frac{\partial \mathcal{L}(\vec{\beta})}{\partial \beta_n} \right] \qquad (10)

Update

\Delta \beta \equiv \eta \nabla_\beta \mathcal{L}(\vec{\beta}) \qquad (11)

\beta_i' \leftarrow \beta_i + \eta \frac{\partial \mathcal{L}(\vec{\beta})}{\partial \beta_i} \qquad (12)


Why are we adding? What would we do if we wanted to do descent?


η: step size, must be greater than zero


NB: Conjugate gradient is usually better, but harder to implement
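As a concrete illustration of the update in Eq. (12), here is a minimal batch gradient ascent loop (the step size η, the iteration count, and the toy data are invented for this sketch, not prescribed by the slides):

```python
import numpy as np

def gradient_ascent(X, y, eta=0.1, n_iters=100):
    """Repeatedly apply Eq. (12): beta_i <- beta_i + eta * dL/dbeta_i.
    The bias is folded into beta via a leading column of ones in X."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        pis = 1.0 / (1.0 + np.exp(-(X @ beta)))   # current pi_i for every example
        beta = beta + eta * (X.T @ (y - pis))     # adding, because we maximize the likelihood
    return beta

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, 3.0]])
y = np.array([1, 0, 0, 1])
print(gradient_ascent(X, y))
```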


Choosing Step Size

[Figure: objective vs. parameter, illustrating how the choice of step size affects the gradient steps]


Remaining issues

• When to stop?
• What if β keeps getting bigger?


Regularized Conditional Log Likelihood

Unregularized

\beta^* = \arg\max_\beta \sum_j \ln p(y^{(j)} \mid x^{(j)}, \beta) \qquad (13)

Regularized

\beta^* = \arg\max_\beta \sum_j \ln p(y^{(j)} \mid x^{(j)}, \beta) - \mu \sum_i \beta_i^2 \qquad (14)


µ is a “regularization” parameter that trades off between likelihood and having small parameters
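A minimal sketch of evaluating the regularized objective in Eq. (14) (NumPy; the value of µ and the toy data are made up, and penalizing every weight including the bias is a simplification, not a requirement of the slides):

```python
import numpy as np

def regularized_objective(beta, X, y, mu):
    """Eq. (14): conditional log likelihood minus mu * sum_i beta_i^2 (an L2 penalty)."""
    scores = X @ beta
    log_lik = np.sum(y * scores - np.log1p(np.exp(scores)))
    return log_lik - mu * np.sum(beta ** 2)      # larger mu pushes parameters toward zero

# mu here is a made-up value; in practice it is tuned on held-out data
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1, 0, 1])
print(regularized_objective(np.array([0.5, 1.0]), X, y, mu=0.1))
```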


Approximating the Gradient

• Our datasets are big (too big to fit into memory)
• . . . or data are changing / streaming
• Hard to compute the true gradient

\nabla \mathcal{L}(\beta) \equiv \mathbb{E}_x\left[ \nabla \mathcal{L}(\beta, x) \right] \qquad (15)

• Average over all observations
• What if we compute an update just from one observation?


Getting to Union Station

Pretend it’s a pre-smartphone world and you want to get to Union Station


Stochastic Gradient for Logistic Regression

Given a single observation xi chosen at random from the dataset,

\beta_j \leftarrow \beta_j' + \eta \left( -\mu \beta_j' + x_{ij} \left[ y_i - \pi_i \right] \right) \qquad (16)

Examples in class.
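One way to read Eq. (16) in code: a single stochastic update from one example, including the regularization term (NumPy; the values of η, µ, and the example itself are invented for illustration):

```python
import numpy as np

def sgd_step(beta, x_i, y_i, eta, mu):
    """One stochastic update per Eq. (16):
    beta_j <- beta_j + eta * (-mu * beta_j + x_ij * (y_i - pi_i)), done for all j at once."""
    pi_i = 1.0 / (1.0 + np.exp(-np.dot(beta, x_i)))   # Pr(y_i = 1 | x_i) under the current beta
    return beta + eta * (-mu * beta + x_i * (y_i - pi_i))

# Illustrative values only
beta = np.zeros(3)
print(sgd_step(beta, np.array([1.0, 2.0, -1.0]), 1, eta=0.1, mu=0.01))
```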


Stochastic Gradient for Regularized Regression

\mathcal{L} = \log p(y \mid x; \beta) - \mu \sum_j \beta_j^2 \qquad (17)

Taking the derivative with respect to β_j (for a single example x_i),

\frac{\partial \mathcal{L}}{\partial \beta_j} = (y_i - \pi_i) x_j - 2\mu \beta_j \qquad (18)


Algorithm

1. Initialize a vector β to be all zeros
2. For t = 1, . . . , T:
   • For each example (x_i, y_i) and feature j:
     Compute π_i ≡ Pr(y_i = 1 | x_i)
     Set β[j] = β[j]' + λ (y_i − π_i) x_{ij}
3. Output the parameters β_1, . . . , β_d.
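A minimal sketch of the algorithm above (NumPy; the step size λ, the number of passes T, the random-order sweep, and the toy data are illustrative choices, not fixed by the slides):

```python
import numpy as np

def sgd_train(X, y, lam=0.1, T=20, seed=0):
    """Stochastic gradient ascent following the three steps above (no regularization, as in the slide)."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])                  # step 1: initialize beta to all zeros
    for _ in range(T):                           # step 2: T passes over the data
        for i in rng.permutation(len(y)):        # visit examples in a random order
            pi_i = 1.0 / (1.0 + np.exp(-np.dot(beta, X[i])))   # pi_i = Pr(y_i = 1 | x_i)
            beta = beta + lam * (y[i] - pi_i) * X[i]           # update every feature j at once
    return beta                                  # step 3: output the parameters

# Toy data; the first column of X acts as a bias feature
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, 3.0]])
y = np.array([1, 0, 0, 1])
print(sgd_train(X, y))
```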


Proofs about Stochastic Gradient

• Depends on convexity of the objective and how close ε you want to get to the actual answer
• Best bounds depend on changing η over time and per dimension (not all features are created equal)
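The slides do not commit to a particular schedule; as one hedged illustration of a per-dimension, history-dependent step size, here is an AdaGrad-style update (the constants are made up):

```python
import numpy as np

def adagrad_step(beta, grad, hist, eta0=0.5, eps=1e-8):
    """One AdaGrad-style ascent step: each dimension j gets its own effective step size
    eta0 / sqrt(sum of past squared gradients in dimension j),
    so rarely-active features take larger steps than frequently-active ones."""
    hist = hist + grad ** 2
    beta = beta + eta0 * grad / (np.sqrt(hist) + eps)
    return beta, hist

beta, hist = np.zeros(2), np.zeros(2)
beta, hist = adagrad_step(beta, np.array([0.4, -0.1]), hist)
print(beta)
```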


In class

• Your questions!
• Working through a simple example
• Prepared for the logistic regression homework
