Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric...

34
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1

Transcript of Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric...

Page 1: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

CS839:ProbabilisticGraphicalModels

Lecture7:LearningFullyObservedBNs

TheoRekatsinas

1

Page 2: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Exponentialfamily:abasicbuildingblock

2

• ForanumericrandomvariableX

isanexponentialfamilydistributionwithnatural(canonical)parameterη

• FunctionT(x)isasufficientstatistic.• FunctionA(η)=logZ(η)isthelognormalizer• Examples:Bernoulli,multinomial,Gaussian,Poisson,Gamma,Categorical

p(x|⌘) = h(x) exp

�⌘

TT (x)�A(⌘)

�=

1

Z(⌘)

h(x) exp(⌘

TT (x))

Page 3: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Whyexponentialfamily?

3

• Momentgeneratingproperty

WecaneasilycomputemomentsofanyexponentialfamilydistributionbytakingthederivativesofthelognormalizerA(η)

Page 4: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

MLEforExponentialFamily

4

• Foriid datathelog-likelihoodis

• Wetakethederivativesandsetthemtozero

• Weperformmomentmatching

• Wecaninferthecanonicalparametersusing

Page 5: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

GeneralizedLinearModels

5

• Thegraphicalmodel:• Linearregression• Discriminativelinearclassification

• GeneralizedLinearModel• Theobservedinputxisassumedtoenterintothemodelviaalinearcombinationofitselements.

• Theconditionalmeanμisrepresentedasafunctionf(ξ)ofξ,wherefisknownastheresponsefunction.

• Theobservedoutputyisassumedtobecharacterizedbyanexponentialfamilydistributionwithconditionalmeanμ.

Ep(T ) = µ = f(✓TX)

⇠ = ✓Tx

Page 6: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

LearninginGraphicalModels

6

• Goal:Givenasetofindependentsamples(assignments torandomvariables),findthebestBayesianNetwork(bothDAGandCPDs)

(B,E,A,C,R)=(T,F,F,T,F)(B,E,A,C,R)=(T,F,T,T,F)…(B,E,A,C,R)=(F,T,T,T,F)

Page 7: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

LearninginGraphicalModels

7

• Goal:Givenasetofindependentsamples(assignments torandomvariables),findthebestBayesianNetwork(bothDAGandCPDs)

(B,E,A,C,R)=(T,F,F,T,F)(B,E,A,C,R)=(T,F,T,T,F)…(B,E,A,C,R)=(F,T,T,T,F)

Structurelearning

Page 8: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

LearninginGraphicalModels

8

• Goal:Givenasetofindependentsamples(assignments torandomvariables),findthebestBayesianNetwork(bothDAGandCPDs)

(B,E,A,C,R)=(T,F,F,T,F)(B,E,A,C,R)=(T,F,T,T,F)…(B,E,A,C,R)=(F,T,T,T,F)

Structurelearning

Parameterlearning

Page 9: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

LearninginGraphicalModels

9

• Goal:Givenasetofindependentsamples(assignments torandomvariables),findthebestBayesianNetwork(bothDAGandCPDs)

(B,E,A,C,R)=(T,F,F,T,F)(B,E,A,C,R)=(T,F,T,T,F)…(B,E,A,C,R)=(F,T,T,T,F)

Structurelearning

Parameterlearning

Laterintheclass

Page 10: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

ParameterEstimationforFullyObservedGMs

10

• Thedata:D=(x1,x2,x3,… ,xN)• AssumethegraphGisknownandfixed• Expertdesignorstructurelearning

• Goal:estimatefromadatasetofNindependent,identicallydistributed(iid)trainingexamplesD• EachtrainingexamplecorrespondstoavectorofMvaluesonepernoderandomvariable• Modelshouldbecompletelyobservable:nomissingvalues,nohiddenvariables

Page 11: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Densityestimation

11

• Aconstructionofan estimate,basedonobserveddata,ofanunobservableunderlyingprobabilitydensityfunction

• Canbeviewedassingle-nodegraphicalmodels

• Instancesofexponentialfamilydistribution

• BuildingblocksofgeneralGM

• MLEandBayesianestimate

Page 12: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

DiscreteDistributions

12

• Bernoullidistribution:

• Multinomialdistribution:Mult(1,θ)

P (x) = p

x(1� p)1�x

X = [X1, X2, X3, X4, X5, X6] Xj = [0, 1],X

j2[1,··· ,6]

Xj = 1

Xj = 1 with probability ✓j ,X

j2[1,··· ,6]

thetaj = 1

P (Xj = 1) = ✓j

Page 13: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

DiscreteDistributions

13

• Multinomialdistribution:Mult(n,θ)

n = [n1, n2, . . . , nk] whereX

j

nj = N

p(n) =N !

n1!n2! · · ·nk!✓n11 ✓n2

2 · · · ✓nKK

Page 14: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Example:multinomialmodel

14

• Data:WeobservedNiid dierolls(K-sided):D={5,1,K,…3}}

• Model:

• Likelihoodofanobservation:

• LikelihoodofD:

xn = [xn,1, xn,2, · · · , xn,K ] where xn,k = 0, 1KX

k=1

xn,k = 1

Xn,k = 1 with probability ✓k and

X

k2{1,··· ,K}

✓k = 1

P (x

i

) = P ({xn,k

= 1, where k is the index of the n-th roll})

= ✓

k

= ✓

xn,1

1 ✓

xn,2

2 · · · ✓xn,k

k

=

KY

k=1

xn,k

k

P (x1, x2, . . . , xN |✓) =NY

n=1

P (xn|✓) =Y

k

nkk

Page 15: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

MLE:constrainedoptimization

15

• Objectivefunction:

• Weneedtomaximizethissubjecttotheconstraint:

• Lagrangemultipliers:

• Derivatives:

• Sufficientstatistics?

l(✓;D) = logP (D|✓) = log

Y

k

✓nkk =

X

k

nk log ✓k

¯l(✓;D) =

X

k

nk log ✓k + �(1�X

k

✓k)

@ l̄

@✓k=

nk

✓k� � = 0

nk = �✓k )X

k

nk = �X

k

✓k ) N = �

✓̂k,MLE =1

N

X

n

xn,k

Page 16: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Bayesianestimation

16

• Ineedaprioroverparametersθ

• Dirichlet distribution

• Posteriorofθ

• Isomorphismoftheposteriorwiththeprior(conjugateprior)

• Posteriormeanestimation

P (✓) =�(

Pk ↵k)Q

k �(↵k)

Y

k

✓↵k�1k = C(↵)

Y

k

✓ak�1

P (✓|x1, . . . , xN ) =p(x1, . . . , xN |✓)p(✓)

p(x1, . . . , xN )/

Y

k

nkk

Y

k

↵k�1k =

Y

k

↵k+nk�1k

✓k =

Z✓kp(✓|D)d✓ = C

Z✓k

Y

k

✓↵k+nk�1k d✓ =

nk + ↵k

N + |↵|

Page 17: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

ContinuousDistributions

17

• Uniform

• Gaussian

• MultivariateGaussian

Page 18: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

MLEforamultivariateGaussian

18

• YoucanshowthattheMLEforμandΣ is

• Whatarethesufficientstatistics?

Page 19: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

MLEforamultivariateGaussian

19

• YoucanshowthattheMLEforμandΣ is

• Whatarethesufficientstatistics?• Rewrite

• Sufficientstatisticsare:

• SimilarforBayesianestimation:Normalprior

Page 20: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

MLEforgeneralBNs

20

• IfweassumetheparametersforeachCPDaregloballyindependent,andallnodesarefullyobserved,thenthelog-likelihoodfunctiondecomposesintoasumoflocalterms,onepernode

• MLE-basedparameterestimationofGMreducestolocalest.ofeachGLIM.

Page 21: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

DecomposablelikelihoodofaBN

21

• ConsidertheGM:

• ThisisthesameaslearningfourseparatesmallerBNseachofwhichconsistsofanodeanitsparents.

Page 22: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

MLEforBNswithtabularCPDs

22

• EachCPDisrepresentedasatable(multinomial)with

• IncaseofmultipleparentstheCPDisahigh-dimensionaltable• Thesufficientstatisticsarecountsofvariableconfigurations

• Thelog-likelihoodis

• AndusingaLagrangemultipliertoenforcethatconditionalssumupto1wehave:

Page 23: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Whataboutparameterpriors?

23

• InaBNwehaveacollectionoflocaldistributions• HowcanwedefinepriorsoverthewholeBN?• WecouldwriteP(x1,x2,…xN;G,θ)P(θ|α)

• Symbolicallythesameasbeforebutθ isdefinedoveravectorofrandomvariablesthatfollowdifferentdistributions.

• Weneedθ todecomposetouselocalrules.Otherwisewecannotdecomposethelikelihoodanymore.

• Weneedcertainrulesonθ• CompleteModelEquivalence• GlobalParameterIndependence• LocalParameterIndependence• LikelihoodandPriorModularity

Page 24: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

GlobalandLocalParameterIndependence

24

• GlobalParameterIndependence• ForeveryDAGmodel

• LocalParameterIndependence• Foreverynode

Page 25: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

GlobalandLocalParameterIndependence

25

Page 26: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

WhichPDFssatisfytheseassumptions?

26

• DiscreteDAGModels

• GaussianDAGModels

Page 27: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Parametersharing

27

• TransitionprobabilitiesP(Xt|Xt-1)canbedifferent:Whatistheparameterizationcost?

Page 28: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Parametersharing

28

• TransitionprobabilitiesP(Xt|Xt-1)canbedifferent:Whatistheparameterizationcost?• Ineedadifferentlocalconditionaldistribution.• HowcanIlearnthetransitionprobabilities?Ineediid data.Somytrainingexamplesshouldcorrespondmultiplerollsofthedice.Ineedtodoalignmentetc.

• Whatdowedo?

Page 29: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Parametersharing

29

• Wewillmaketheassumptionthatforeverytransitionwefollowthesameconditional• Consideratime-invariant(stationary)1st-orderMarkovmodel

• Initialstateprobabilityvector• Statetransitionprobabilitymatrix

• Now:optimizeseparately• π (multinomial)• WhataboutA?

Page 30: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

LearningaMarkovchaintransitionmatrix

30

• Aisastochasticmatrixwith

• EachrowofAisamultinomialdistribution• MLEofAij isthefractionoftransitionsfromi toj

• Sparsedataproblem:• Ifi tojdidnotoccurinthedatathenAij is0.Anyfuturesequencewithpairi tojwillhavezeroprobability

• Backoff smoothing

X

j

Aij = 1

Page 31: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Example:HMMsupervisedMLestimation

31

• Givenx=x1,x2,…,xN forwhichthetruestatepathy1,y2,…,yN isknown:• Define

• Themaximumlikelihoodparametersθ are:

• IfxiscontinuouswecanapplylearningrulesforaGaussian

Page 32: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

SupervisedMLestimation

32

• Intuition: whenweknowtheunderlyingstates,thebestestimateofθ istheaveragefrequencyoftransitionsandemissionsthatoccurinthetrainingdata

• Drawback:Givenlittledata,wemayoverfit (rememberzeroprobabilities)

• Example:• Given10rollswehave

• Then

Page 33: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Pseudocounts

33

• Solutionforsmalltrainingsets:• Addpseudocounts

• Thepseudocounts representourpriorbelief

• Largetotalpseudocounts =>strongpriorbelief• Smalltotalpseudocounts =>smoothingtoavoid0probabilities• EquivalenttoBayesianestimationunderauniformpriorwithparameterstrengthequaltothepseudocounts.

Page 34: Lecture 7 LBN - GitHub Pages · Exponential family: a basic building block 2 •For a numeric random variable X is an exponential family distribution with natural (canonical) parameter

Summary

34

• ForfullyobservedBN,thelog-likelihoodfunctiondecomposesintoasumoflocalterms,onepernode;thuslearningisalsofactored

• Learningsingle-nodeGM– densityestimation:exp.Family• Typicaldiscretedistribution• Typicalcontinuousdistribution• Conjugatepriors

• LearningBNwithmorenodes:• Localoperations