Probabilistic Graphical Models Lecture 2 – Bayesian Networks Representation CS/CNS/EE 155 Andreas Krause

Page 1: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation

Probabilistic

Graphical Models

Lecture 2 – Bayesian Networks Representation

CS/CNS/EE 155

Andreas Krause


Announcements

Will meet in Steele 102 for now

Still looking for another 1-2 TAs..

Homework 1 will be out soon. Start early!! ☺


Multivariate distributions

Instead of random variable, have random vector

X(ω) = [X1(ω),…,Xn(ω)]

Specify P(X1=x1,…,Xn=xn)

Suppose all Xi are Bernoulli variables.

How many parameters do we need to specify?
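As a quick check on the question above: an unrestricted joint table over n Bernoulli variables has 2^n entries, and because they must sum to 1, 2^n - 1 of them are free. A minimal sketch (the function name is illustrative, not from the lecture):

```python
def num_joint_parameters(n: int, states: int = 2) -> int:
    """Free parameters of an unrestricted joint table over n discrete variables."""
    # states**n table entries, minus one for the sum-to-one constraint.
    return states ** n - 1


for n in (1, 2, 10, 30):
    print(n, num_joint_parameters(n))
```

The count grows exponentially in n, which is the motivation for the structured representations that follow.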



Marginal distributions

Suppose we have joint distribution P(X1,…,Xn)

Then

P(Xi = xi) = Σ over x1,…,xi-1,xi+1,…,xn of P(x1,…,xn)

If all Xi binary: How many terms?
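Marginalization sums the joint over every variable except the one of interest, so for n binary variables each marginal value is a sum of 2^(n-1) terms. A small sketch, assuming the joint is stored as a dictionary from assignments to probabilities (the uniform table is a made-up example):

```python
from itertools import product


def marginal(joint, i, value):
    """P(X_i = value), summing the joint over all other variables."""
    return sum(p for x, p in joint.items() if x[i] == value)


# Hypothetical joint over n = 3 binary variables (uniform, for illustration only).
n = 3
joint = {x: 1.0 / 2 ** n for x in product((0, 1), repeat=n)}

# Number of summands that contribute to P(X_1 = 1).
terms = sum(1 for x in joint if x[0] == 1)
print(marginal(joint, 0, 1), terms)
```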



Rules for random variables

Chain rule

P(X1,…,Xn) = P(X1) P(X2 | X1) … P(Xn | X1,…,Xn-1)

Bayes’ rule

P(X | Y) = P(Y | X) P(X) / P(Y)
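A small numeric illustration of Bayes' rule, P(X | Y) = P(Y | X) P(X) / P(Y), for two binary variables. The numbers are invented for the example, and P(Y) is obtained by summing over both values of X:

```python
def bayes_posterior(prior_x, lik_y_given_x, lik_y_given_not_x):
    """P(X=1 | Y=1) from P(X=1), P(Y=1 | X=1) and P(Y=1 | X=0)."""
    # Denominator: P(Y=1) = sum over x of P(Y=1 | X=x) P(X=x).
    p_y = lik_y_given_x * prior_x + lik_y_given_not_x * (1.0 - prior_x)
    return lik_y_given_x * prior_x / p_y


# Hypothetical numbers: rare cause, fairly reliable but noisy evidence.
posterior = bayes_posterior(prior_x=0.01, lik_y_given_x=0.9, lik_y_given_not_x=0.05)
print(round(posterior, 4))
```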


Key concept: Conditional independence

Events α, β conditionally independent given γ if

P(α ∩ β | γ) = P(α | γ) P(β | γ)

Random variables X and Y cond. indep. given Z if

for all x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z)

P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)

If P(Y=y |Z=z)>0, that’s equivalent to

P(X = x | Z = z, Y = y) = P(X = x | Z = z)

Similarly for sets of random variables X, Y, Z

We write: P ⊨ X ⊥ Y | Z
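The definition can be checked numerically on a small table. The sketch below tests P(x, y | z) = P(x | z) P(y | z) for every assignment; the table layout, the helper names and all probabilities are assumptions made for illustration:

```python
from itertools import product


def cond_indep(joint, tol=1e-9):
    """Return True if X ⊥ Y | Z holds for a joint table {(x, y, z): prob}."""
    xs = sorted({k[0] for k in joint})
    ys = sorted({k[1] for k in joint})
    zs = sorted({k[2] for k in joint})
    for z in zs:
        p_z = sum(joint.get((x, y, z), 0.0) for x, y in product(xs, ys))
        if p_z == 0.0:
            continue  # conditioning on a zero-probability event is undefined
        for x, y in product(xs, ys):
            p_xy = joint.get((x, y, z), 0.0) / p_z
            p_x = sum(joint.get((x, b, z), 0.0) for b in ys) / p_z
            p_y = sum(joint.get((a, y, z), 0.0) for a in xs) / p_z
            if abs(p_xy - p_x * p_y) > tol:
                return False
    return True


def bern(p1, value):
    """Probability of `value` for a Bernoulli variable with P(1) = p1."""
    return p1 if value == 1 else 1.0 - p1


# Hypothetical distribution built as P(z) P(x | z) P(y | z), so X ⊥ Y | Z by construction.
p_z1 = 0.7
p_x1_given_z = {0: 0.2, 1: 0.9}
p_y1_given_z = {0: 0.5, 1: 0.4}
joint_ci = {
    (x, y, z): bern(p_z1, z) * bern(p_x1_given_z[z], x) * bern(p_y1_given_z[z], y)
    for x, y, z in product((0, 1), repeat=3)
}

# Counterexample: X and Y are copies of each other, so they stay dependent given Z.
joint_dep = {(0, 0, 0): 0.5, (1, 1, 0): 0.5}

print(cond_indep(joint_ci), cond_indep(joint_dep))  # True False
```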


Why is conditional independence useful?

P(X1,…,Xn) = P(X1) P(X2 | X1) … P(Xn | X1,…,Xn-1)

How many parameters?

Now suppose X1 …Xi-1 ⊥ Xi+1… Xn | Xi for all i

Then

P(X1,…,Xn) = P(X1) P(X2 | X1) P(X3 | X2) … P(Xn | Xn-1)

How many parameters?

Can we compute P(Xn) more efficiently?
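Under the chain assumption above, the joint factorizes as P(X1) P(X2 | X1) … P(Xn | Xn-1). For binary variables that takes 1 + 2(n-1) = 2n - 1 parameters instead of 2^n - 1, and P(Xn) can be computed by pushing the marginal forward one step at a time rather than summing 2^(n-1) terms. A minimal sketch with made-up transition probabilities:

```python
def chain_marginal(p_x1, transitions):
    """P(X_n = 1) for a binary Markov chain X_1 -> X_2 -> ... -> X_n.

    p_x1:        P(X_1 = 1)
    transitions: list of pairs (P(X_{i+1}=1 | X_i=0), P(X_{i+1}=1 | X_i=1))
    """
    p = p_x1
    for p1_given_0, p1_given_1 in transitions:
        # P(X_{i+1}=1) = sum over x_i of P(X_{i+1}=1 | x_i) P(x_i)
        p = (1.0 - p) * p1_given_0 + p * p1_given_1
    return p


n = 20
full_joint_params = 2 ** n - 1   # unrestricted joint table
chain_params = 1 + 2 * (n - 1)   # P(X1) plus two numbers per transition

# Hypothetical chain in which every transition uses the same probabilities.
p_xn = chain_marginal(0.5, [(0.1, 0.8)] * (n - 1))
print(full_joint_params, chain_params, p_xn)
```

The loop does a constant amount of work per variable, so the cost is linear in n.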


Properties of Conditional Independence

Symmetry

X ⊥ Y | Z ⇒ Y ⊥ X | Z

Decomposition

X ⊥ Y,W | Z ⇒ X ⊥ Y | Z

Contraction

(X ⊥ Y | Z) ∧ (X ⊥ W | Y,Z) ⇒ X ⊥ Y,W | Z

Weak union

X ⊥ Y,W | Z ⇒ X ⊥ Y | Z,W

Intersection

(X ⊥ Y | Z,W) ∧ (X ⊥ W | Y,Z) ⇒ X ⊥ Y,W | Z

Intersection holds only if the distribution is positive, i.e., P > 0


Key questions

How do we specify distributions that satisfy particular independence properties?

→ Representation

How can we exploit independence properties for efficient computation?

→ Inference

How can we identify independence properties present in data?

→ Learning

Will now see example: Bayesian Networks


Key idea

Conditional parameterization (instead of joint parameterization)

For each RV, specify P(Xi | XA) for set XA of RVs

Then use chain rule to get joint parametrization

Have to be careful to guarantee legal distribution…


Example: 2 variables


Example: 3 variables


Example: Naïve Bayes models

Class variable Y

Evidence variables X1,…,Xn

Assume that XA ⊥ XB | Y for all disjoint subsets XA, XB of {X1,…,Xn}

Conditional parametrization:

Specify P(Y)

Specify P(Xi | Y)

Joint distribution

P(Y, X1,…,Xn) = P(Y) ∏i P(Xi | Y)
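With P(Y) and each P(Xi | Y) specified, the Naïve Bayes joint is the product P(Y) ∏i P(Xi | Y), and the class posterior follows by normalizing over Y. A sketch for binary variables; the function names and the numbers are hypothetical:

```python
def naive_bayes_joint(y, xs, p_y, p_x_given_y):
    """P(Y=y, X1=x1, ..., Xn=xn) = P(y) * prod_i P(xi | y) for binary Xi."""
    p = p_y[y]
    for i, x in enumerate(xs):
        p1 = p_x_given_y[i][y]  # P(X_i = 1 | Y = y)
        p *= p1 if x == 1 else 1.0 - p1
    return p


def posterior(xs, p_y, p_x_given_y):
    """P(Y | X1=x1, ..., Xn=xn), obtained by normalizing the joint over Y."""
    joint = {y: naive_bayes_joint(y, xs, p_y, p_x_given_y) for y in p_y}
    z = sum(joint.values())
    return {y: v / z for y, v in joint.items()}


# Hypothetical parameters: one prior plus one number per (feature, class) pair.
p_y = {0: 0.6, 1: 0.4}
p_x_given_y = [{0: 0.1, 1: 0.7}, {0: 0.5, 1: 0.9}]

print(posterior((1, 1), p_y, p_x_given_y))
```

For n binary features and a binary class this needs 1 + 2n parameters, compared with 2^(n+1) - 1 for the full joint.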


Today: Bayesian networks

Compact representation of distributions over large number of variables

(Often) allows efficient exact inference (computing marginals, etc.)

[Figure: HailFinder network shown in the JavaBayes applet]

HailFinder: 56 vars, ~3 states each

→ ~10^26 terms

> 10,000 years on top supercomputers


Causal parametrization

Graph with directed edges from (immediate) causes to (immediate) effects

[Figure: alarm network. Earthquake → Alarm ← Burglary; Alarm → JohnCalls; Alarm → MaryCalls]


Bayesian networks

A Bayesian network structure is a directed, acyclic graph G, where each vertex s of G is interpreted as a random variable Xs (with unspecified distribution)

A Bayesian network (G,P) consists of

A BN structure G and ..

..a set of conditional probability distributions (CPTs) P(Xs | PaXs), where PaXs are the parents of node Xs, such that

(G,P) defines joint distribution

P(X1,…,Xn) = ∏s P(Xs | PaXs)
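A Bayesian network defines the joint as the product of each node's conditional distribution given its parents. The sketch below does this for the five-node alarm example (E, B, A, J, M) with binary variables. The CPT numbers are invented for illustration and are not taken from the slides:

```python
from itertools import product

# Structure of the alarm example: each node maps to its tuple of parents.
parents = {"E": (), "B": (), "A": ("E", "B"), "J": ("A",), "M": ("A",)}

# Hypothetical CPTs: P(node = 1 | parent values). All numbers are made up.
cpt = {
    "E": {(): 0.002},
    "B": {(): 0.001},
    "A": {(0, 0): 0.001, (0, 1): 0.94, (1, 0): 0.29, (1, 1): 0.95},
    "J": {(0,): 0.05, (1,): 0.90},
    "M": {(0,): 0.01, (1,): 0.70},
}


def joint_prob(assignment):
    """P(assignment) as the product over nodes of P(node | parents)."""
    p = 1.0
    for node, pa in parents.items():
        p1 = cpt[node][tuple(assignment[q] for q in pa)]
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p


nodes = list(parents)
# The product of CPTs over a DAG always sums to 1 over all assignments.
total = sum(
    joint_prob(dict(zip(nodes, vals))) for vals in product((0, 1), repeat=len(nodes))
)
num_params = sum(len(table) for table in cpt.values())
print(total, num_params)
```

Here the network needs 10 parameters, whereas an unrestricted joint over five binary variables needs 2^5 - 1 = 31.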


Bayesian networks

Can every probability distribution be described by a BN?


Representing the world using BNs

Want to make sure that I(P) ⊆ I(P’)

Need to understand CI properties of BN (G,P)

[Figure: true distribution P’ with cond. ind. I(P’), represented by a Bayes net (G,P) with I(P)]


Which kind of CI does a BN imply?

[Figure: alarm network with nodes E, B, A, J, M]



Local Markov Assumption

Each BN Structure G is associated with the following conditional independence assumptions

X ⊥ NonDescendantsX | PaX

We write Iloc(G) for these conditional independences

Suppose (G,P) is a Bayesian network representing P

Does it hold that Iloc(G) ⊆ I(P)?

If this holds, we say G is an I-map for P.
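The local Markov assumptions can be read off the graph mechanically: for each node, collect its descendants and take the remaining nodes as non-descendants. A sketch for a DAG given as a node-to-parents mapping. The function names are illustrative, and parents are left out of the non-descendant set because independence from a variable that is already conditioned on holds trivially:

```python
def descendants(children, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def local_markov(parents):
    """Map each node X to (NonDescendants_X without parents, Pa_X)."""
    children = {}
    for node, pa in parents.items():
        for p in pa:
            children.setdefault(p, []).append(node)
    result = {}
    for node, pa in parents.items():
        non_desc = set(parents) - descendants(children, node) - {node} - set(pa)
        result[node] = (non_desc, set(pa))
    return result


# Alarm example: E and B are parents of A; A is the parent of J and of M.
alarm = {"E": (), "B": (), "A": ("E", "B"), "J": ("A",), "M": ("A",)}
for node, (non_desc, pa) in local_markov(alarm).items():
    print(f"{node} ⊥ {sorted(non_desc)} | {sorted(pa)}")
```

For instance, the statement for J reads J ⊥ {B, E, M} | A.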


Factorization Theorem

[Figure: example graph G over nodes s1,…,s12]

Iloc(G) ⊆ I(P)

G is an I-map of P (independence map)

⇔

True distribution P can be represented exactly as a Bayes net (G,P)


Proof: I-Map to factorization


The general case


Defining a Bayes Net

Given random variables and known conditional independences

Pick ordering X1,…,Xn of the variables

For each Xi

Find minimal subset A ⊆ {X1,…,Xi-1} such that Xi ⊥ X¬A | A, where ¬A = {X1,…,Xi-1} \ A

Specify / learn CPD(Xi | A)

Ordering matters a lot for compactness of representation! More later in this course.
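The construction can be sketched by brute force when the full joint table is available, which is an unrealistic assumption made only for illustration. For each variable in the chosen ordering, the code searches for a smallest set A of predecessors such that the variable is independent of the remaining predecessors given A. All names and probabilities are hypothetical:

```python
from itertools import combinations, product


def marg(joint, idx):
    """Marginal table over the variable positions listed in idx."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def indep(joint, i, rest, given, tol=1e-9):
    """Check X_i ⊥ X_rest | X_given via P(i,rest,given) P(given) = P(i,given) P(rest,given).

    Assumes `joint` lists every assignment, including zero-probability ones.
    """
    full = marg(joint, (i, *rest, *given))
    pz = marg(joint, given)
    piz = marg(joint, (i, *given))
    prz = marg(joint, (*rest, *given))
    r = len(rest)
    for key, p in full.items():
        xi, xr, z = key[0], key[1:1 + r], key[1 + r:]
        if abs(p * pz[z] - piz[(xi, *z)] * prz[(*xr, *z)]) > tol:
            return False
    return True


def minimal_parents(joint, order):
    """For each variable, a smallest predecessor set that screens off the other predecessors."""
    parents = {}
    for pos, i in enumerate(order):
        pred = order[:pos]
        for size in range(len(pred) + 1):
            found = next(
                (a for a in combinations(pred, size)
                 if indep(joint, i, tuple(v for v in pred if v not in a), a)),
                None,
            )
            if found is not None:
                parents[i] = found
                break
    return parents


# Hypothetical distribution: X0 and X1 are independent causes of X2.
p0, p1 = 0.3, 0.6
p2 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9}  # P(X2=1 | x0, x1)
joint = {}
for x0, x1, x2 in product((0, 1), repeat=3):
    p = (p0 if x0 else 1 - p0) * (p1 if x1 else 1 - p1)
    joint[(x0, x1, x2)] = p * (p2[(x0, x1)] if x2 else 1 - p2[(x0, x1)])

print(minimal_parents(joint, (0, 1, 2)))  # causes first
print(minimal_parents(joint, (2, 0, 1)))  # effect first
```

With the causes first the result has two edges. With the effect first, X0 and X1 become dependent given X2 and the result has three edges, which illustrates why the ordering matters for compactness.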


Adding edges doesn’t hurt

Theorem: Let G be an I-Map for P, and G’ be derived from G by adding an edge. Then G’ is an I-Map of P

(G’ is strictly more expressive than G)

Proof


Additional conditional independencies

BN specifies joint distribution through conditional parameterization that satisfies Local Markov Property

But we also talked about additional properties of CI

Weak Union, Intersection, Contraction, …

Which additional CI does a particular BN specify?

All CI that can be derived through algebraic operations


What you need to know

Bayesian networks

Local Markov property

I-Maps

Factorization Theorem


Tasks

Subscribe to Mailing list

https://utils.its.caltech.edu/mailman/listinfo/cs155

Read Koller & Friedman Chapter 3.1-3.3

Form groups and think about class projects. If you have difficulty finding a group, email Pete Trautman