Guidance: Assignment 3 Part 1
Some Neat Results From Assignment 1
[Notes from 2015: distinguish the z's; don't use z-bar]

Assignment 1: Negative Examples (Rohit)
Assignment 1: Noisy Observations (Nick)
Z: true feature vector
X: noisy observation, X ~ Normal(Z, σ²)
We need to compute P(X | H)
Φ: cumulative distribution function of the Gaussian
Assignment 1: Noisy Observations (Nick)
Guidance on Assignment 3

Guidance: Assignment 3 Part 1
MATLAB functions in the Statistics Toolbox: betacdf, betapdf, betarnd, betastat, betafit
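For readers working outside MATLAB, a rough SciPy analogue of those toolbox functions (the shape parameters a = 2, b = 5 are illustrative assumptions, not assignment values):

```python
# Illustrative Python/SciPy counterparts of the MATLAB Statistics Toolbox
# beta functions named above (betapdf, betacdf, betarnd, betastat, betafit).
from scipy import stats

a, b = 2.0, 5.0                                   # example shape parameters

pdf_at_half = stats.beta.pdf(0.5, a, b)           # cf. betapdf(0.5, a, b)
cdf_at_half = stats.beta.cdf(0.5, a, b)           # cf. betacdf(0.5, a, b)
mean, var = stats.beta.stats(a, b, moments='mv')  # cf. betastat(a, b)
samples = stats.beta.rvs(a, b, size=1000, random_state=0)  # cf. betarnd
# cf. betafit: fix loc/scale so only the shape parameters are estimated
a_hat, b_hat, loc, scale = stats.beta.fit(samples, floc=0, fscale=1)
```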
Guidance: Assignment 3 Part 2
You will explore the role of the priors. The Weiss model showed that priors play an important role when:
- observations are noisy
- observations don't provide strong constraints
- there aren't many observations.
Guidance: Assignment 3 Part 3
Implement a model a bit like Weiss et al. (2002).
Goal: infer the motion (velocity) of a rigid shape from observations at two instants in time.
Assume distinctive features that make it easy to identify the location of each feature at successive times.
Assignment 2 Guidance
Bx: the x displacement of the blue square (= Δx in one unit of time)
By: the y displacement of the blue square
Rx: the x displacement of the red square
Ry: the y displacement of the red square
These observations are corrupted by measurement noise: Gaussian, mean zero, fixed standard deviation.
D: direction of motion (up, down, left, right). Assume the only possibilities are one unit of motion in any of the four directions.
Assignment 2: Generative Model
Same assumptions for Bx, By.
Rx conditioned on D = up is drawn from a Gaussian.

Assignment 2 Math
Conditional independence

Assignment 2 Implementation
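A minimal Python sketch of this kind of inference (not the assignment's solution; the noise level σ = 0.5, the uniform prior over D, and the observed values are all illustrative assumptions):

```python
# Minimal sketch: infer direction D from noisy displacement observations
# Bx, By, Rx, Ry, assuming a uniform prior over D and illustrative sigma.
import math

SIGMA = 0.5
# True displacement (dx, dy) implied by each direction: one unit of motion.
DIRECTIONS = {'up': (0, 1), 'down': (0, -1), 'left': (-1, 0), 'right': (1, 0)}

def posterior_over_D(bx, by, rx, ry, sigma=SIGMA):
    """P(D | Bx, By, Rx, Ry), exploiting conditional independence of the
    four observations given D. The Gaussian normalization constant is the
    same for every D, so it cancels when we normalize."""
    unnorm = {}
    for d, (dx, dy) in DIRECTIONS.items():
        # log-likelihood up to a constant: -(sum of squared errors)/(2 sigma^2)
        sq_err = (bx - dx)**2 + (by - dy)**2 + (rx - dx)**2 + (ry - dy)**2
        unnorm[d] = math.exp(-sq_err / (2 * sigma**2))
    z = sum(unnorm.values())
    return {d: v / z for d, v in unnorm.items()}

post = posterior_over_D(0.1, 0.9, -0.2, 1.1)  # observations consistent with 'up'
```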
Quiz: do we need to worry about the Gaussian density function's normalization term?

Introduction To Bayes Nets
(Stuff stolen from Kevin Murphy, UBC, and Nir Friedman, HUJI)

What Do You Need To Do Probabilistic Inference In A Given Domain?
A joint probability distribution over all variables in the domain.
Qualitative part: a directed acyclic graph (DAG). Nodes: random variables; edges: direct influence.
Quantitative part: a set of conditional probability distributions, e.g., P(A | E, B), the "family of Alarm" in the Earthquake/Radio/Burglary/Alarm/Call network:

  E   B    P(a | E,B)   P(¬a | E,B)
  e   b    0.9          0.1
  e   ¬b   0.2          0.8
  ¬e  b    0.9          0.1
  ¬e  ¬b   0.01         0.99

Together, the qualitative and quantitative parts give a compact representation of the joint probability distribution via conditional independence, and define a unique distribution in a factored form.
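A brute-force sketch of how the factored form defines a unique joint distribution over the Earthquake/Radio/Burglary/Alarm/Call variables (every number below is an illustrative assumption, not the slide's CPTs):

```python
# Sketch: a Bayes net's CPTs define a unique joint distribution in factored
# form, P(B) P(E) P(R|E) P(A|B,E) P(C|A). All numbers are illustrative.
from itertools import product

P_b = 0.01                                       # P(Burglary)
P_e = 0.02                                       # P(Earthquake)
P_r_given_e = {True: 0.9, False: 0.0001}         # P(Radio | Earthquake)
P_a_given_be = {(True, True): 0.95, (True, False): 0.9,
                (False, True): 0.2, (False, False): 0.01}
P_c_given_a = {True: 0.8, False: 0.05}           # P(Call | Alarm)

def bern(p, x):
    return p if x else 1 - p

def joint(b, e, r, a, c):
    # the factored form of the joint
    return (bern(P_b, b) * bern(P_e, e) * bern(P_r_given_e[e], r)
            * bern(P_a_given_be[(b, e)], a) * bern(P_c_given_a[a], c))

total = sum(joint(*vals) for vals in product([True, False], repeat=5))

# Inference by enumeration: P(Burglary | Call)
num = sum(joint(True, e, r, a, True) for e, r, a in product([True, False], repeat=3))
den = sum(joint(b, e, r, a, True) for b, e, r, a in product([True, False], repeat=4))
p_b_given_c = num / den
```

A phone call should raise the probability of a burglary above its prior, which the enumeration confirms.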
Bayes Nets (a.k.a. Belief Nets)

What Is A Bayes Net?
[Figure from N. Friedman: the Earthquake/Radio/Burglary/Alarm/Call network]
A node is conditionally independent of its ancestors given its parents.
E.g., C is conditionally independent of R, E, and B given A.
Notation: C ⊥ R, B, E | A
Quiz: What sort of parameter reduction do we get? From 2⁵ − 1 = 31 parameters to 1 + 1 + 2 + 4 + 2 = 10.

Conditional Distributions Are Flexible
E.g., Earthquake and Burglary might have independent effects on Alarm. A.k.a. noisy-OR,
where pB and pE are the alarm probabilities given burglary and earthquake alone. This constraint reduces the number of free parameters to 8!
  B  E   P(A | B,E)
  0  0   0
  0  1   pE
  1  0   pB
  1  1   pE + pB − pE·pB

A Real Bayes Net: Alarm
Domain: monitoring intensive-care patients. 37 variables; 509 parameters instead of 2³⁷.
[Figure from N. Friedman: the ALARM network]

More Real-World Bayes Net Applications
"Microsoft's competitive advantage lies in its expertise in Bayesian networks" -- Bill Gates, quoted in LA Times, 1996
- MS Answer Wizards, (printer) troubleshooters
- Medical diagnosis
- Speech recognition (HMMs)
- Gene sequence/expression analysis
- Turbocodes (channel coding)
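The noisy-OR combination in the table above generalizes to any number of parents; a minimal sketch (pB = 0.9 and pE = 0.2 are illustrative values):

```python
# Sketch of the noisy-OR combination rule: each present cause i independently
# fails to trigger the effect with probability (1 - p_i).
def noisy_or(cause_probs, causes_present):
    """P(effect | causes), given per-cause activation probabilities."""
    fail = 1.0
    for p, present in zip(cause_probs, causes_present):
        if present:
            fail *= (1.0 - p)
    return 1.0 - fail

pB, pE = 0.9, 0.2  # illustrative alarm probabilities for burglary/earthquake alone
```

With two parents, this reproduces the four table rows: 0, pE, pB, and pE + pB − pE·pB.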
Turbocodes: a scheme for efficient (near-Shannon-limit) encoding of information, used in cellular communication. Turbocodes were reinterpreted as a form of (loopy) belief propagation in Bayes nets by Weiss.

Why Are Bayes Nets Useful?
- A factored representation may have exponentially fewer parameters than the full joint:
  - easier inference (lower time complexity)
  - less data required for learning (lower sample complexity)
- The graph structure supports:
  - modular representation of knowledge
  - local, distributed algorithms for inference and learning
  - intuitive (possibly causal) interpretation
- Strong theory about the nature of cognition or the generative process that produces observed data: Bayes nets can't represent arbitrary contingencies among variables, so the theory can be rejected by data.
Reformulating Naïve Bayes As A Graphical Model
[Figure: D with children Rx, Ry, Bx, By]
Marginalizing over D
Definition of conditional probability
[Figure: network with nodes Survive, Age, Class, Gender]

Review: Bayes Net
Nodes = random variables
Links = expression of joint distribution
Compare to the full joint distribution by the chain rule:
P(C, A, R, E, B) = P(B) P(E | B) P(R | E, B) P(A | R, E, B) P(C | A, R, E, B)
versus the Bayes net factorization for the Earthquake/Radio/Burglary/Alarm/Call network:
P(C, A, R, E, B) = P(B) P(E) P(R | E) P(A | E, B) P(C | A)
[Review slide; lecture resumed here]

Bayesian Analysis
Make inferences from data using probability models about quantities we want to predict.
- E.g., expected age of death given a 51-year-old
- E.g., latent topics in a document
- E.g., what direction is the motion?
Set up a full probability model that characterizes the distribution over all quantities (observed and unobserved) and incorporates prior beliefs.
Condition the model on observed data to compute the posterior distribution.
Evaluate the fit of the model to the data; adjust model parameters to achieve better fits.

Inference
- Computing posterior probabilities: probability of hidden events given any evidence
- Most likely explanation: scenario that explains the evidence
- Rational decision making: maximize expected utility
- Value of information: how much would it help to obtain evidence about some variable?
- Effect of intervention: causal analysis
[Figure from N. Friedman: Earthquake/Radio/Burglary/Alarm/Call network, illustrating the explaining-away effect]

Conditional Independence
A node is conditionally independent of its ancestors given its parents. Example?
What about conditional independence between variables that aren't directly connected?
- e.g., Earthquake and Burglary?
- e.g., Burglary and Radio?
Example: C is independent of E and B given A.
[equations]
We've already seen some other examples of independence with the formulation of Bayes nets.

d-separation
A criterion for deciding if nodes are conditionally independent. A path from node u to node v is d-separated by a node z if the path matches one of these templates:
[Figure: four path templates u–z–v — chains u → z → v and u ← z ← v, common cause u ← z → v, and common effect (collider) u → z ← v; in the first three the path is blocked when z is observed, in the collider case when z is unobserved]
d-separation: DEPENDENCE separation. Is information carried along the path?
Last case: z is a collider between u and v.

d-separation
Think about d-separation as breaking a chain: if any link on a chain is broken, the whole chain is broken.
[Figure: longer paths x–z–y built from the same four templates; a single blocked link d-separates the whole path]
Last case: z is a collider between u and v.

d-separation Along Paths
Are u and v d-separated?
[Figure: three example graphs with paths through nodes z]
Quiz: Yes, Yes, and No (d-separated, d-separated, not d-separated).

Conditional Independence
Nodes u and v are conditionally independent given set Z if all (undirected) paths between u and v are d-separated by Z.
[Figure from N. Friedman: the ALARM network. Paths: Shunt–Intubation–VentAlv and Shunt–SaO2–PVSat–VentAlv]

Sufficiency For Conditional Independence: Markov Blanket
The Markov blanket of node u consists of the parents, children, and children's parents of u.
P(u|MB(u),v) = P(u|MB(u))
We can infer this from the d-separation property: each node outside the Markov blanket is d-separated from u.

Probabilistic Models
[Figure: taxonomy of probabilistic models. Graphical models split into directed (Bayesian belief nets: alarm network, state-space models, HMMs, naïve Bayes classifier, PCA/ICA) and undirected (Markov nets: Markov random field, Boltzmann machine, Ising model, max-ent model, log-linear models)]

BAYESIAN (BELIEF) NET vs. MARKOV NET
Bayesian belief nets are a subset of Markov nets. Any Bayesian net can be turned into a Markov net; the reverse is not true.

Turning A Directed Graphical Model Into An Undirected Model Via Moralization
Moralization: connect all parents of each node and remove arrows.
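The collider case of d-separation (and the explaining-away effect mentioned earlier) can be checked numerically on the Burglary → Alarm ← Earthquake fragment; the CPT numbers here are illustrative assumptions:

```python
# Numerical check of the collider story: B and E are marginally independent,
# but become dependent once the collider A is observed (explaining away).
# All probabilities are illustrative.
from itertools import product

P_b, P_e = 0.1, 0.1
P_a = {(True, True): 0.95, (True, False): 0.9,
       (False, True): 0.3, (False, False): 0.01}

def joint(b, e, a):
    pb = P_b if b else 1 - P_b
    pe = P_e if e else 1 - P_e
    pa = P_a[(b, e)] if a else 1 - P_a[(b, e)]
    return pb * pe * pa

def prob(pred):
    return sum(joint(b, e, a) for b, e, a in product([True, False], repeat=3)
               if pred(b, e, a))

# Marginally: P(B, E) = P(B) P(E) -- path blocked at the unobserved collider A
marg_be = prob(lambda b, e, a: b and e)
# Given A: observing the collider unblocks the path, so evidence about E
# changes our belief about B (explaining away).
p_b_given_a = prob(lambda b, e, a: b and a) / prob(lambda b, e, a: a)
p_b_given_ae = prob(lambda b, e, a: b and e and a) / prob(lambda b, e, a: e and a)
```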
Toy Example Of A Markov Net
[Figure: nodes X1–X5]
E.g., X1 ⊥ X4, X5 | X2, X3; in general Xi ⊥ Xrest | Xnbrs.
P(x) = (1/Z) ∏_c ψ_c(x_c), where each ψ_c is a potential function over a maximal clique c and Z is the partition function.
Maximal clique: largest subset of vertices such that each pair is connected by an edge.
Arbitrary potential functions for each clique. Functions can have interactions among terms or just products involving individual terms, i.e., ψ(x4, x5) could equal ψ(x4)ψ(x5).
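A minimal sketch of a Markov net along these lines: a three-node chain with arbitrary positive pairwise potentials (the ψ values are made up), showing the partition function and the neighbor-blocking property:

```python
# Tiny Markov net: a chain X1 - X2 - X3 of binary variables with arbitrary
# positive pairwise potentials. The joint is the product of clique potentials
# divided by the partition function Z; X1 and X3 are conditionally
# independent given X1's neighbor X2.
from itertools import product

psi12 = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 3.0}
psi23 = {(0, 0): 1.5, (0, 1): 2.0, (1, 0): 0.3, (1, 1): 1.0}

def unnorm(x1, x2, x3):
    return psi12[(x1, x2)] * psi23[(x2, x3)]

Z = sum(unnorm(*x) for x in product([0, 1], repeat=3))  # partition function

def p(pred):
    return sum(unnorm(*x) for x in product([0, 1], repeat=3) if pred(*x)) / Z

# Markov property: P(X1=1 | X2=0, X3=x3) is the same for both values of x3
p1_given_20 = (p(lambda a, b, c: a == 1 and b == 0 and c == 0)
               / p(lambda a, b, c: b == 0 and c == 0))
p1_given_21 = (p(lambda a, b, c: a == 1 and b == 0 and c == 1)
               / p(lambda a, b, c: b == 0 and c == 1))
```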
Conditional independence: two nodes are independent conditional on evidence if every path between the nodes is cut off by the evidence. X1's neighbors are given, so X1 is cut off from all other nodes. For this example, only X3 needs to be known to block X1 from X4 and X5.
A Real Markov Net
Estimate P(x1, …, xn | y1, …, yn)
ψ(xi, yi) = P(yi | xi): local evidence likelihood
Ψ(xi, xj) = exp(−J(xi, xj)): compatibility matrix
[Figure: grid of latent causes xi with observed pixels yi]

Example Of Image Segmentation With MRFs
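A toy sketch in the same spirit (not Sziranyi et al.'s method): segmenting a noisy binary image into two regions by greedily minimizing an energy with a local-evidence term and a pairwise compatibility term, using iterated conditional modes (ICM). The 8×8 image, noise rate, and weights are all illustrative:

```python
# MRF-style cleanup of a noisy binary image with iterated conditional modes
# (ICM): greedy coordinate descent on an energy combining local evidence
# (agreement with the observed pixel) and pairwise smoothness.
import random

random.seed(0)
H, W = 8, 8
truth = [[1 if c >= W // 2 else 0 for c in range(W)] for r in range(H)]
noisy = [[1 - v if random.random() < 0.15 else v for v in row] for row in truth]

EVID, SMOOTH = 2.0, 1.0  # illustrative weights for the two potential terms

def local_energy(x, r, c, val):
    e = EVID * (0 if val == noisy[r][c] else 1)        # local evidence term
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # pairwise smoothness
        rr, cc = r + dr, c + dc
        if 0 <= rr < H and 0 <= cc < W:
            e += SMOOTH * (0 if val == x[rr][cc] else 1)
    return e

def total_energy(x):
    e = 0.0
    for r in range(H):
        for c in range(W):
            e += EVID * (0 if x[r][c] == noisy[r][c] else 1)
            for dr, dc in ((1, 0), (0, 1)):  # count each pair once
                rr, cc = r + dr, c + dc
                if rr < H and cc < W:
                    e += SMOOTH * (0 if x[r][c] == x[rr][cc] else 1)
    return e

x = [row[:] for row in noisy]
for _ in range(5):  # ICM sweeps: each greedy update never raises the energy
    for r in range(H):
        for c in range(W):
            x[r][c] = min((0, 1), key=lambda v: local_energy(x, r, c, v))

errors = sum(x[r][c] != truth[r][c] for r in range(H) for c in range(W))
```

ICM is the simplest inference choice here; belief propagation or Gibbs sampling would be used in more serious applications.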
Sziranyi et al. (2000)

Graphical Models Are A Useful Formalism
E.g., feedforward neural net with noise, sigmoid belief net
[Figure: input layer, hidden layer, output layer]

Graphical Models Are A Useful Formalism
E.g., Restricted Boltzmann machine (Hinton), also known as a harmony network (Smolensky)
[Figure: visible units fully connected to hidden units]
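A tiny sketch of the RBM's key structural property: because the graph is bipartite, the hidden units are conditionally independent given the visible units (and vice versa), which makes block Gibbs sampling easy. The weights here are random illustrative values, not a trained model:

```python
# Sketch of a Restricted Boltzmann machine: bipartite visible/hidden layers,
# so each conditional distribution factorizes over units.
import math
import random

random.seed(0)
N_VIS, N_HID = 4, 3
W = [[random.gauss(0, 0.1) for _ in range(N_HID)] for _ in range(N_VIS)]
b_vis = [0.0] * N_VIS
b_hid = [0.0] * N_HID

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_hidden_given_visible(v):
    # factorizes: P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij)
    return [sigmoid(b_hid[j] + sum(v[i] * W[i][j] for i in range(N_VIS)))
            for j in range(N_HID)]

def p_visible_given_hidden(h):
    return [sigmoid(b_vis[i] + sum(h[j] * W[i][j] for j in range(N_HID)))
            for i in range(N_VIS)]

v = [1, 0, 1, 1]
ph = p_hidden_given_visible(v)
h = [1 if random.random() < p else 0 for p in ph]  # one block Gibbs step
pv = p_visible_given_hidden(h)
```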
Graphical Models Are A Useful Formalism
E.g., Gaussian mixture model

Graphical Models Are A Useful Formalism
E.g., dynamical (time-varying) models in which data arrives sequentially or output is produced as a sequence
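The Gaussian mixture model mentioned above can be sketched as a generative graphical model: sample a latent component z, then x given z, and invert with Bayes' rule to get responsibilities. All parameters below are illustrative:

```python
# Gaussian mixture model as a graphical model: a latent indicator z selects
# which Gaussian generates x. Parameters are illustrative.
import math
import random

random.seed(0)
WEIGHTS = [0.3, 0.7]
MEANS = [-2.0, 3.0]
STDS = [1.0, 1.0]

def normal_pdf(x, mu, sd):
    return math.exp(-((x - mu) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def sample():
    # ancestral sampling: first z, then x | z
    z = 0 if random.random() < WEIGHTS[0] else 1
    return z, random.gauss(MEANS[z], STDS[z])

def responsibilities(x):
    # posterior P(z | x) by Bayes' rule
    unnorm = [w * normal_pdf(x, m, s) for w, m, s in zip(WEIGHTS, MEANS, STDS)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

r = responsibilities(3.0)  # a point near component 2's mean
```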