Guidance: Assignment 3 Part 1


Page 1: Guidance: Assignment 3 Part 1

Some Neat Results From Assignment 1

Page 2: Guidance: Assignment 3 Part 1

Assignment 1: Negative Examples (Rohit)

Page 3: Guidance: Assignment 3 Part 1

Assignment 1: Noisy Observations (Nick)

Z: true feature vector
X: noisy observation
X ~ Normal(z, σ²)
We need to compute P(X|H)

Φ: cumulative distribution function of the Gaussian
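The slide's formula did not survive, so the following is only a sketch of how Φ could enter P(X|H). It assumes, hypothetically, a one-dimensional hypothesis H = [a, b] with z uniform over H; the function names and arguments are illustrative and not taken from the assignment.

```python
import math

def gaussian_cdf(t):
    """Standard normal cumulative distribution function, Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def noisy_likelihood(x, a, b, sigma):
    """p(x | H) for the assumed case H = [a, b], z ~ Uniform(a, b),
    x ~ Normal(z, sigma^2).

    Integrating the Gaussian density over z in [a, b] leaves a
    difference of two Phi terms divided by the interval length.
    """
    upper = gaussian_cdf((b - x) / sigma)
    lower = gaussian_cdf((a - x) / sigma)
    return (upper - lower) / (b - a)
```

With very small sigma this reduces to the noise-free case: about 1/(b − a) inside the interval and about 0 outside it.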

Page 4: Guidance: Assignment 3 Part 1

Assignment 1: Noisy Observations (Nick)

Page 5: Guidance: Assignment 3 Part 1

Guidance on Assignment 3

Page 6: Guidance: Assignment 3 Part 1

Guidance: Assignment 3 Part 1

MATLAB functions in the Statistics Toolbox:

betacdf, betapdf, betarnd, betastat, betafit
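For readers without the toolbox, here is a rough standard-library Python sketch of what those five functions compute. The Python names are hypothetical stand-ins, the CDF is a simple numerical approximation, and the fit uses moment matching as a stand-in, which is not the toolbox's fitting procedure.

```python
import math
import random

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed in log space."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_cdf(x, a, b, steps=10000):
    """Approximate CDF by midpoint-rule integration of the density."""
    h = x / steps
    return sum(beta_pdf((i + 0.5) * h, a, b) for i in range(steps)) * h

def beta_stat(a, b):
    """Mean and variance of Beta(a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def beta_fit_moments(samples):
    """Moment-matching estimate of (a, b) from samples in (0, 1)."""
    n = len(samples)
    m = sum(samples) / n
    v = sum((s - m) ** 2 for s in samples) / (n - 1)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

# Random draws (the role played by betarnd), with a fixed seed.
rng = random.Random(0)
draws = [rng.betavariate(2.0, 5.0) for _ in range(5000)]
a_hat, b_hat = beta_fit_moments(draws)
```

The estimates `a_hat` and `b_hat` should land near the generating values 2 and 5.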

Page 7: Guidance: Assignment 3 Part 1

Guidance: Assignment 3 Part 2

You will explore the role of the priors.

The Weiss model showed that priors play an important role when

observations are noisy

observations don’t provide strong constraints

there aren’t many observations.
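A small sketch of two of these conditions (noisy observations, few observations), assuming a scalar quantity with a Gaussian prior and Gaussian observation noise. This conjugate setup and the function name are illustrative assumptions, not the assignment's model.

```python
def posterior_mean_var(obs, noise_sd, prior_mean=0.0, prior_sd=1.0):
    """Posterior over a scalar given i.i.d. Gaussian observations
    and a Gaussian prior (standard conjugate update)."""
    n = len(obs)
    prior_prec = 1.0 / prior_sd ** 2      # precision of the prior
    like_prec = n / noise_sd ** 2         # total precision of the data
    post_var = 1.0 / (prior_prec + like_prec)
    post_mean = post_var * (prior_prec * prior_mean + sum(obs) / noise_sd ** 2)
    return post_mean, post_var

# One clean observation: the posterior mean stays close to the data.
clean_mean, _ = posterior_mean_var([2.0], noise_sd=0.1)
# One noisy observation: the posterior mean is pulled toward the prior mean of 0.
noisy_mean, _ = posterior_mean_var([2.0], noise_sd=10.0)
# Many noisy observations: the data regain some of their influence.
many_mean, _ = posterior_mean_var([2.0] * 100, noise_sd=10.0)
```

The posterior mean is a precision-weighted average of the prior mean and the data, so the prior dominates whenever the data precision n/σ² is small.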

Page 8: Guidance: Assignment 3 Part 1

Guidance: Assignment 3 Part 3

Implement a model a bit like Weiss et al. (2002)

Goal: infer motion (velocity) of a rigid shape from observations at two instants in time.

Assume distinctive features that make it easy to identify the location of the feature at successive times.

Page 9: Guidance: Assignment 3 Part 1

Assignment 2 Guidance

Bx: the x displacement of the blue square (= delta x in one unit of time)
By: the y displacement of the blue square
Rx: the x displacement of the red square
Ry: the y displacement of the red square

These observations are corrupted by measurement noise.
• Gaussian, mean zero, std deviation σ

D: direction of motion (up, down, left, right)
• Assume the only possibilities are one unit of motion in any direction

Page 10: Guidance: Assignment 3 Part 1

Assignment 2: Generative Model

Same assumptions for Bx, By.

Rx conditioned on D=up is drawn from a Gaussian

Page 11: Guidance: Assignment 3 Part 1

Assignment 2 Math

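$$
\begin{aligned}
P(D \mid Rx, Ry, Bx, By)
 &= \frac{p(Rx, Ry, Bx, By \mid D)\, P(D)}{p(Rx, Ry, Bx, By)} \\
 &= \frac{p(Rx, Ry, Bx, By \mid D)\, P(D)}{\sum_{e} p(Rx, Ry, Bx, By \mid e)\, P(e)} \\
 &\propto p(Rx, Ry, Bx, By \mid D)\, P(D) \\
 &\propto p(Rx \mid D)\, p(Ry \mid D)\, p(Bx \mid D)\, p(By \mid D)\, P(D)
 \qquad \text{(conditional independence)}
\end{aligned}
$$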

Page 12: Guidance: Assignment 3 Part 1

Assignment 2 Implementation

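$$
P(D \mid Rx, Ry, Bx, By) \propto p(Rx \mid D)\, p(Ry \mid D)\, p(Bx \mid D)\, p(By \mid D)\, P(D)
$$

$$
\begin{aligned}
P(D = d \mid Rx = rx, Ry = ry, Bx = bx, By = by)
 &\propto p(Rx = rx \mid D = d) \cdots \\
 &= \mathrm{Gaussian}(rx;\, \mu_d, \sigma_d^2) \cdots \\
 &= \frac{1}{\sqrt{2\pi\sigma_d^2}} \exp\!\left(-\frac{(rx - \mu_d)^2}{2\sigma_d^2}\right) \cdots
\end{aligned}
$$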

Quiz: do we need to worry about the Gaussian density function normalization term?
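A minimal sketch of the implementation step, under assumptions the slides do not spell out: both squares share the same direction, "up" means +1 in y, and every observation has the same noise standard deviation σ. The `DIRECTIONS` table and function names are hypothetical. The `with_norm` flag lets you compare the posterior computed with and without the Gaussian normalization term.

```python
import math

# Assumed encoding: each direction maps to the expected (x, y) displacement.
DIRECTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def log_gauss(x, mu, sigma, with_norm=True):
    """Log of the Gaussian density, optionally without the normalizer."""
    ll = -((x - mu) ** 2) / (2 * sigma ** 2)
    if with_norm:
        ll -= 0.5 * math.log(2 * math.pi * sigma ** 2)
    return ll

def direction_posterior(rx, ry, bx, by, sigma, prior=None, with_norm=True):
    """P(D | Rx, Ry, Bx, By) via the factored likelihood times the prior."""
    prior = prior or {d: 1.0 / len(DIRECTIONS) for d in DIRECTIONS}
    log_post = {}
    for d, (mx, my) in DIRECTIONS.items():
        log_post[d] = (math.log(prior[d])
                       + log_gauss(rx, mx, sigma, with_norm)
                       + log_gauss(ry, my, sigma, with_norm)
                       + log_gauss(bx, mx, sigma, with_norm)
                       + log_gauss(by, my, sigma, with_norm))
    # Subtract the max before exponentiating to avoid underflow.
    top = max(log_post.values())
    unnorm = {d: math.exp(v - top) for d, v in log_post.items()}
    total = sum(unnorm.values())
    return {d: u / total for d, u in unnorm.items()}

post = direction_posterior(0.1, 0.9, -0.1, 1.1, sigma=0.5)
post_no_norm = direction_posterior(0.1, 0.9, -0.1, 1.1, sigma=0.5, with_norm=False)
```

Working in log space keeps the product of four small densities from underflowing when σ is small.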

Page 13: Guidance: Assignment 3 Part 1

Introduction To Bayes Nets

(Stuff stolen from Kevin Murphy, UBC, and Nir Friedman, HUJI)

Page 14: Guidance: Assignment 3 Part 1

What Do You Need To Do Probabilistic Inference In A Given Domain?

Joint probability distribution over all variables in domain

Page 15: Guidance: Assignment 3 Part 1

Bayes Nets (a.k.a. Belief Nets)

Compact representation of joint probability distributions via conditional independence

Qualitative part: directed acyclic graph (DAG)
• Nodes: random vars.
• Edges: direct influence

Quantitative part: set of conditional probability distributions

[Figure from N. Friedman: network over Earthquake, Radio, Burglary, Alarm, and Call, with the conditional probability table P(A | E,B) for the family of Alarm. The table's four rows of values are 0.9 0.1; 0.2 0.8; 0.01 0.99; 0.9 0.1. The assignment of rows to combinations of e, b and their negations is not recoverable here.]

Together: define a unique distribution in a factored form

$$
P(B, E, A, C, R) = P(B)\, P(E)\, P(A \mid B, E)\, P(R \mid E)\, P(C \mid A)
$$

Page 16: Guidance: Assignment 3 Part 1

What Is A Bayes Net?

[Figure: the network over Earthquake, Radio, Burglary, Alarm, and Call]

A node is conditionally independent of its ancestors given its parents.

E.g., C is conditionally independent of R, E, and B given A

Notation: C ⊥ R, B, E | A

Quiz: What sort of parameter reduction do we get?

From 2^5 − 1 = 31 parameters to 1 + 1 + 2 + 4 + 2 = 10
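The two counts can be checked with a short sketch, assuming all five variables are binary and using the parent sets implied by the factorization P(B) P(E) P(A|B,E) P(R|E) P(C|A). The dictionary layout and function names are illustrative.

```python
# Parents of each binary node in the burglary/earthquake network.
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Radio": ["Earthquake"],
    "Call": ["Alarm"],
}

def full_joint_params(num_vars, arity=2):
    """Free parameters in an unconstrained joint table."""
    return arity ** num_vars - 1

def bayes_net_params(parents, arity=2):
    """Free parameters summed over each node's conditional table:
    (arity - 1) per configuration of that node's parents."""
    return sum((arity - 1) * arity ** len(ps) for ps in parents.values())

full_count = full_joint_params(len(PARENTS))   # 2^5 - 1
net_count = bayes_net_params(PARENTS)          # 1 + 1 + 4 + 2 + 2
```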

Page 17: Guidance: Assignment 3 Part 1

Conditional Distributions Are Flexible

E.g., Earthquake and Burglary might have independent effects on Alarm

A.k.a. noisy-or

B  E  P(A|B,E)
0  0  0
0  1  pE
1  0  pB
1  1  pE + pB − pE·pB

where pB and pE are alarm probability given burglary and earthquake alone

This constraint reduces # free parameters to 8!

[Figure: Earthquake and Burglary as parents of Alarm]
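The noisy-or table can be generated from pB and pE alone. A minimal sketch, with hypothetical function names:

```python
def noisy_or(active_probs):
    """P(effect = 1) when each active cause independently triggers
    the effect with its own probability."""
    p_none = 1.0
    for p in active_probs:
        p_none *= 1.0 - p      # probability that this cause fails to trigger
    return 1.0 - p_none

def alarm_cpt(p_b, p_e):
    """P(A = 1 | B, E) for every combination of B and E."""
    table = {}
    for b in (0, 1):
        for e in (0, 1):
            active = [p for p, on in ((p_b, b), (p_e, e)) if on]
            table[(b, e)] = noisy_or(active)
    return table

cpt = alarm_cpt(p_b=0.8, p_e=0.3)
```

The (1, 1) entry comes out as 1 − (1 − pB)(1 − pE), which expands to the pE + pB − pE·pB form in the table.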

Page 18: Guidance: Assignment 3 Part 1

A Real Bayes Net: Alarm

Domain: Monitoring Intensive-Care Patients
• 37 variables
• 509 parameters, instead of 2^37

[Figure from N. Friedman: the ALARM network. Node labels as shown: PCWP, CO, HRBP, HREKG, HRSAT, ERRCAUTER, HR, HISTORY, CATECHOL, SAO2, EXPCO2, ARTCO2, VENTALV, VENTLUNG, VENITUBE, DISCONNECT, MINVOLSET, VENTMACH, KINKEDTUBE, INTUBATION, PULMEMBOLUS, PAP, SHUNT, ANAPHYLAXIS, MINOVL, PVSAT, FIO2, PRESS, INSUFFANESTH, TPR, LVFAILURE, ERRBLOWOUTPUT, STROEVOLUME, LVEDVOLUME, HYPOVOLEMIA, CVP, BP]

Page 19: Guidance: Assignment 3 Part 1

More Real-World Bayes Net Applications

“Microsoft’s competitive advantage lies in its expertise in Bayesian networks”-- Bill Gates, quoted in LA Times, 1996

• MS Answer Wizards, (printer) troubleshooters
• Medical diagnosis
• Speech recognition (HMMs)
• Gene sequence/expression analysis
• Turbocodes (channel coding)

Page 20: Guidance: Assignment 3 Part 1

Why Are Bayes Nets Useful?

Factored representation may have exponentially fewer parameters than full joint
– Easier inference (lower time complexity)
– Less data required for learning (lower sample complexity)

Graph structure supports
– Modular representation of knowledge
– Local, distributed algorithms for inference and learning
– Intuitive (possibly causal) interpretation

Strong theory about the nature of cognition or the generative process that produces observed data
– Can’t represent arbitrary contingencies among variables, so theory can be rejected by data

Page 21: Guidance: Assignment 3 Part 1

Reformulating Naïve Bayes As Graphical Model

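[Figure: D as the single parent of Rx, Ry, Bx, By]

$$
\begin{aligned}
p(D, Rx, Ry, Bx, By) &= P(D)\, p(Rx \mid D)\, p(Ry \mid D)\, p(Bx \mid D)\, p(By \mid D) \\
p(Rx, Ry, Bx, By) &= \sum_{D} p(D, Rx, Ry, Bx, By)
 && \text{(marginalizing over } D\text{)} \\
P(D \mid Rx, Ry, Bx, By) &= p(D, Rx, Ry, Bx, By) \,/\, p(Rx, Ry, Bx, By)
 && \text{(definition of conditional probability)}
\end{aligned}
$$

[Figure: a second naïve Bayes model with survive as the single parent of Age, Class, Gender]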

Page 22: Guidance: Assignment 3 Part 1

Review: Bayes Net

Nodes = random variables
Links = expression of joint distribution

Compare to full joint distribution by chain rule

[Figure: the network over Earthquake, Radio, Burglary, Alarm, and Call]

Page 23: Guidance: Assignment 3 Part 1

Bayesian Analysis

Make inferences from data using probability models about quantities we want to predict
– E.g., expected age of death given 51 yr old
– E.g., latent topics in document
– E.g., what direction is the motion?

1. Set up full probability model that characterizes distribution over all quantities (observed and unobserved)
– incorporates prior beliefs
2. Condition model on observed data to compute posterior distribution
3. Evaluate fit of model to data
– adjust model parameters to achieve better fits

Page 24: Guidance: Assignment 3 Part 1

Inference

• Computing posterior probabilities
– Probability of hidden events given any evidence
• Most likely explanation
– Scenario that explains evidence
• Rational decision making
– Maximize expected utility
– Value of Information
• Effect of intervention
– Causal analysis

[Figure from N. Friedman: the network over Earthquake, Radio, Burglary, Alarm, and Call, annotated "Explaining away effect"]

Page 25: Guidance: Assignment 3 Part 1

Conditional Independence

A node is conditionally independent of its ancestors given its parents.
– Example?

What about conditional independence between variables that aren’t directly connected?
– e.g., Earthquake and Burglary?
– e.g., Burglary and Radio?

[Figure: the network over Earthquake, Radio, Burglary, Alarm, and Call]

Page 26: Guidance: Assignment 3 Part 1

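d-separation

Criterion for deciding if nodes are conditionally independent.

A path from node u to node v is d-separated by a node z if the path matches one of these templates:

• u → z → v, with z observed
• u ← z ← v, with z observed
• u ← z → v, with z observed
• u → z ← v, with z unobserved (and no descendant of z observed)

[Figure legend: observed and unobserved nodes]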

Page 27: Guidance: Assignment 3 Part 1

d-separation

Think about d-separation as breaking a chain.
If any link on a chain is broken, the whole chain is broken

[Figure: the four u, z, v templates, and a longer chain from u to v built from x, z, y links]

Page 28: Guidance: Assignment 3 Part 1

d-separation Along Paths

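Are u and v d-separated?

[Figure: the four u, z, v templates, followed by three example paths from u to v through two intermediate z nodes. The first two paths are labeled "d separated" and the third "Not d separated".]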

Page 29: Guidance: Assignment 3 Part 1

Conditional Independence

Nodes u and v are conditionally independent given set Z if all (undirected) paths between u and v are d-separated by Z.

E.g.,

[Figure: nodes u and v connected through several z nodes]
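A sketch of a conditional independence test on the burglary/earthquake network. Rather than enumerating paths against the templates, it uses an equivalent criterion that is an assumption of this example, not the slides' procedure: restrict the graph to the ancestors of u, v, and Z, moralize it, delete Z, and check whether u can still reach v. Function names and the `PARENTS` layout are hypothetical.

```python
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Radio": ["Earthquake"],
    "Call": ["Alarm"],
}

def ancestral_set(nodes, parents):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen

def d_separated(u, v, given, parents):
    """True if u and v are conditionally independent given `given`."""
    keep = ancestral_set({u, v} | set(given), parents)
    nbrs = {n: set() for n in keep}
    for child in keep:
        ps = parents[child]            # parents of a kept node are also kept
        for p in ps:                   # undirected parent-child edges
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, p in enumerate(ps):     # "marry" co-parents
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    blocked, seen, stack = set(given), {u}, [u]
    while stack:                       # search that never enters Z
        n = stack.pop()
        if n == v:
            return False
        for m in nbrs[n] - seen - blocked:
            seen.add(m)
            stack.append(m)
    return True
```

For example, `d_separated("Earthquake", "Burglary", [], PARENTS)` holds, but adding "Alarm" to the conditioning set connects the two causes, which is the explaining away effect.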

Page 30: Guidance: Assignment 3 Part 1

[Figure: the ALARM network, same node labels as on Page 18]

Page 31: Guidance: Assignment 3 Part 1

[Figure: the ALARM network, same node labels as on Page 18]

Page 32: Guidance: Assignment 3 Part 1

Sufficiency For Conditional Independence: Markov Blanket

The Markov blanket of node u consists of the parents, children, and children’s parents of u

P(u|MB(u),v) = P(u|MB(u))

[Figure: node u within a larger network]
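The definition translates directly into code. A sketch on the burglary/earthquake network, with a hypothetical `markov_blanket` helper and the graph stored as a child-to-parents dictionary:

```python
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Radio": ["Earthquake"],
    "Call": ["Alarm"],
}

def markov_blanket(node, parents):
    """Parents, children, and children's other parents of `node`."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for c in children:
        blanket |= set(parents[c])   # co-parents of each child
    blanket.discard(node)            # the node is not in its own blanket
    return blanket

mb_earthquake = markov_blanket("Earthquake", PARENTS)
```

Burglary appears in Earthquake's blanket only because the two share the child Alarm.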

Page 33: Guidance: Assignment 3 Part 1

Probabilistic Models

Probabilistic models
– Graphical models
  • Directed (Bayesian belief nets): Alarm network, State-space models, HMMs, Naïve Bayes classifier, PCA/ICA
  • Undirected (Markov nets): Markov Random Field, Boltzmann machine, Ising model, Max-ent model, Log-linear models

Page 34: Guidance: Assignment 3 Part 1

Turning A Directed Graphical Model Into An Undirected Model Via Moralization

Moralization: connect all parents of each node and remove arrows
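A sketch of moralization on the burglary/earthquake network, assuming the directed graph is stored as a child-to-parents dictionary; the function name and edge representation are illustrative choices.

```python
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Radio": ["Earthquake"],
    "Call": ["Alarm"],
}

def moralize(parents):
    """Return the undirected edge set of the moral graph."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))   # drop the arrow direction
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                edges.add(frozenset((p, q)))   # connect co-parents
    return edges

moral_edges = moralize(PARENTS)
```

The only edge added beyond the original four is Burglary–Earthquake, because they are the two parents of Alarm.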

Page 35: Guidance: Assignment 3 Part 1

Toy Example Of A Markov Net

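[Figure: undirected graph over X1, X2, X3, X4, X5]

e.g., X1 ⊥ X4, X5 | X2, X3
Xi ⊥ Xrest | Xnbrs

[Equation: the joint distribution as a normalized product of clique potentials, in general form $P(x) = \frac{1}{Z} \prod_c \psi_c(x_c)$, annotated with "Potential function", "Partition function", and "Clique"]

Maximal clique: largest subset of vertices such that each pair is connected by an edge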

Page 36: Guidance: Assignment 3 Part 1

A Real Markov Net

• Estimate P(x1, …, xn | y1, …, yn)
• Ψ(xi, yi) = P(yi | xi): local evidence likelihood
• Ψ(xi, xj) = exp(-J(xi, xj)): compatibility matrix

[Figure: grid of latent causes xi, each connected to an observed pixel yi]

Page 37: Guidance: Assignment 3 Part 1

Example Of Image Segmentation With MRFs

Sziranyi et al. (2000)

Page 38: Guidance: Assignment 3 Part 1

Graphical Models Are A Useful Formalism

E.g., feedforward neural net with noise, sigmoid belief net

[Figure: layered network with input layer, hidden layer, and output layer]

Page 39: Guidance: Assignment 3 Part 1

Graphical Models Are A Useful Formalism

E.g., Restricted Boltzmann machine (Hinton)
Also known as Harmony network (Smolensky)

[Figure: a layer of hidden units connected to a layer of visible units]

Page 40: Guidance: Assignment 3 Part 1

Graphical Models Are A Useful Formalism

E.g., Gaussian Mixture Model

Page 41: Guidance: Assignment 3 Part 1

Graphical Models Are A Useful Formalism

E.g., dynamical (time varying) models in which data arrives sequentially or output is produced as a sequence

Dynamic Bayes nets (DBNs) can be used to model such time-series (sequence) data

Special cases of DBNs include
– Hidden Markov Models (HMMs)
– State-space models

Page 42: Guidance: Assignment 3 Part 1

Hidden Markov Model (HMM)

[Figure: chain of hidden states X1, X2, X3 (phones/words) linked by a transition matrix, each emitting an observation Y1, Y2, Y3 (acoustic signal) through Gaussian observations]

Page 43: Guidance: Assignment 3 Part 1

State-Space Model (SSM)/Linear Dynamical System (LDS)

[Figure: chain of "true" states X1, X2, X3, each emitting a noisy observation Y1, Y2, Y3]

Page 44: Guidance: Assignment 3 Part 1

Example: LDS For 2D Tracking

[Figure: network for 2D tracking with nodes labeled X1, X2, y1, y2, Q1 to Q3, and R1 to R3 across time slices; a sparse linear-Gaussian system]

Page 45: Guidance: Assignment 3 Part 1

Kalman Filtering(Recursive State Estimation In An LDS)

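[Figure: chain of states X1, X2, X3 with observations Y1, Y2, Y3]

Estimate $P(X_t \mid y_{1:t})$ from $P(X_{t-1} \mid y_{1:t-1})$ and $y_t$

$$
\begin{aligned}
\text{Predict:}\quad & P(X_t \mid y_{1:t-1}) = \sum_{X_{t-1}} P(X_t \mid X_{t-1})\, P(X_{t-1} \mid y_{1:t-1}) \\
\text{Update:}\quad & P(X_t \mid y_{1:t}) \propto P(y_t \mid X_t)\, P(X_t \mid y_{1:t-1})
\end{aligned}
$$

(For the continuous state of an LDS, the sum over $X_{t-1}$ is an integral.)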
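For a scalar LDS the predict and update steps have closed forms. A minimal sketch, assuming the model x_t = a·x_{t-1} + noise with variance q and y_t = c·x_t + noise with variance r; the parameter names and default values are illustrative, not taken from the slides.

```python
def kalman_step(mean, var, y, a=1.0, q=0.1, c=1.0, r=0.5):
    """One predict/update cycle for a scalar linear-Gaussian system.

    (mean, var) describe P(X_{t-1} | y_{1:t-1}); the return value
    describes P(X_t | y_{1:t}).
    """
    # Predict: push the previous belief through the dynamics.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Update: weight the new observation by the Kalman gain.
    gain = pred_var * c / (c * c * pred_var + r)
    new_mean = pred_mean + gain * (y - c * pred_mean)
    new_var = (1.0 - gain * c) * pred_var
    return new_mean, new_var

def kalman_filter(ys, mean0=0.0, var0=1.0, **model):
    """Run the recursion over a sequence of observations."""
    mean, var = mean0, var0
    history = []
    for y in ys:
        mean, var = kalman_step(mean, var, y, **model)
        history.append((mean, var))
    return history

estimates = kalman_filter([1.0, 1.2, 0.9, 1.1], q=0.0, r=1.0)
```

With q = 0 the state is constant, so the filtered mean is just the precision-weighted average of the prior and the observations seen so far.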

Page 46: Guidance: Assignment 3 Part 1

Mike’s Project From Last Year

[Figure: IRT model, drawn as a graphical model with nodes G, X, α, P, δ and labels student, trial, problem]

Page 47: Guidance: Assignment 3 Part 1

Mike’s Project From Last Year

[Figure: BKT model, drawn as a graphical model with nodes X, L0, T, τ, G, S and labels student, trial]

Page 48: Guidance: Assignment 3 Part 1

Mike’s Project From Last Year

[Figure: IRT+BKT model, drawn as a graphical model with nodes X, γ, σ, L0, T, τ, α, P, δ, η, G, S and labels student, trial, problem]