Guidance: Assignment 3 Part 1


Some Neat Results From Assignment 1

Assignment 1: Negative Examples (Rohit)

Assignment 1: Noisy Observations (Nick)

Z: true feature vector
X: noisy observation
X ~ Normal(Z, σ²)
We need to compute P(X | H)

Φ: cumulative distribution function of the Gaussian

Assignment 1: Noisy Observations (Nick)

Guidance on Assignment 3

Guidance: Assignment 3 Part 1

MATLAB functions in the Statistics Toolbox:

betacdf, betapdf, betarnd, betastat, betafit
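For concreteness, here is a minimal MATLAB sketch of how these functions fit together; the parameter values (a = 2, b = 5) are arbitrary choices for illustration.

% Minimal sketch of the Beta-distribution helpers; a and b are assumed values.
a = 2; b = 5;
x = linspace(0, 1, 101);

pdfVals = betapdf(x, a, b);        % density at each point of x
cdfVals = betacdf(x, a, b);        % cumulative probability at each point
samples = betarnd(a, b, 1000, 1);  % 1000 random draws from Beta(a, b)
[m, v]  = betastat(a, b);          % analytic mean and variance
phat    = betafit(samples);        % ML estimates of [a b] from the draws

plot(x, pdfVals)                   % visualize the density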

Guidance: Assignment 3 Part 2

You will explore the role of the priors. The Weiss model showed that priors play an important role when:

observations are noisy

observations don’t provide strong constraints

there aren’t many observations.

Guidance: Assignment 3 Part 3

Implement a model a bit like Weiss et al. (2002).
Goal: infer motion (velocity) of a rigid shape from observations at two instances in time.

Assume distinctive features that make it easy to identify the location of the feature at successive times.

Assignment 2 Guidance

Bx: the x displacement of the blue square (= delta x in one unit of time)

By: the y displacement of the blue square
Rx: the x displacement of the red square
Ry: the y displacement of the red square
These observations are corrupted by measurement noise: Gaussian, mean zero, standard deviation σ.

D: direction of motion (up, down, left, right). Assume the only possibilities are one unit of motion in any of the four directions.

Assignment 2: Generative Model

Rx conditioned on D = up is drawn from a Gaussian.

Same assumptions for Bx, By.

Assignment 2 Math

$$P(D \mid Rx, Ry, Bx, By) = \frac{P(Rx, Ry, Bx, By \mid D)\, P(D)}{P(Rx, Ry, Bx, By)}$$

$$\propto P(Rx, Ry, Bx, By \mid D)\, P(D)$$

$$= P(Rx \mid D)\, P(Ry \mid D)\, P(Bx \mid D)\, P(By \mid D)\, P(D)$$

Conditional independence of the observations given D justifies the factorization in the last line.

Assignment 2 Implementation

$$P(D{=}d \mid Rx{=}rx, Ry{=}ry, Bx{=}bx, By{=}by) \propto p(Rx{=}rx \mid D{=}d) \cdots$$

$$P(D \mid Rx, Ry, Bx, By) \propto p(Rx \mid D)\, p(Ry \mid D)\, p(Bx \mid D)\, p(By \mid D)\, P(D)$$

$$p(Rx{=}rx \mid D{=}d) = \mathrm{Gaussian}(rx;\ \mu_d, \sigma_d^2) = \frac{1}{\sqrt{2\pi}\,\sigma_d}\exp\!\left(-\frac{(rx-\mu_d)^2}{2\sigma_d^2}\right)$$

Quiz: do we need to worry about the Gaussian density function's normalization term?
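A minimal MATLAB sketch of this computation; the σ value, the observations, and the per-direction means are assumptions for illustration.

% Posterior over direction D given noisy displacements [Rx Ry Bx By].
sigma = 0.5;                      % measurement noise (assumed)
obs   = [0.1 0.9 -0.2 1.1];       % observed [Rx Ry Bx By] (made up)
mu    = [ 0  1  0  1;             % means under D = up
          0 -1  0 -1;             % D = down
         -1  0 -1  0;             % D = left
          1  0  1  0];            % D = right
prior = [0.25 0.25 0.25 0.25];    % uniform prior over D

like = prod(normpdf(repmat(obs, 4, 1), mu, sigma), 2)';  % P(obs | D)
post = like .* prior;
post = post / sum(post)           % normalize over the four directions
% Quiz answer: the 1/(sqrt(2*pi)*sigma) factor is the same for every D,
% so it cancels in this normalization.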

Introduction To Bayes Nets

(Stuff stolen from Kevin Murphy, UBC, and Nir Friedman, HUJI)

What Do You Need To Do Probabilistic Inference In A Given Domain?

Joint probability distribution over all variables in domain

Qualitative part: directed acyclic graph (DAG)
• Nodes: random variables
• Edges: direct influence

Quantitative part: set of conditional probability distributions

Family of Alarm: the conditional probability table P(A | E, B)

E   B    P(a | E, B)   P(¬a | E, B)
e   b    0.9           0.1
e   ¬b   0.2           0.8
¬e  b    0.9           0.1
¬e  ¬b   0.01          0.99

[Figure: burglary network. Earthquake → Alarm ← Burglary; Earthquake → Radio; Alarm → Call]

Compact representation of joint probability distributions via conditional independence

Together they define a unique distribution in factored form:

$$P(B, E, A, C, R) = P(B)\, P(E)\, P(A \mid B, E)\, P(R \mid E)\, P(C \mid A)$$
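As a quick illustration, a MATLAB sketch that evaluates this product for one joint assignment; apart from the P(A | E, B) table above, the CPT values are assumptions.

% Evaluate P(B=1, E=0, A=1, R=0, C=1) = P(B) P(E) P(A|B,E) P(R|E) P(C|A).
pB = 0.01; pE = 0.02;            % priors on burglary, earthquake (assumed)
pA = [0.01 0.9;                  % P(a | E, B): row = E (0/1), col = B (0/1)
      0.2  0.9];
pR = [0.0 0.5];                  % P(r | E) for E = 0, 1 (assumed)
pC = [0.05 0.7];                 % P(c | A) for A = 0, 1 (assumed)

joint = pB * (1 - pE) * pA(1, 2) * (1 - pR(1)) * pC(2)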

Bayes Nets (a.k.a. Belief Nets)

Figure from N. Friedman

What Is A Bayes Net?

[Figure: burglary network. Earthquake → Alarm ← Burglary; Earthquake → Radio; Alarm → Call]

A node is conditionally independent of its ancestors given its parents.

E.g., C is conditionally independent of R, E, and B given A

Notation: C ⊥ R, B, E | A

Quiz: What sort of parameter reduction do we get?

From 2^5 - 1 = 31 parameters to 1 + 1 + 2 + 4 + 2 = 10 (one each for P(B) and P(E), two for P(R | E), four for P(A | B, E), two for P(C | A))

Conditional Distributions Are Flexible

E.g., Earthquake and Burglary might have independent effects on Alarm

A.k.a. noisy-OR

where pB and pE are the alarm probabilities given burglary alone and earthquake alone

This constraint reduces # free parameters to 8!

[Figure: Earthquake → Alarm ← Burglary]

B  E  P(A | B, E)
0  0  0
0  1  pE
1  0  pB
1  1  pE + pB - pE·pB
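A hedged MATLAB sketch that reproduces the table; the values of pB and pE are assumptions.

% Noisy-OR combination of two independent causes of Alarm.
pB = 0.9; pE = 0.2;                          % single-cause probabilities (assumed)
noisyOr = @(b, e) 1 - (1 - pB*b) * (1 - pE*e);

for be = [0 0; 0 1; 1 0; 1 1]'               % iterate over the table rows
    fprintf('B=%d E=%d  P(A|B,E)=%.3f\n', be(1), be(2), noisyOr(be(1), be(2)));
end
% The last row gives pB + pE - pB*pE, as in the table.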

Domain: Monitoring Intensive-Care Patients
• 37 variables
• 509 parameters … instead of 2^37

[Figure: the ALARM network (37 nodes, including MINVOLSET, VENTMACH, INTUBATION, PULMEMBOLUS, VENTLUNG, CATECHOL, HRBP, LVFAILURE, HYPOVOLEMIA, CVP, BP)]

A Real Bayes Net: Alarm

Figure from N. Friedman

More Real-World Bayes Net Applications

“Microsoft’s competitive advantage lies in its expertise in Bayesian networks”-- Bill Gates, quoted in LA Times, 1996

MS Answer Wizards, (printer) troubleshooters
Medical diagnosis
Speech recognition (HMMs)
Gene sequence/expression analysis
Turbo codes (channel coding)

Why Are Bayes Nets Useful?

Factored representation may have exponentially fewer parameters than full joint

Easier inference (lower time complexity)

Less data required for learning (lower sample complexity)

Graph structure supports:

Modular representation of knowledge

Local, distributed algorithms for inference and learning

Intuitive (possibly causal) interpretation

Strong theory about the nature of cognition or the generative process that produces observed data

Can’t represent arbitrary contingencies among variables, so the theory can be rejected by data

Reformulating Naïve Bayes As A Graphical Model

[Figure: naive Bayes structure, D → Rx, Ry, Bx, By]

$$P(D \mid Rx, Ry, Bx, By) = \frac{P(D, Rx, Ry, Bx, By)}{P(Rx, Ry, Bx, By)} \quad \text{(definition of conditional probability)}$$

$$P(Rx, Ry, Bx, By) = \sum_{D} P(D, Rx, Ry, Bx, By) \quad \text{(marginalizing over } D)$$

$$P(D, Rx, Ry, Bx, By) = P(D)\, P(Rx \mid D)\, P(Ry \mid D)\, P(Bx \mid D)\, P(By \mid D)$$

[Figure: naive Bayes example, Survive with Age, Class, Gender]

Review: Bayes Net

Nodes = random variables
Links = expression of joint distribution

Compare to full joint distribution by chain rule

[Figure: burglary network. Earthquake → Alarm ← Burglary; Earthquake → Radio; Alarm → Call]

Bayesian Analysis

Make inferences from data using probability models about quantities we want to predict

E.g., expected age of death given 51 yr old

E.g., latent topics in document

E.g., what direction is the motion?

Set up a full probability model that characterizes the distribution over all quantities (observed and unobserved); it incorporates prior beliefs.

Condition the model on observed data to compute the posterior distribution.

Evaluate the fit of the model to the data; adjust model parameters to achieve better fits.

Inference

• Computing posterior probabilities
– Probability of hidden events given any evidence

• Most likely explanation
– Scenario that explains evidence

• Rational decision making
– Maximize expected utility
– Value of information

• Effect of intervention
– Causal analysis

[Figure: burglary network shown twice, observing Call, then also Radio. Figure from N. Friedman]

Explaining away effect: observing Radio (evidence for Earthquake) makes Burglary a less probable explanation of Alarm.

Conditional Independence

A node is conditionally independent of its ancestors given its parents.

Example?

What about conditional independence between variables that aren’t directly connected?

e.g., Earthquake and Burglary?

e.g., Burglary and Radio?

[Figure: burglary network. Earthquake → Alarm ← Burglary; Earthquake → Radio; Alarm → Call]

d-separation: a criterion for deciding whether nodes are conditionally independent.

A path from node u to node v is d-separated by a node z if the path matches one of these templates:

[Figure: the four d-separation templates]
u → z → v   (z observed)
u ← z ← v   (z observed)
u ← z → v   (z observed)
u → z ← v   (z unobserved, as are all of z’s descendants)

d-separation: think about d-separation as breaking a chain. If any link on a chain is broken, the whole chain is broken.

[Figure: the four templates drawn as links in a chain from u to v; one broken link breaks the whole chain]
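The templates above can be turned into code. Below is a hedged MATLAB sketch (function and variable names are my own) that checks whether one path is blocked, treating a collider as blocked unless z or one of its descendants is observed.

% pathDSeparated.m: is the path from u to v blocked by observed set Z?
% dag(i,j) = 1 means an edge i -> j; path is a vector of node indices.
function blocked = pathDSeparated(dag, path, Z)
    blocked = false;
    for k = 2:numel(path) - 1
        a = path(k-1); z = path(k); b = path(k+1);
        if dag(a, z) && dag(b, z)             % collider: a -> z <- b
            % blocks unless z or a descendant of z is observed
            if ~any(ismember([z, descendants(dag, z)], Z))
                blocked = true; return
            end
        elseif ismember(z, Z)                 % chain or fork: blocks if z observed
            blocked = true; return
        end
    end
end

function d = descendants(dag, z)              % all nodes reachable from z
    d = []; frontier = find(dag(z, :));
    while ~isempty(frontier)
        d = unique([d, frontier]);
        next = find(any(dag(frontier, :), 1));
        frontier = setdiff(next, d);
    end
end

For u and v to be conditionally independent given Z, every undirected path between them must be blocked in this sense (next slide).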

d-separation Along Paths

Are u and v d-separated?

[Figure: three example paths from u to v, each passing through two z nodes]

Path 1: d-separated
Path 2: d-separated
Path 3: not d-separated

Conditional Independence

Nodes u and v are conditionally independent given set Z if all (undirected) paths between u and v are d-separated by Z.

E.g., [Figure: u and v with every path between them blocked by an observed node z]

[Figure: the ALARM network, shown twice]

Sufficiency For Conditional Independence: Markov Blanket

The Markov blanket of node u consists of the parents, children, and children’s parents of u

P(u|MB(u),v) = P(u|MB(u))

[Figure: node u with its Markov blanket (parents, children, children’s parents) shaded]
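A small MATLAB sketch that reads a node's Markov blanket off a DAG adjacency matrix; the burglary-network encoding here is an assumption for illustration.

% Markov blanket of node u: parents, children, and children's parents.
% Node order [B E A R C]; dag(i,j) = 1 means i -> j.
dag = zeros(5);
dag(1,3) = 1; dag(2,3) = 1;     % B -> A, E -> A
dag(2,4) = 1;                   % E -> R
dag(3,5) = 1;                   % A -> C

u = 3;                          % node A
parents   = find(dag(:, u))';
children  = find(dag(u, :));
coparents = find(any(dag(:, children), 2))';   % parents of u's children
mb = setdiff(unique([parents, children, coparents]), u)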

Probabilistic Models

Probabilistic models ⊃ graphical models, which divide into directed and undirected:

Directed (Bayesian belief nets): Alarm network, state-space models, HMMs, naïve Bayes classifier, PCA/ICA

Undirected (Markov nets): Markov random field, Boltzmann machine, Ising model, max-ent model, log-linear models

Turning A Directed Graphical Model Into An Undirected Model Via Moralization

Moralization: connect all parents of each node and remove arrows
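A hedged MATLAB sketch of moralization on an adjacency matrix, reusing the burglary-network encoding (an assumption).

% Moralize a DAG: marry the parents of each node, then drop arrow directions.
dag = zeros(5);                  % node order [B E A R C]; dag(i,j): i -> j
dag(1,3) = 1; dag(2,3) = 1; dag(2,4) = 1; dag(3,5) = 1;

moral = dag | dag';              % make every directed edge undirected
for k = 1:5
    par = find(dag(:, k));       % connect all parents of node k
    moral(par, par) = 1;
end
moral(logical(eye(5))) = 0;      % remove self-loops introduced above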

Toy Example Of A Markov Net

[Figure: toy Markov net over X1…X5]

e.g., X1 ⊥ X4, X5 | X2, X3
Xi ⊥ Xrest | Xnbrs

The joint distribution factors over cliques:

$$P(x) = \frac{1}{Z} \prod_{c} \psi_c(x_c)$$

ψc: potential function for clique c
Z: partition function (the normalizer)

Maximal clique: largest subset of vertices such that each pair is connected by an edge
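A toy MATLAB sketch of the potential/partition-function idea, using brute-force enumeration on a three-node chain; the potential function is an arbitrary assumption.

% P(x) = (1/Z) * psi(x1,x2) * psi(x2,x3) on a chain x1 - x2 - x3.
psi = @(a, b) exp(0.5 * (2*(a == b) - 1));   % assumed pairwise potential
Z = 0;
for x1 = 0:1
  for x2 = 0:1
    for x3 = 0:1
      Z = Z + psi(x1, x2) * psi(x2, x3);     % sum over all configurations
    end
  end
end
p111 = psi(1, 1) * psi(1, 1) / Z             % probability of x = (1,1,1)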

A Real Markov Net

• Estimate P(x1, …, xn | y1, …, yn)
• Ψ(xi, yi) = P(yi | xi): local evidence likelihood
• Ψ(xi, xj) = exp(−J(xi, xj)): compatibility matrix

[Figure: grid MRF, latent causes xi connected to observed pixels yi]

Example Of Image Segmentation With MRFs

Sziranyi et al. (2000)

Graphical Models Are A Useful Formalism

E.g., feedforward neural net with noise, sigmoid belief net

[Figure: layered network with input, hidden, and output layers]

Graphical Models Are A Useful Formalism

E.g., restricted Boltzmann machine (Hinton), also known as a harmony network (Smolensky)

[Figure: bipartite graph of hidden and visible units]

Graphical Models Are A Useful Formalism

E.g., Gaussian mixture model

Graphical Models Are A Useful Formalism

E.g., dynamical (time-varying) models in which data arrives sequentially or output is produced as a sequence

Dynamic Bayes nets (DBNs) can be used to model such time-series (sequence) data

Special cases of DBNs include hidden Markov models (HMMs) and state-space models.

Hidden Markov Model (HMM)

[Figure: HMM, hidden chain X1 → X2 → X3 (phones/words, transition matrix) emitting observations Y1, Y2, Y3 (acoustic signal, Gaussian observations)]
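As a concrete sketch, the forward algorithm for a toy two-state HMM in MATLAB; all parameter values and the observation sequence are assumptions.

% Forward algorithm: compute P(y_1:T) for a 2-state HMM.
T  = [0.9 0.1; 0.2 0.8];      % transition matrix, T(i,j) = P(X_t=j | X_t-1=i)
E  = [0.7 0.3; 0.1 0.9];      % emission matrix,  E(i,k) = P(Y_t=k | X_t=i)
p0 = [0.5 0.5];               % initial state distribution
y  = [1 2 2 1];               % observation sequence (made up)

alpha = p0 .* E(:, y(1))';    % alpha_1(i) = P(X_1=i, y_1)
for t = 2:numel(y)
    alpha = (alpha * T) .* E(:, y(t))';   % predict with T, weight by evidence
end
likelihood = sum(alpha)       % P(y_1:T)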

State-Space Model (SSM)/Linear Dynamical System (LDS)

[Figure: SSM, “true” states X1 → X2 → X3 with noisy observations Y1, Y2, Y3]

Example: LDS For 2D Tracking

[Figure: 2D tracking as a sparse linear-Gaussian system; positions (x1, x2) and noisy observations (y1, y2) at successive times]

Kalman Filtering (Recursive State Estimation In An LDS)

[Figure: the SSM chain X1 → X2 → X3 with observations Y1, Y2, Y3]

Estimate $P(X_t \mid y_{1:t})$ from $P(X_{t-1} \mid y_{1:t-1})$ and $y_t$

• Predict: $P(X_t \mid y_{1:t-1}) = \sum_{X_{t-1}} P(X_t \mid X_{t-1})\, P(X_{t-1} \mid y_{1:t-1})$
• Update: $P(X_t \mid y_{1:t}) \propto P(y_t \mid X_t)\, P(X_t \mid y_{1:t-1})$
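A minimal 1D Kalman-filter sketch of this predict/update recursion; the model matrices, noise variances, and data are all assumptions.

% 1D Kalman filter for a random-walk state observed with noise.
A = 1; C = 1;              % transition and observation models
Q = 0.01; R = 0.25;        % process and observation noise variances
y = [1.1 0.9 1.3 1.0];     % made-up observations

mu = 0; P = 1;             % Gaussian prior on X_0
for t = 1:numel(y)
    muPred = A * mu;                        % predict step
    PPred  = A * P * A' + Q;
    K  = PPred * C' / (C * PPred * C' + R); % Kalman gain
    mu = muPred + K * (y(t) - C * muPred);  % update step
    P  = (1 - K * C) * PPred;
    fprintf('t=%d: mean %.3f, var %.3f\n', t, mu, P);
end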

Mike’s Project From Last Year

[Figure: IRT model plate diagram; plates over student/trial and problem, with variables G, X, α, P, δ]

Mike’s Project From Last Year

[Figure: BKT model plate diagram; student/trial plate with variables X, L0, T, τ, G, S]

Mike’s Project From Last Year

[Figure: IRT+BKT model plate diagram; student/trial and problem plates with variables X, γ, σ, L0, T, τ, α, P, δ, η, G, S]