Guidance: Assignment 3 Part 1
Some Neat Results From Assignment 1
[Notes from 2015: distinguish the z's; don't use z-bar]

Assignment 1: Negative Examples (Rohit)
Assignment 1: Noisy Observations (Nick)
Z: true feature vector
X: noisy observation, X ~ Normal(Z, σ²)
We need to compute P(X | H)
Φ: cumulative distribution function of the Gaussian
Assignment 1: Noisy Observations (Nick)
Guidance on Assignment 3

Guidance: Assignment 3 Part 1
MATLAB functions in the Statistics Toolbox: betacdf, betapdf, betarnd, betastat, betafit
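For readers working outside MATLAB, a rough SciPy analogue of those toolbox functions (the shape parameters a = 2, b = 5 are illustrative assumptions, not assignment values):

```python
# Illustrative Python/SciPy counterparts of the MATLAB Statistics Toolbox
# beta functions named above (betapdf, betacdf, betarnd, betastat, betafit).
from scipy import stats

a, b = 2.0, 5.0                                   # example shape parameters

pdf_at_half = stats.beta.pdf(0.5, a, b)           # cf. betapdf(0.5, a, b)
cdf_at_half = stats.beta.cdf(0.5, a, b)           # cf. betacdf(0.5, a, b)
mean, var = stats.beta.stats(a, b, moments='mv')  # cf. betastat(a, b)
samples = stats.beta.rvs(a, b, size=1000, random_state=0)  # cf. betarnd
# cf. betafit: fix loc/scale so only the shape parameters are estimated
a_hat, b_hat, loc, scale = stats.beta.fit(samples, floc=0, fscale=1)
```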
Guidance: Assignment 3 Part 2
You will explore the role of the priors. The Weiss model showed that priors play an important role when:
- observations are noisy
- observations don't provide strong constraints
- there aren't many observations.
Guidance: Assignment 3 Part 3
Implement a model a bit like Weiss et al. (2002).
Goal: infer the motion (velocity) of a rigid shape from observations at two instants in time.
Assume distinctive features that make it easy to identify the location of each feature at successive times.
Assignment 2 Guidance
Bx: the x displacement of the blue square (= Δx in one unit of time)
By: the y displacement of the blue square
Rx: the x displacement of the red square
Ry: the y displacement of the red square
These observations are corrupted by measurement noise: Gaussian, mean zero, fixed standard deviation.
D: direction of motion (up, down, left, right). Assume the only possibilities are one unit of motion in any of the four directions.
Assignment 2: Generative Model
Same assumptions for Bx, By.
Rx conditioned on D = up is drawn from a Gaussian.

Assignment 2 Math
Conditional independence

Assignment 2 Implementation
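A minimal Python sketch of this kind of inference (not the assignment's solution; the noise level σ = 0.5, the uniform prior over D, and the observed values are all illustrative assumptions):

```python
# Minimal sketch: infer direction D from noisy displacement observations
# Bx, By, Rx, Ry, assuming a uniform prior over D and illustrative sigma.
import math

SIGMA = 0.5
# True displacement (dx, dy) implied by each direction: one unit of motion.
DIRECTIONS = {'up': (0, 1), 'down': (0, -1), 'left': (-1, 0), 'right': (1, 0)}

def posterior_over_D(bx, by, rx, ry, sigma=SIGMA):
    """P(D | Bx, By, Rx, Ry), exploiting conditional independence of the
    four observations given D. The Gaussian normalization constant is the
    same for every D, so it cancels when we normalize."""
    unnorm = {}
    for d, (dx, dy) in DIRECTIONS.items():
        # log-likelihood up to a constant: -(sum of squared errors)/(2 sigma^2)
        sq_err = (bx - dx)**2 + (by - dy)**2 + (rx - dx)**2 + (ry - dy)**2
        unnorm[d] = math.exp(-sq_err / (2 * sigma**2))
    z = sum(unnorm.values())
    return {d: v / z for d, v in unnorm.items()}

post = posterior_over_D(0.1, 0.9, -0.2, 1.1)  # observations consistent with 'up'
```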
Quiz: do we need to worry about the Gaussian density function's normalization term?

Introduction To Bayes Nets
(Stuff stolen from Kevin Murphy, UBC, and Nir Friedman, HUJI)

What Do You Need To Do Probabilistic Inference In A Given Domain?
A joint probability distribution over all variables in the domain.
Qualitative part: a directed acyclic graph (DAG). Nodes: random variables; edges: direct influence.
Quantitative part: a set of conditional probability distributions, e.g., P(A | E, B), the "family of Alarm" in the Earthquake/Radio/Burglary/Alarm/Call network:

  E   B    P(a | E,B)   P(¬a | E,B)
  e   b    0.9          0.1
  e   ¬b   0.2          0.8
  ¬e  b    0.9          0.1
  ¬e  ¬b   0.01         0.99

Together, the qualitative and quantitative parts give a compact representation of the joint probability distribution via conditional independence, and define a unique distribution in a factored form.
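A brute-force sketch of how the factored form defines a unique joint distribution over the Earthquake/Radio/Burglary/Alarm/Call variables (every number below is an illustrative assumption, not the slide's CPTs):

```python
# Sketch: a Bayes net's CPTs define a unique joint distribution in factored
# form, P(B) P(E) P(R|E) P(A|B,E) P(C|A). All numbers are illustrative.
from itertools import product

P_b = 0.01                                       # P(Burglary)
P_e = 0.02                                       # P(Earthquake)
P_r_given_e = {True: 0.9, False: 0.0001}         # P(Radio | Earthquake)
P_a_given_be = {(True, True): 0.95, (True, False): 0.9,
                (False, True): 0.2, (False, False): 0.01}
P_c_given_a = {True: 0.8, False: 0.05}           # P(Call | Alarm)

def bern(p, x):
    return p if x else 1 - p

def joint(b, e, r, a, c):
    # the factored form of the joint
    return (bern(P_b, b) * bern(P_e, e) * bern(P_r_given_e[e], r)
            * bern(P_a_given_be[(b, e)], a) * bern(P_c_given_a[a], c))

total = sum(joint(*vals) for vals in product([True, False], repeat=5))

# Inference by enumeration: P(Burglary | Call)
num = sum(joint(True, e, r, a, True) for e, r, a in product([True, False], repeat=3))
den = sum(joint(b, e, r, a, True) for b, e, r, a in product([True, False], repeat=4))
p_b_given_c = num / den
```

A phone call should raise the probability of a burglary above its prior, which the enumeration confirms.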
Bayes Nets (a.k.a. Belief Nets)

What Is A Bayes Net?
[Figure from N. Friedman: the Earthquake/Radio/Burglary/Alarm/Call network]
A node is conditionally independent of its ancestors given its parents.
E.g., C is conditionally independent of R, E, and B given A.
Notation: C ⊥ R, B, E | A
Quiz: What sort of parameter reduction do we get? From 2⁵ − 1 = 31 parameters to 1 + 1 + 2 + 4 + 2 = 10.

Conditional Distributions Are Flexible
E.g., Earthquake and Burglary might have independent effects on Alarm. A.k.a. noisy-OR,
where pB and pE are the alarm probabilities given burglary and earthquake alone. This constraint reduces the number of free parameters to 8!
  B  E   P(A | B,E)
  0  0   0
  0  1   pE
  1  0   pB
  1  1   pE + pB − pE·pB

A Real Bayes Net: Alarm
Domain: monitoring intensive-care patients. 37 variables; 509 parameters instead of 2³⁷.
[Figure from N. Friedman: the ALARM network]

More Real-World Bayes Net Applications
"Microsoft's competitive advantage lies in its expertise in Bayesian networks" -- Bill Gates, quoted in LA Times, 1996
- MS Answer Wizards, (printer) troubleshooters
- Medical diagnosis
- Speech recognition (HMMs)
- Gene sequence/expression analysis
- Turbocodes (channel coding)
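The noisy-OR combination in the table above generalizes to any number of parents; a minimal sketch (pB = 0.9 and pE = 0.2 are illustrative values):

```python
# Sketch of the noisy-OR combination rule: each present cause i independently
# fails to trigger the effect with probability (1 - p_i).
def noisy_or(cause_probs, causes_present):
    """P(effect | causes), given per-cause activation probabilities."""
    fail = 1.0
    for p, present in zip(cause_probs, causes_present):
        if present:
            fail *= (1.0 - p)
    return 1.0 - fail

pB, pE = 0.9, 0.2  # illustrative alarm probabilities for burglary/earthquake alone
```

With two parents, this reproduces the four table rows: 0, pE, pB, and pE + pB − pE·pB.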
Turbocodes: a scheme for efficient (near-Shannon-limit) encoding of information, used in cellular communication. Turbocodes were reinterpreted as a form of (loopy) belief propagation in Bayes nets by Weiss.

Why Are Bayes Nets Useful?
- A factored representation may have exponentially fewer parameters than the full joint:
  - easier inference (lower time complexity)
  - less data required for learning (lower sample complexity)
- The graph structure supports:
  - modular representation of knowledge
  - local, distributed algorithms for inference and learning
  - intuitive (possibly causal) interpretation
- Strong theory about the nature of cognition or the generative process that produces observed data: Bayes nets can't represent arbitrary contingencies among variables, so the theory can be rejected by data.
Reformulating Naïve Bayes As A Graphical Model
[Figure: D with children Rx, Ry, Bx, By]
Marginalizing over D
Definition of conditional probability
[Figure: network with nodes Survive, Age, Class, Gender]

Review: Bayes Net
Nodes = random variables
Links = expression of joint distribution
Compare to the full joint distribution by the chain rule:
P(C, A, R, E, B) = P(B) P(E | B) P(R | E, B) P(A | R, E, B) P(C | A, R, E, B)
versus the Bayes net factorization for the Earthquake/Radio/Burglary/Alarm/Call network:
P(C, A, R, E, B) = P(B) P(E) P(R | E) P(A | E, B) P(C | A)
[Review slide; lecture resumed here]

Bayesian Analysis
Make inferences from data using probability models about quantities we want to predict.
- E.g., expected age of death given a 51-year-old
- E.g., latent topics in a document
- E.g., what direction is the motion?
Set up a full probability model that characterizes the distribution over all quantities (observed and unobserved) and incorporates prior beliefs.
Condition the model on observed data to compute the posterior distribution.
Evaluate the fit of the model to the data; adjust model parameters to achieve better fits.

Inference
- Computing posterior probabilities: probability of hidden events given any evidence
- Most likely explanation: scenario that explains the evidence
- Rational decision making: maximize expected utility
- Value of information: how much would it help to obtain evidence about some variable?
- Effect of intervention: causal analysis
[Figure from N. Friedman: Earthquake/Radio/Burglary/Alarm/Call network, illustrating the explaining-away effect]

Conditional Independence
A node is conditionally independent of its ancestors given its parents. Example?
What about conditional independence between variables that aren't directly connected?
- e.g., Earthquake and Burglary?
- e.g., Burglary and Radio?
Example: C is independent of E and B given A.
[equations]
We've already seen some other examples of independence with the formulation of Bayes nets.

d-separation
A criterion for deciding if nodes are conditionally independent. A path from node u to node v is d-separated by a node z if the path matches one of these templates:
[Figure: four path templates u–z–v — chains u → z → v and u ← z ← v, common cause u ← z → v, and common effect (collider) u → z ← v; in the first three the path is blocked when z is observed, in the collider case when z is unobserved]
d-separation: DEPENDENCE separation. Is information carried along the path?
Last case: z is a collider between u and v.

d-separation
Think about d-separation as breaking a chain: if any link on a chain is broken, the whole chain is broken.
[Figure: longer paths x–z–y built from the same four templates; a single blocked link d-separates the whole path]
Last case: z is a collider between u and v.

d-separation Along Paths
Are u and v d-separated?
[Figure: three example graphs with paths through nodes z]
Quiz: Yes, Yes, and No (d-separated, d-separated, not d-separated).

Conditional Independence
Nodes u and v are conditionally independent given set Z if all (undirected) paths between u and v are d-separated by Z.
[Figure from N. Friedman: the ALARM network. Paths: Shunt–Intubation–VentAlv and Shunt–SaO2–PVSat–VentAlv]

Sufficiency For Conditional Independence: Markov Blanket
The Markov blanket of node u consists of the parents, children, and children's parents of u.
P(u|MB(u),v) = P(u|MB(u))
We can infer this from the d-separation property: each node outside the Markov blanket is d-separated from u.

Probabilistic Models
[Figure: taxonomy of probabilistic models. Graphical models split into directed (Bayesian belief nets: alarm network, state-space models, HMMs, naïve Bayes classifier, PCA/ICA) and undirected (Markov nets: Markov random field, Boltzmann machine, Ising model, max-ent model, log-linear models)]

BAYESIAN (BELIEF) NET vs. MARKOV NET
Bayesian belief nets are a subset of Markov nets. Any Bayesian net can be turned into a Markov net; the reverse is not true.

Turning A Directed Graphical Model Into An Undirected Model Via Moralization
Moralization: connect all parents of each node and remove arrows.
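The collider case of d-separation (and the explaining-away effect mentioned earlier) can be checked numerically on the Burglary → Alarm ← Earthquake fragment; the CPT numbers here are illustrative assumptions:

```python
# Numerical check of the collider story: B and E are marginally independent,
# but become dependent once the collider A is observed (explaining away).
# All probabilities are illustrative.
from itertools import product

P_b, P_e = 0.1, 0.1
P_a = {(True, True): 0.95, (True, False): 0.9,
       (False, True): 0.3, (False, False): 0.01}

def joint(b, e, a):
    pb = P_b if b else 1 - P_b
    pe = P_e if e else 1 - P_e
    pa = P_a[(b, e)] if a else 1 - P_a[(b, e)]
    return pb * pe * pa

def prob(pred):
    return sum(joint(b, e, a) for b, e, a in product([True, False], repeat=3)
               if pred(b, e, a))

# Marginally: P(B, E) = P(B) P(E) -- path blocked at the unobserved collider A
marg_be = prob(lambda b, e, a: b and e)
# Given A: observing the collider unblocks the path, so evidence about E
# changes our belief about B (explaining away).
p_b_given_a = prob(lambda b, e, a: b and a) / prob(lambda b, e, a: a)
p_b_given_ae = prob(lambda b, e, a: b and e and a) / prob(lambda b, e, a: e and a)
```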
Toy Example Of A Markov Net
[Figure: nodes X1–X5]
E.g., X1 ⊥ X4, X5 | X2, X3; in general Xi ⊥ Xrest | Xnbrs.
P(x) = (1/Z) ∏_c ψ_c(x_c), where each ψ_c is a potential function over a maximal clique c and Z is the partition function.
Maximal clique: largest subset of vertices such that each pair is connected by an edge.
Arbitrary potential functions for each clique. Functions can have interactions among terms or just products involving individual terms, i.e., ψ(x4, x5) could equal ψ(x4)ψ(x5).
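A minimal sketch of a Markov net along these lines: a three-node chain with arbitrary positive pairwise potentials (the ψ values are made up), showing the partition function and the neighbor-blocking property:

```python
# Tiny Markov net: a chain X1 - X2 - X3 of binary variables with arbitrary
# positive pairwise potentials. The joint is the product of clique potentials
# divided by the partition function Z; X1 and X3 are conditionally
# independent given X1's neighbor X2.
from itertools import product

psi12 = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 3.0}
psi23 = {(0, 0): 1.5, (0, 1): 2.0, (1, 0): 0.3, (1, 1): 1.0}

def unnorm(x1, x2, x3):
    return psi12[(x1, x2)] * psi23[(x2, x3)]

Z = sum(unnorm(*x) for x in product([0, 1], repeat=3))  # partition function

def p(pred):
    return sum(unnorm(*x) for x in product([0, 1], repeat=3) if pred(*x)) / Z

# Markov property: P(X1=1 | X2=0, X3=x3) is the same for both values of x3
p1_given_20 = (p(lambda a, b, c: a == 1 and b == 0 and c == 0)
               / p(lambda a, b, c: b == 0 and c == 0))
p1_given_21 = (p(lambda a, b, c: a == 1 and b == 0 and c == 1)
               / p(lambda a, b, c: b == 0 and c == 1))
```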
Conditional independence: two nodes are independent conditional on evidence if every path between the nodes is cut off by the evidence. X1's neighbors are given, so X1 is cut off from all other nodes. For this example, only X3 needs to be known to block X1 from X4 and X5.
A Real Markov Net
Estimate P(x1, …, xn | y1, …, yn)
ψ(xi, yi) = P(yi | xi): local evidence likelihood
Ψ(xi, xj) = exp(−J(xi, xj)): compatibility matrix
[Figure: grid of latent causes xi with observed pixels yi]

Example Of Image Segmentation With MRFs
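A toy sketch in the same spirit (not Sziranyi et al.'s method): segmenting a noisy binary image into two regions by greedily minimizing an energy with a local-evidence term and a pairwise compatibility term, using iterated conditional modes (ICM). The 8×8 image, noise rate, and weights are all illustrative:

```python
# MRF-style cleanup of a noisy binary image with iterated conditional modes
# (ICM): greedy coordinate descent on an energy combining local evidence
# (agreement with the observed pixel) and pairwise smoothness.
import random

random.seed(0)
H, W = 8, 8
truth = [[1 if c >= W // 2 else 0 for c in range(W)] for r in range(H)]
noisy = [[1 - v if random.random() < 0.15 else v for v in row] for row in truth]

EVID, SMOOTH = 2.0, 1.0  # illustrative weights for the two potential terms

def local_energy(x, r, c, val):
    e = EVID * (0 if val == noisy[r][c] else 1)        # local evidence term
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # pairwise smoothness
        rr, cc = r + dr, c + dc
        if 0 <= rr < H and 0 <= cc < W:
            e += SMOOTH * (0 if val == x[rr][cc] else 1)
    return e

def total_energy(x):
    e = 0.0
    for r in range(H):
        for c in range(W):
            e += EVID * (0 if x[r][c] == noisy[r][c] else 1)
            for dr, dc in ((1, 0), (0, 1)):  # count each pair once
                rr, cc = r + dr, c + dc
                if rr < H and cc < W:
                    e += SMOOTH * (0 if x[r][c] == x[rr][cc] else 1)
    return e

x = [row[:] for row in noisy]
for _ in range(5):  # ICM sweeps: each greedy update never raises the energy
    for r in range(H):
        for c in range(W):
            x[r][c] = min((0, 1), key=lambda v: local_energy(x, r, c, v))

errors = sum(x[r][c] != truth[r][c] for r in range(H) for c in range(W))
```

ICM is the simplest inference choice here; belief propagation or Gibbs sampling would be used in more serious applications.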
Sziranyi et al. (2000)

Graphical Models Are A Useful Formalism
E.g., feedforward neural net with noise, sigmoid belief net
[Figure: input layer, hidden layer, output layer]

Graphical Models Are A Useful Formalism
E.g., Restricted Boltzmann machine (Hinton), also known as a harmony network (Smolensky)
[Figure: visible units fully connected to hidden units]
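A tiny sketch of the RBM's key structural property: because the graph is bipartite, the hidden units are conditionally independent given the visible units (and vice versa), which makes block Gibbs sampling easy. The weights here are random illustrative values, not a trained model:

```python
# Sketch of a Restricted Boltzmann machine: bipartite visible/hidden layers,
# so each conditional distribution factorizes over units.
import math
import random

random.seed(0)
N_VIS, N_HID = 4, 3
W = [[random.gauss(0, 0.1) for _ in range(N_HID)] for _ in range(N_VIS)]
b_vis = [0.0] * N_VIS
b_hid = [0.0] * N_HID

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_hidden_given_visible(v):
    # factorizes: P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij)
    return [sigmoid(b_hid[j] + sum(v[i] * W[i][j] for i in range(N_VIS)))
            for j in range(N_HID)]

def p_visible_given_hidden(h):
    return [sigmoid(b_vis[i] + sum(h[j] * W[i][j] for j in range(N_HID)))
            for i in range(N_VIS)]

v = [1, 0, 1, 1]
ph = p_hidden_given_visible(v)
h = [1 if random.random() < p else 0 for p in ph]  # one block Gibbs step
pv = p_visible_given_hidden(h)
```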
Graphical Models Are A Useful Formalism
E.g., Gaussian mixture model

Graphical Models Are A Useful Formalism
E.g., dynamical (time-varying) models in which data arrives sequentially or output is produced as a sequence
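The Gaussian mixture model mentioned above can be sketched as a generative graphical model: sample a latent component z, then x given z, and invert with Bayes' rule to get responsibilities. All parameters below are illustrative:

```python
# Gaussian mixture model as a graphical model: a latent indicator z selects
# which Gaussian generates x. Parameters are illustrative.
import math
import random

random.seed(0)
WEIGHTS = [0.3, 0.7]
MEANS = [-2.0, 3.0]
STDS = [1.0, 1.0]

def normal_pdf(x, mu, sd):
    return math.exp(-((x - mu) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def sample():
    # ancestral sampling: first z, then x | z
    z = 0 if random.random() < WEIGHTS[0] else 1
    return z, random.gauss(MEANS[z], STDS[z])

def responsibilities(x):
    # posterior P(z | x) by Bayes' rule
    unnorm = [w * normal_pdf(x, m, s) for w, m, s in zip(WEIGHTS, MEANS, STDS)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

r = responsibilities(3.0)  # a point near component 2's mean
```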