A New Scoring Function for Bayesian Network Structure Learning Extended to Arbitrary Discrete Variables

Rachel Hodos¹, David Sontag²
¹Computational Biology Program, NYU   ²Computer Science Department, NYU




1. Introduction

Learning Task: Given n observations of m variables, learn the Bayesian network structure that generated the data.

Focus of this work: Devise a score that makes structure learning easier, not harder, as more data become available, and that works for discrete variables with any number of states.

[Figure: Input: a data table D with one row per observation and one column per variable (X1, ..., X5), entries being discrete states such as 0, 1, 2. Output: a directed acyclic graph G over X1, ..., X5.]
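To make the learning task concrete, here is a minimal sketch in Python of one natural representation of the input and output; the data rows are the ones shown in the figure above, while the particular graph G is a hypothetical placeholder, not a structure learned on the poster.

```python
import numpy as np

# Input D: one row per observation, one column per variable X1..X5;
# entries are discrete states (0, 1, or 2 in this example).
D = np.array([
    [0, 2, 1, 0, 2],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 2],
])

# Output G: a DAG encoded as a map from each variable to its parents.
# This particular structure is illustrative only.
G = {
    "X1": [],
    "X2": ["X1"],
    "X3": ["X1", "X2"],
    "X4": ["X3"],
    "X5": ["X3"],
}
```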

2. Bayesian Networks (BNs) and Structure Learning

•  A BN is a graph G that represents a probability distribution, with one node per variable and one conditional probability distribution (CPD) P(X_i \mid \mathrm{Pa}(X_i)) per node, so that the joint factorizes as (checked numerically in the sketch after this list)

   P(X_1, X_2, \ldots, X_m) = \prod_{i=1}^{m} P(X_i \mid \mathrm{Pa}(X_i))

•  A simple example (Rain → Sprinkler, Rain → Grass Wet, Sprinkler → Grass Wet):

   P(R):  Rain                 T: 0.3    F: 0.7

   P(S | R):  Sprinkler
       R = T:                  T: 0.01   F: 0.99
       R = F:                  T: 0.2    F: 0.8

   P(G | S, R):  Grass Wet
       S = T, R = T:           T: 0.99   F: 0.01
       S = T, R = F:           T: 0.9    F: 0.1
       S = F, R = T:           T: 0.8    F: 0.2
       S = F, R = F:           T: 0      F: 1

•  Structure learning as discrete optimization:

   G^* = \arg\max_G S(G \mid D)

   where S is a generic score function

•  But what score to use? Maximum likelihood estimation, S = P(D \mid G), is maximized by the complete graph, which is useless!

•  Hence the score needs to encourage sparsity
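As a sanity check on the factorization, here is a minimal sketch in plain Python that hard-codes the CPTs above and computes the joint P(R, S, G) as P(R) · P(S | R) · P(G | S, R):

```python
# CPTs from the example above; True/False stand for T/F.
P_R = {True: 0.3, False: 0.7}
P_S_given_R = {True: {True: 0.01, False: 0.99},   # P(S | R=T)
               False: {True: 0.2, False: 0.8}}    # P(S | R=F)
P_G_given_SR = {(True, True): {True: 0.99, False: 0.01},
                (True, False): {True: 0.9, False: 0.1},
                (False, True): {True: 0.8, False: 0.2},
                (False, False): {True: 0.0, False: 1.0}}

def joint(r, s, g):
    """P(R=r, S=s, G=g) = P(R=r) * P(S=s | R=r) * P(G=g | S=s, R=r)."""
    return P_R[r] * P_S_given_R[r][s] * P_G_given_SR[(s, r)][g]

# The joint probabilities over all 8 states sum to 1.
total = sum(joint(r, s, g) for r in (True, False)
            for s in (True, False) for g in (True, False))
print(total)  # 1.0 (up to floating point)
```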

3. A New Score: SparsityBoost�

•  PROBLEM: Existing complexity penalties are data agnostic, causing the score to be more difficult to optimize with more data�

•  IDEA: Add a data-dependent term that boosts sparsity.�•  HOW: Search for evidence that an edge should not be present,

and boost score of any graph that does not contain that edge. �

Data-agnostic complexity penalty, |G| = # of parameters of G�Large for strong evidence of independence, small otherwise�Ωij is a set of conditioning sets, e.g. all subsets of size ≤ 2, excluding Xi and Xj�

Want strongest evidence for each edge, so take max over conditioning sets�Independence should hold for all values of conditioning set�
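A minimal sketch, in Python, of how these pieces combine; the log-likelihood, parameter count, conditioning-set family Ω, and per-(conditioning set, value) evidence scores sb are assumed precomputed, and all names here are ours rather than from [2].

```python
import math
from itertools import combinations

def boost(omega_ij, sb_ij):
    """For a candidate missing edge (i, j): max over conditioning sets C,
    min over values c of C, of the evidence score sb_ij[(C, c)]."""
    return max(min(sb_ij[(C, c)] for c in values) for C, values in omega_ij)

def score_bic_sb(loglik, num_params, n, m, edges, omega, sb):
    """S_BIC+SB(G|D) = log P(D|G) - (log n / 2)|G| + boosts for absent edges."""
    s = loglik - (math.log(n) / 2.0) * num_params   # BIC part
    for i, j in combinations(range(m), 2):          # all variable pairs
        if frozenset((i, j)) not in edges:          # boost only missing edges
            s += boost(omega[(i, j)], sb[(i, j)])
    return s
```

Here `edges` is the graph's skeleton as a set of frozensets, `omega[(i, j)]` is a list of (conditioning set, values) pairs, and `sb[(i, j)]` maps each (conditioning set, value) to its evidence score from Section 4.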


4. Bayesian Independence Testing

•  \mathrm{sb}(X_i, X_j \mid D, C = c) answers: "Conditioning on C = c, how strongly do the data show that X_i is independent of X_j?"

•  Use conditional mutual information, MI (see the sketch below):

   \mathrm{MI}(P(X_i, X_j \mid c)) = \sum_{x_i, x_j} P(x_i, x_j \mid c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\,P(x_j \mid c)}

•  Assuming a uniform prior over joint distributions, derive the posterior p(\mathrm{MI} \mid D) [1]
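For intuition, a minimal Python sketch of the plug-in (empirical) conditional mutual information computed from counts within one slice C = c; this is the quantity whose posterior the test reasons about, not the test itself.

```python
import numpy as np

def empirical_mi(counts):
    """Plug-in MI from a 2-D contingency table counts[xi, xj]
    (counts of Xi=xi, Xj=xj among the samples with C=c)."""
    p = counts / counts.sum()                 # joint P(xi, xj | c)
    pi = p.sum(axis=1, keepdims=True)         # marginal P(xi | c)
    pj = p.sum(axis=0, keepdims=True)         # marginal P(xj | c)
    nz = p > 0                                # skip log(0) terms
    return float((p[nz] * np.log(p[nz] / (pi * pj)[nz])).sum())

# Example: a strongly dependent 2x2 table vs. an exactly independent one.
print(empirical_mi(np.array([[40.0, 10.0], [10.0, 40.0]])))  # ~0.19
print(empirical_mi(np.array([[25.0, 25.0], [25.0, 25.0]])))  # 0.0
```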

[Figure: Prior over I (n = 0) and posteriors P(I | D) for n = 400, 800, and 1200, where I denotes mutual information (horizontal axis: I from 0 to 0.1; vertical axis: density from 0 to 80). One row of panels shows P(MI | D) with D drawn from an independent distribution, the other with D drawn from a dependent distribution; n increases across each row.]

•  Define sb as the negative log posterior probability that the mutual information exceeds a threshold \eta (a Monte Carlo illustration follows):

   \mathrm{sb}(X_i, X_j \mid D, C = c) = -\log \int_{\eta}^{\infty} p\big(\mathrm{MI}(P(X_i, X_j \mid C = c)) \mid D\big)\, d\mathrm{MI}

   so sb is large when nearly all posterior mass lies below the threshold \eta, i.e. when the data strongly support independence.
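[1] derives the posterior over MI analytically; purely as an illustrative stand-in, here is a Monte Carlo sketch in Python. With a uniform prior over joint distributions, the posterior over the joint is Dirichlet(1 + counts), so we can sample joints, compute each sample's MI, and estimate Pr(MI ≥ η | D). The estimator and all names are ours, not from [1] or [2].

```python
import numpy as np

rng = np.random.default_rng(0)

def mi_of_joint(p):
    """MI of a joint distribution given as a 2-D array p[xi, xj]."""
    pi = p.sum(axis=1, keepdims=True)
    pj = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (pi * pj)[nz])).sum())

def sb_monte_carlo(counts, eta, num_samples=10000):
    """sb ~= -log Pr(MI >= eta | D): sample joint distributions from the
    Dirichlet posterior (uniform prior + observed counts) and count how
    often their MI clears the threshold eta."""
    alpha = counts.flatten() + 1.0                      # Dirichlet(1 + counts)
    samples = rng.dirichlet(alpha, size=num_samples)
    mis = np.array([mi_of_joint(s.reshape(counts.shape)) for s in samples])
    frac = max((mis >= eta).mean(), 1.0 / num_samples)  # avoid log(0)
    return -np.log(frac)

# Counts consistent with independence yield a large boost; dependent counts ~0.
print(sb_monte_carlo(np.array([[100.0, 100.0], [100.0, 100.0]]), eta=0.05))
print(sb_monte_carlo(np.array([[160.0, 40.0], [40.0, 160.0]]), eta=0.05))
```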

5. Results on Synthetic Data

[Figure: Two panels comparing BIC, BIC+SB1, and BIC+SB2. "Error": Structural Hamming Distance (0–40) vs. number of samples (0–4000). "Runtime": average runtime in seconds (0–3000) vs. number of samples (0–8000).]

•  Start with a known network structure (the 'Alarm' network [3])
•  Generate synthetic data from it (only binary data shown)
•  Find the globally optimal structure with respect to each score [4]
•  Each point shown is the average of ten independent experiments
•  Both accuracy (Structural Hamming Distance; see the sketch below) and runtime improve
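Structural Hamming Distance is the edge-level edit distance between the learned and true graphs; here is a minimal Python sketch under one common convention (count added, missing, and wrongly oriented edges), which may differ in detail from the variant used on the poster.

```python
def shd(true_edges, learned_edges):
    """SHD between two DAGs given as sets of directed edges (parent, child).
    An edge present in only one skeleton counts 1; an edge present in both
    skeletons but with opposite orientation also counts 1."""
    true_edges, learned_edges = set(true_edges), set(learned_edges)
    true_skel = {frozenset(e) for e in true_edges}
    learned_skel = {frozenset(e) for e in learned_edges}
    dist = len(true_skel ^ learned_skel)       # additions + deletions
    for e in true_skel & learned_skel:         # shared edges: check direction
        a, b = tuple(e)
        if ((a, b) in true_edges) != ((a, b) in learned_edges):
            dist += 1                          # reversal
    return dist

# Example: true graph 1->2, 1->3; learned graph 2->1, 1->3, 3->4.
print(shd({(1, 2), (1, 3)}, {(2, 1), (1, 3), (3, 4)}))  # 2
```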

6. References

[1] Hutter, M. and Zaffalon, M. Distribution of mutual information. Computational Statistics and Data Analysis, 48(3):633–657, March 2005.

[2] Brenner, E. and Sontag, D. SparsityBoost: A new scoring function for learning Bayesian network structure. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013), July 2013.

[3] Beinlich, I. A., Suermondt, H. J., Chavez, R. M., and Cooper, G. F. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine, pages 247–256. Springer-Verlag, 1989.

[4] Cussens, J. Bayesian network learning with cutting planes. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), pages 153–160. AUAI Press, 2011.