Inference in Sparse Graphs with Pairwise Measurements and Side Information

Dylan Foster, Daniel Reichman, and Karthik Sridharan

djfoster@cs.cornell.edu, daniel.reichman@gmail.com, sridharan@cs.cornell.edu

Some Concrete Results

• If G is connected, Error = Õ(p|V|).
• For grids of size c × n, the optimal error is Θ(p²n), unless c ≤ 2, in which case O(pn) is optimal.
• √n × √n grid: we recover O(p²n) by applying the tree-decomposition theorem below.

Tree decompositions

For a graph G = (V, E), let W = {W₁, …, W_N} be a collection of subsets of V, and let T = (W, F) be a tree graph over W. T is a tree decomposition if:

1. ⋃_{W ∈ W} W = V.
2. For each uv ∈ E, some W ∈ W has u, v ∈ W.
3. For W₁, W₂, W₃ ∈ W, if W₂ is on the path from W₁ to W₃, then W₁ ∩ W₃ ⊆ W₂.

Main theorem: Recovery from tree decomposition

Suppose we have:

• G′ = (V, E′) with E′ ⊆ E.
• A tree decomposition T = (W, F) for G′ with constant width and overlap.
• Δ(G′(W)) ≤ Δ for each W ∈ W, i.e. each bag's induced subgraph has maximum degree at most Δ.

Then there is an efficient Ŷ such that

$$\mathrm{Error} \leq \tilde{O}\big(p^{\lceil \Delta/2 \rceil}\, n\big).$$
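To make the three conditions concrete, here is a minimal Python sketch (our illustration, not code from the paper) that verifies them for a small grid, using illustrative bags built from adjacent column pairs:

```python
# Minimal sketch (not from the paper): verify the three tree-decomposition
# properties for a small graph. The graph and bags below are illustrative.
def is_tree_decomposition(V, E, bags, tree_edges):
    # 1. Every vertex is covered by some bag.
    if set().union(*bags) != set(V):
        return False
    # 2. Every edge is contained in some bag.
    if not all(any({u, v} <= bag for bag in bags) for u, v in E):
        return False
    # 3. Running-intersection form of the path condition: the bags
    #    containing any fixed vertex form a connected subtree of T.
    for v in V:
        containing = {i for i, bag in enumerate(bags) if v in bag}
        start = next(iter(containing))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for a, b in tree_edges:
                for j in ((b,) if a == i else (a,) if b == i else ()):
                    if j in containing and j not in seen:
                        seen.add(j)
                        stack.append(j)
        if seen != containing:
            return False
    return True

# Example: 3 x 3 grid, bags = pairs of adjacent columns, T = a path of bags.
V = [(r, c) for r in range(3) for c in range(3)]
E = [((r, c), (r, c + 1)) for r in range(3) for c in range(2)] + \
    [((r, c), (r + 1, c)) for r in range(2) for c in range(3)]
bags = [frozenset((r, c) for r in range(3) for c in (j, j + 1)) for j in range(2)]
tree_edges = [(0, 1)]
print(is_tree_decomposition(V, E, bags, tree_edges))  # True
```

Condition 3 is checked in its equivalent "running intersection" form: the set of bags containing any fixed vertex must be connected in T.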

Tree algorithm: proof sketch

Statistical learning reduction: take Ŷ to be the empirical risk minimizer over the hypothesis class F(X) (constructed in the second part of the sketch, below):

$$\hat{Y} = \arg\min_{Y' \in F(X)} \sum_{v \in V} \mathbf{1}\{Y'_v \neq Z_v\}.$$

Rate for ERM:

$$\sum_{v \in V} \mathbf{1}\{\hat{Y}_v \neq Y_v\} \leq O\big(\log|F(X)| / \epsilon^2\big) \quad \text{w.h.p. over } Z.$$

We have |F(X)| ≈ O((1/p)^{pn}), so log|F(X)| = O(pn log(1/p)) and hence E(Ŷ) ≤ Õ(pn).
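To fill in the counting step, here is a loose sketch (our reconstruction, with sloppy constants) of why |F(X)| is at most (1/p)^{O(pn)} when G is a tree: a labeling Y′ is determined by its sign at the root together with the set of edges on which it disagrees with X, and membership in F(X) caps that set's size at roughly 2pn.

$$|F(X)| \;\leq\; 2 \sum_{k=0}^{2pn + O(1)} \binom{n-1}{k} \;\leq\; 2\left(\frac{e(n-1)}{2pn}\right)^{2pn + O(1)} \;=\; (1/p)^{O(pn)},$$

$$\text{so} \quad \log|F(X)| = O\big(pn \log(1/p)\big) \;\Longrightarrow\; E(\hat{Y}) \leq O\big(pn \log(1/p) / \epsilon^2\big) = \tilde{O}(pn).$$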

First result

Theorem: Optimal recovery for trees. When G is a tree:

• There is an efficient algorithm Ŷ with Hamming error Error ≤ Õ(pn) w.h.p.
• There is a matching lower bound of Ω(pn).
• Since every connected G contains a spanning tree, it follows that Error = Õ(pn) for all connected G!

Key ideas:
• The edge MLE is not enough! Squeeze more information out of the Zs; a toy simulation of the failure mode appears below.
• Adopt a statistical learning viewpoint.
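To see why edge information alone falls short, here is a small simulation (our illustration, not from the paper): on a path, the natural edge-by-edge propagation lets a single flipped measurement corrupt every downstream vertex, so its Hamming error blows up well past the O(pn) benchmark.

```python
# Small simulation (our illustration): on a path graph, propagating labels
# edge by edge lets one flipped measurement corrupt every downstream vertex.
import random

def propagation_error(n=10_000, p=0.01, seed=0):
    rng = random.Random(seed)
    Y = [rng.choice([-1, 1]) for _ in range(n)]
    # Noisy edge labels along the path: X[i] observes Y[i] * Y[i + 1].
    X = [Y[i] * Y[i + 1] * (-1 if rng.random() < p else 1) for i in range(n - 1)]
    # Edge-only estimate: start from the true label of vertex 0 (for free)
    # and multiply the observed edge labels along the path.
    Yhat = [Y[0]]
    for x in X:
        Yhat.append(Yhat[-1] * x)
    # Hamming error, minimized over a global sign flip to be generous.
    err = sum(a != b for a, b in zip(Yhat, Y))
    return min(err, n - err)

print(propagation_error(), "errors vs. the p*n =", int(10_000 * 0.01), "benchmark")
```

Even with the root's true label given away, the propagation estimate typically misclassifies thousands of vertices here, versus pn = 100.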

Contributions

• Characterize the optimal recovery rates for trees.
• Lift the result to general graphs via tree decomposition.
• Non-trivial recovery rates for all connected graphs, including sparse graphs where recovery without side information is impossible.
• All rates are finite-sample and hold with high probability.
• All are achieved efficiently.

Key challenge

There is a huge body of work on solving / approximating MAP, MLE, etc., but how do we establish tight bounds on statistical performance?

Goal: Error = O(h(p)·n), with h(p) → 0 as p → 0.


Model

Introduced in [Globerson-Roughgarden-Sontag-Yildirim '15].

• Fixed graph G = (V, E), with |V| = n and |E| = m.
• Ground-truth labels Y ∈ {±1}^V.
• Observe noisy edge labels X ∈ {±1}^E:

$$X_{uv} = \begin{cases} Y_u Y_v & \text{with prob. } 1 - p, \\ -Y_u Y_v & \text{with prob. } p. \end{cases}$$

• Observe noisy vertex labels Z ∈ {±1}^V:

$$Z_u = \begin{cases} Y_u & \text{with prob. } 1 - q, \\ -Y_u & \text{with prob. } q. \end{cases}$$
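A minimal simulation of this observation model (our sketch; the graph and noise levels are illustrative):

```python
# Minimal sketch of the observation model: sample Y uniformly at random,
# then flip each edge measurement w.p. p and each vertex measurement w.p. q.
import random

def sample_instance(edges, n, p, q, rng):
    Y = [rng.choice([-1, 1]) for _ in range(n)]            # ground-truth labels
    X = {(u, v): Y[u] * Y[v] * (-1 if rng.random() < p else 1)
         for (u, v) in edges}                              # noisy edge labels
    Z = [y * (-1 if rng.random() < q else 1) for y in Y]   # noisy side information
    return Y, X, Z

def hamming_error(Yhat, Y):
    # E(Yhat): number of vertices where the estimate disagrees with the truth.
    return sum(a != b for a, b in zip(Yhat, Y))

rng = random.Random(0)
n = 6
edges = [(i, i + 1) for i in range(n - 1)]                 # a small path (a tree)
Y, X, Z = sample_instance(edges, n, p=0.1, q=0.3, rng=rng)
print(Y, Z, hamming_error(Z, Y))                           # Z alone is a poor estimate
```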



Goal: Obtain small Hamming error:

$$E(\hat{Y}) \triangleq \sum_{v \in V} \mathbf{1}\{\hat{Y}_v(X, Z) \neq Y_v\},$$

aka partial recovery.

Interested in:

• Finite-sample behavior of E(Ŷ) as a function of p:
  • Want α(G) such that E(Ŷ) = O(p^{α(G)} n).
  • Treat q = 1/2 − ϵ as constant; the Zs are very noisy.
  • Interested in the p = ω(1/n) regime.
• α(G) for deterministic classes of graphs.
• Sparse regime: Δ(G) = O(1).

Motivation

Graph inference

Basic problem: Recover latent node variables using noisy measurements on the edges of a graph G = (V, E).

• Community detection

• Inference for structured prediction (e.g. image segmentation)

• Alignment / registration / synchronization, correlation clustering, genome assembly, ... and many more!

Censored block model: [Abbe et al. '14], [Saade et al. '15].

See also [Globerson-Yildirim-Roughgarden-Sontag '15], [Chen et al. '15], [Joachims-Hopcroft '05].

Side Information

Our question: How do recovery prospects change with the addition of side information?

Tree algorithm: proof sketch

How to take advantage of Chernoff? W.h.p.,

$$\sum_{uv \in E} \mathbf{1}\{X_{uv} \neq Y_u Y_v\} \leq 2pn + O(1).$$

Define the hypothesis class:

$$F(X) \triangleq \Big\{ Y' \in \{\pm 1\}^V : \sum_{uv \in E} \mathbf{1}\{X_{uv} \neq Y'_u Y'_v\} \leq 2pn + O(1) \Big\}.$$

Then we have Y ∈ F(X) w.h.p., and |F(X)| ≈ O((1/p)^{pn}).

Strategy: Learn Y from F(X) using Z! (A brute-force rendering of this strategy appears below.)
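Here is that strategy spelled out as code (our sketch; it enumerates all 2^n labelings, so it only runs on toy instances, whereas the poster's algorithm achieves the same guarantee efficiently):

```python
# Brute-force sketch of "learn Y from F(X) using Z": restrict to labelings
# that disagree with the edge labels at most ~2pn times (the class F(X)),
# then pick the one that best matches the noisy vertex labels Z.
import random
from itertools import product

def erm_over_FX(n, edges, X, Z, p, slack=2):
    budget = 2 * p * n + slack                     # Chernoff-style budget
    best, best_risk = None, float("inf")
    for Yp in product([-1, 1], repeat=n):
        # Membership test for F(X).
        disagreements = sum(X[(u, v)] != Yp[u] * Yp[v] for (u, v) in edges)
        if disagreements > budget:
            continue
        # Empirical risk against the side information Z.
        risk = sum(Yp[v] != Z[v] for v in range(n))
        if risk < best_risk:
            best, best_risk = Yp, risk
    # F(X) contains Y w.h.p.; fall back to Z if the budget was too tight.
    return best if best is not None else tuple(Z)

# Toy usage on a small path graph.
rng = random.Random(1)
n, p, q = 8, 0.05, 0.2
edges = [(i, i + 1) for i in range(n - 1)]
Y = [rng.choice([-1, 1]) for _ in range(n)]
X = {e: Y[e[0]] * Y[e[1]] * (-1 if rng.random() < p else 1) for e in edges}
Z = [y * (-1 if rng.random() < q else 1) for y in Y]
Yhat = erm_over_FX(n, edges, X, Z, p)
print(sum(a != b for a, b in zip(Yhat, Y)), "vertex errors")
```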



Further examples: Hypergrids, lattices, Newman-Watts — see paper for more.
