DAVA: Distributing Vaccines over Networks under Prior Information Yao Zhang, B. Aditya Prakash Department of Computer Science Virginia Tech SDM, Philadelphia, April 24, 2014


Motivation: Epidemiology

• Virus spreads over contact networks
• SIR model [Anderson+ 1991]
  • Susceptible-Infectious-Recovered
  • Weights pij: propagation probability from i to j
  • Recovery probability δ for each node
  • (models mumps-like infections)

Zhang and Prakash, SDM2014

Motivation: Social Media

• Meme/rumor spreads over friendship networks
  • E.g.: Twitter following network
• Independent cascade model (IC) [Kempe+ KDD2003]
  • Each node has only one chance to infect its neighbors
  • Special case of SIR model
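A minimal Python sketch of the independent cascade model described above. The adjacency format (`adj[u][v]` holds the propagation probability from u to v), the function names and the toy graph are assumptions for illustration, not code from the paper.

```python
import random

def simulate_ic(adj, seeds, rng):
    """One run of the IC model; returns the set of nodes infected at the end.

    Each newly infected node gets exactly one chance to infect each
    still-healthy neighbor, with probability adj[u][v].
    """
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in adj.get(u, {}).items():
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return infected

def expected_infected(adj, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of infected nodes."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(adj, seeds, rng)) for _ in range(runs)) / runs

# Hypothetical toy graph: edge a-c never transmits, the others always do.
adj = {"a": {"b": 1.0, "c": 0.0}, "b": {"d": 1.0}, "c": {}, "d": {}}
print(sorted(simulate_ic(adj, {"a"}, random.Random(0))))
print(expected_infected(adj, {"a"}, runs=100))
```

Averaging over many runs is the simplest way to estimate the quantity the talk minimizes; it says nothing about how the authors ran their experiments.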

Immunization

• Centers for Disease Control (CDC) cares about containing epidemic diseases
  • E.g.: ~400 million dollars used for vaccines for children in 2013
• Twitter tries to stop rumor spread
  • E.g.: rumors of victims after the Boston Marathon bombing in 2013

How to choose the best nodes to vaccinate (remove)?

Immunization

Pre-emptive immunization (choose nodes before the epidemic starts)
• Acquaintance strategy [Cohen+ 2003]
  • Pick a random person, immunize one of its neighbors at random
• Netshield [Tong+ 2010]
  • Minimize the epidemic threshold (point when the virus takes off)

Good as baseline strategies
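A short Python sketch of the acquaintance strategy as it is worded on the slide (random person, then a random neighbor of that person). It assumes an undirected graph stored as a symmetric adjacency dict; the function name and the star graph are hypothetical.

```python
import random

def acquaintance_immunization(adj, k, seed=0):
    """Pick k nodes: repeatedly choose a random person, then immunize one of
    that person's neighbors chosen uniformly at random."""
    rng = random.Random(seed)
    people = sorted(v for v in adj if adj[v])  # nodes with at least one neighbor
    k = min(k, len(people))                    # cannot immunize more nodes than this
    chosen = set()
    while len(chosen) < k:
        person = rng.choice(people)
        chosen.add(rng.choice(sorted(adj[person])))
    return chosen

# Hypothetical star graph: the hub "c" is a neighbor of every leaf.
adj = {"c": ["l1", "l2", "l3"], "l1": ["c"], "l2": ["c"], "l3": ["c"]}
print(acquaintance_immunization(adj, 1))
```

High-degree nodes are neighbors of many people, so they are picked more often than under uniform sampling, without any global knowledge of the graph.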

In reality

Typically the epidemic has already started!
• More realistic intervention
• Which nodes to vaccinate now?
• We call it Data-Aware Immunization (this paper)

Pre-emptive immunization (choose nodes before the epidemic starts)
• Acquaintance strategy [Cohen+ 2003]
• Netshield [Tong+ 2010]

Outline

• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion

Data-Aware Vaccination Problem

Problem: Given a set of infected nodes and a contact graph, how to distribute k vaccines (node removal) to minimize the expected number of infected nodes at the end of the epidemic?

Example: 1 vaccine, pij = 1 for all edges (figure: graph over nodes A, B, C, D, E, F)
• Remove A, save {A, D} (best solution)
• Remove B, save {B}
• Remove C, save {C}
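To make the objective concrete, here is a brute-force Python sketch for the special case pij = 1, where every node still reachable from an infected node ends up infected. The graph below is hypothetical: the figure's edges are not recoverable from the transcript, so it was only chosen to reproduce the three "save" sets listed above, with E and F assumed to be the infected nodes.

```python
from collections import deque
from itertools import combinations

def infected_count(adj, infected, removed):
    """Number of infected nodes at the end when every edge has p = 1."""
    seen = set(infected)
    queue = deque(infected)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen and v not in removed:
                seen.add(v)
                queue.append(v)
    return len(seen)

def best_vaccination(adj, infected, k):
    """Try every set of k healthy nodes; keep the one leaving the fewest infected."""
    healthy = sorted(v for v in adj if v not in infected)
    return min(
        ((set(s), infected_count(adj, infected, set(s)))
         for s in combinations(healthy, k)),
        key=lambda pair: pair[1],
    )

# Hypothetical graph (assumption): E and F are infected, D hangs off A.
adj = {
    "A": ["D", "E"], "B": ["E", "F"], "C": ["E", "F"],
    "D": ["A"], "E": ["A", "B", "C"], "F": ["B", "C"],
}
print(best_vaccination(adj, {"E", "F"}, 1))
```

Enumerating all k-subsets is only feasible on tiny graphs, which is why the rest of the talk looks for something faster.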


Complexity of DAV

• NP-hard
  • Reduction from the Maximum K-Intersection Problem (MaxKI: maximizing the intersection of k subsets)
  • MaxKI is NP-Complete [Vinterbo 2004]
• Approximation algorithm?
  • Not submodular
  • Actually, DAV is hard to approximate within an absolute error!

See paper for details

Outline

• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
  • Assume IC model and undirected graph
• Experiments
• Conclusion

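1: Simplify - Merging infected nodes

• Idea: merge all the infected nodes into a single ‘super infected’ node I
• Logical-OR: when two infected nodes reach the same healthy node B with probabilities pX and pY, the merged edge from I to B gets pB = 1-(1-pX)(1-pY)
• pA and pC appear unchanged in both graphs
• The merged graph is equivalent to the original graph

(Figure: original graph and merged graph over healthy nodes A, B, C, with super node I)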
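A minimal Python sketch of the merging step, combining parallel edges with the logical-OR rule pB = 1-(1-pX)(1-pY). It assumes an undirected graph stored as a symmetric dict of dicts and that the name "I" is not already used by a node; the function name and the probabilities are made up for illustration.

```python
def merge_infected(adj, infected, super_node="I"):
    """Collapse all infected nodes into one super node.

    adj[u][v] is the propagation probability on edge u-v (stored both ways).
    Edges from several infected nodes to the same healthy node are combined
    with the logical-OR rule 1 - (1 - p1)(1 - p2)...
    """
    merged = {super_node: {}}
    for u, nbrs in adj.items():
        if u in infected:
            continue
        merged.setdefault(u, {})
        for v, p in nbrs.items():
            if v in infected:
                prev = merged[super_node].get(u, 0.0)
                combined = 1 - (1 - prev) * (1 - p)
                merged[super_node][u] = combined
                merged[u][super_node] = combined
            else:
                merged[u][v] = p  # edges between healthy nodes are kept as-is
    return merged

# Hypothetical example: X and Y are infected and both touch B.
adj = {
    "X": {"A": 0.3, "B": 0.5},
    "Y": {"B": 0.5, "C": 0.2},
    "A": {"X": 0.3},
    "B": {"X": 0.5, "Y": 0.5},
    "C": {"Y": 0.2},
}
merged = merge_infected(adj, {"X", "Y"})
print(merged["I"])
```

Here B is reached with probability 1-(1-0.5)(1-0.5) = 0.75, while A and C keep their single-edge probabilities.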

2: DAVA-Tree Algorithm: Idea

• Select nodes with the largest “benefit”
• Saved nodes: the expected number of saved nodes after removing set S on graph G
• Benefit of adding an additional node j into S: (# of saved nodes after adding j into S) minus (# of saved nodes with S alone), i.e. the additional number of saved nodes when adding node j into S

(Figure: merged infected node with candidate nodes of benefit 4, 2 and 5; pij = 1 for all edges)

DAVA-Tree Alg.: Optimal on Trees

• Fact 1: the chosen nodes in the optimal set must be neighbors of the infected node I
• Fact 2: the benefit of each such node is independent of the rest of the set S (for any set S)

DAVA-tree algorithm: select the top k nodes from I’s neighbors with the max. benefit
• Linear time

(Figure: merged infected node with neighbors of benefit 4, 2 and 5; pij = 1 for all edges)
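A Python sketch of DAVA-tree under the IC model. It relies on a standard property of trees that the slide does not spell out: a node is infected exactly when every edge on its unique path from I transmits, so the benefit of a neighbor of I is the sum of those path probabilities over its subtree. The `children` format, the function name and the example tree (built to give benefits 4, 2 and 5 with p = 1) are assumptions.

```python
def dava_tree(children, root, k):
    """Return (top-k neighbors of root by benefit, benefit of every neighbor).

    children[u] maps each child v of u to the propagation probability p_uv.
    """
    benefits = {}
    for child, p in children.get(root, {}).items():
        total = 0.0
        stack = [(child, p)]  # (node, probability that the infection reaches it)
        while stack:
            node, prob = stack.pop()
            total += prob
            for nxt, q in children.get(node, {}).items():
                stack.append((nxt, prob * q))
        benefits[child] = total
    ranked = sorted(benefits, key=lambda v: (-benefits[v], str(v)))
    return ranked[:k], benefits

# Hypothetical tree with p = 1 everywhere: subtrees of size 4, 2 and 5.
children = {
    "I": {"a": 1.0, "b": 1.0, "c": 1.0},
    "a": {"a1": 1.0, "a2": 1.0}, "a1": {"a3": 1.0},
    "b": {"b1": 1.0},
    "c": {"c1": 1.0, "c2": 1.0, "c3": 1.0, "c4": 1.0},
}
selected, benefits = dava_tree(children, "I", 2)
print(selected, benefits)
```

Each tree node is visited once; the only extra cost in this sketch is sorting the neighbors of I.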

3: General Case – Arbitrary Graphs

• Idea
  • We have the optimal algorithm for a tree
  • Extract a spanning tree, then run DAVA-tree
• What kind of tree?
  • Minimum spanning tree

(Figure, pij = 1 for all edges: the optimal solution on the graph, the MST, and the solution that is optimal on the MST by DAVA-tree)

3: General Case – Arbitrary Graphs

• Idea
  • We have the optimal algorithm for a tree
  • Build a spanning tree first
• What kind of tree?
  • Minimum spanning tree
  • We propose to use the dominator tree (used in software engineering)
• u dominates v: every path from I to v contains u
  • E.g.: 4 dominates 8, 9, 10, 11 (pij = 1 for all edges)

Dominator Tree

• u is the immediate dominator of v: u dominates v AND every other dominator of v dominates u
• Dominator tree: add an edge between every such u and v
• Linear time [Buchsbaum, Tarjan 1998]
• Fact 1: the optimal solution should be among the children of root I in the dominator tree for any arbitrary graph
• Fact 2: (for special case, k = 1, p = 1) running DAVA-tree on the dominator tree gives the optimal solution

(Figure, pij = 1 for all edges: merged graph with the optimal solution, and its dominator tree with the solution that is optimal from DAVA-tree)
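A small Python sketch of the two definitions above. It uses the simple iterative set-intersection method, not the linear-time algorithm cited on the slide, so it only suits small graphs. The adjacency format, the function name and the example graph are assumptions.

```python
from collections import deque

def immediate_dominators(adj, root):
    """Map every node reachable from root (except root) to its immediate dominator."""
    # Nodes reachable from the root, in BFS order.
    order, seen, queue = [root], {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                order.append(v)
                queue.append(v)
    # Predecessors restricted to reachable nodes.
    preds = {v: set() for v in order}
    for u in order:
        for v in adj.get(u, ()):
            preds[v].add(u)
    # dom[v] = {v} plus the nodes that dominate every predecessor of v.
    dom = {v: set(order) for v in order}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in order[1:]:
            new = {v} | set.intersection(*(dom[p] for p in preds[v]))
            if new != dom[v]:
                dom[v] = new
                changed = True
    # The strict dominators of v form a chain; the immediate one is the deepest.
    return {v: max(dom[v] - {v}, key=lambda d: len(dom[d])) for v in order[1:]}

# Hypothetical undirected graph: c is reachable through a or b, d only through c.
adj = {"I": ["a", "b"], "a": ["I", "c"], "b": ["I", "c"],
       "c": ["a", "b", "d"], "d": ["c"]}
print(immediate_dominators(adj, "I"))
```

The dominator tree is then obtained by adding an edge from each immediate dominator to the node it dominates; in the example, d hangs under c and everything else hangs directly under I.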

Weighting the dominator tree

• Weighting the dominator tree
  • #P-complete
• Our solution: maximum propagation path probability between nodes I and v (using Dijkstra’s algorithm)

(Figure: merged graph with edge probabilities p1, p3, p6 and dominator tree with weights w1, w3, w6)
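A Python sketch of the "maximum propagation path probability" computation with a Dijkstra-style search. Because every probability is at most 1, extending a path never increases its product, so the usual greedy argument applies (equivalently, one can run ordinary Dijkstra on edge lengths -log p). How the slide turns these values into the weights w is not shown in the transcript and is not reproduced here; the function name and the example graph are assumptions.

```python
import heapq

def max_path_probability(adj, source):
    """For each reachable node v, the largest product of edge probabilities
    over all paths from source to v. adj[u][v] is the probability of edge u->v."""
    best = {source: 1.0}
    heap = [(-1.0, source)]  # negated probabilities turn heapq into a max-heap
    done = set()
    while heap:
        neg_p, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, p in adj.get(u, {}).items():
            cand = -neg_p * p
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best

# Hypothetical graph: the two-hop path I -> b -> a (0.9 * 0.8) beats the direct edge (0.5).
adj = {"I": {"a": 0.5, "b": 0.9}, "b": {"a": 0.8}, "a": {"c": 0.5}}
best = max_path_probability(adj, "I")
print(best)
```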

DAVA algorithm

Steps:
1. T = Build a dominator tree
2. v = Run DAVA-tree on T with budget = 1
3. Remove v from G
4. Go to Step 1 until |S| = k

(Figure: merged graph (pij = 1 for all edges) and dominator tree; |S| = 2, iteration = 1)

DAVA algorithm

• Running time: O(k(|E| + |V| log|V|))
• Too slow for large networks!

(Figure: |S| = 2, iteration = 2; the node selected in iteration 1 is removed from the merged graph and the dominator tree is built again)

DAVA-fast: a faster algorithm

Steps:
1. T = Build a dominator tree
2. S = Run DAVA-tree on T with budget = k

• Time complexity: subquadratic!
  • DAVA-fast: O(|V| log|V| + |E|)
• In practice, the performance of DAVA-fast is very close to DAVA

(Figure: merged graph and dominator tree; |S| = 2)
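A Python sketch contrasting the DAVA loop with DAVA-fast, restricted to the special case pij = 1. In that case the benefit of a child of I in the dominator tree is the size of its subtree, which equals the number of nodes cut off from I when that node is removed. The sketch finds these by brute-force reachability instead of building a dominator tree, so it does not have the running times quoted on the slides. All names and the example graph are hypothetical.

```python
from collections import deque

def reachable(adj, root, removed):
    """Nodes reachable from root once the nodes in `removed` are deleted."""
    seen, queue = {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen and v not in removed:
                seen.add(v)
                queue.append(v)
    return seen

def root_children_benefits(adj, root, removed):
    """Benefit (p = 1) of every child of root in the dominator tree."""
    base = reachable(adj, root, removed)
    # cut[v]: nodes (v included) that become unreachable if v is also removed.
    cut = {v: base - reachable(adj, root, removed | {v}) for v in base if v != root}
    # v is a child of the root iff no other single node cuts v off.
    return {v: len(cut[v]) for v in cut
            if not any(v in cut[u] for u in cut if u != v)}

def dava(adj, root, k):
    """Recompute benefits after every pick (budget 1 per iteration)."""
    removed = set()
    for _ in range(k):
        benefits = root_children_benefits(adj, root, removed)
        if not benefits:
            break
        removed.add(max(sorted(benefits), key=benefits.get))
    return removed

def dava_fast(adj, root, k):
    """Compute benefits once and take the top k."""
    benefits = root_children_benefits(adj, root, set())
    return set(sorted(benefits, key=lambda v: (-benefits[v], v))[:k])

# Hypothetical merged graph: m is reachable through both a and b.
adj = {
    "I": ["a", "b"],
    "a": ["I", "a1", "a2", "a3", "m"], "a1": ["a"], "a2": ["a"], "a3": ["a"],
    "b": ["I", "b1", "m"], "b1": ["b"],
    "m": ["a", "b", "m1", "m2"], "m1": ["m"], "m2": ["m"],
}
for name, picks in (("DAVA", dava(adj, "I", 2)), ("DAVA-fast", dava_fast(adj, "I", 2))):
    print(name, sorted(picks), "infected:", len(reachable(adj, "I", picks)) - 1)
```

In this contrived example the two differ: after a is removed, b dominates m, so the loop picks b next, while the one-shot ranking still picks m. The slides report that in practice the gap is small.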

Extending to SIR model

• See the paper


Experiments

• Virus propagation models
  • IC and SIR
• Settings (see more settings in the paper)
  • Initial infected nodes chosen uniformly at random
• Baseline algorithms
  • RANDOM: choose healthy nodes uniformly at random
  • DEGREE: choose nodes with the top weighted degrees
  • PAGERANK: choose nodes with the top PageRank scores
  • NETSHIELD
    • State-of-the-art pre-emptive immunization algorithm to minimize the epidemic threshold of the graph [Tong+ ICDM 2010]
    • Assumes no data is given before the epidemic starts
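A Python sketch of the two simplest baselines listed above. "Weighted degree" is taken here to mean the sum of a node's incident edge probabilities, and both baselines are restricted to healthy nodes; both choices are assumptions, as are the function names and the toy graph.

```python
import random

def degree_baseline(adj, infected, k):
    """DEGREE: the k healthy nodes with the largest weighted degree."""
    healthy = [v for v in adj if v not in infected]
    wdeg = {v: sum(adj[v].values()) for v in healthy}
    return sorted(healthy, key=lambda v: (-wdeg[v], str(v)))[:k]

def random_baseline(adj, infected, k, seed=0):
    """RANDOM: k healthy nodes chosen uniformly at random."""
    rng = random.Random(seed)
    healthy = sorted(v for v in adj if v not in infected)
    return rng.sample(healthy, min(k, len(healthy)))

# Hypothetical graph; adj[u][v] is the propagation probability on edge u-v.
adj = {
    "a": {"b": 0.5, "c": 0.5},
    "b": {"a": 0.5},
    "c": {"a": 0.5, "d": 0.9},
    "d": {"c": 0.9},
}
print(degree_baseline(adj, {"b"}, 2))
print(random_baseline(adj, {"b"}, 2))
```

Neither baseline looks at where the infected nodes are beyond excluding them, which is the gap the data-aware methods aim to close.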

Experiments: datasets

Datasets are chosen from different domains
• Social media (IC model)
  • OREGON: AS router graph
  • STANFORD: hyperlink network
  • GNUTELLA: peer-to-peer network
  • BRIGHTKITE: friendship network
• Epidemiology (SIR model)
  • PORTLAND and MIAMI: large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]

        OREGON   STANFORD   GNUTELLA   BRIGHTKITE   PORTLAND      MIAMI
|V|     633      8,929      10,876     58,228       0.5 million   0.6 million
|E|     2,172    53,829     39,994     214,078      1.6 million   2.1 million

Experiments: Quality

(Figures: GNUTELLA (IC model) and PORTLAND (SIR model); higher is better)

• DAVA consistently outperforms the baseline algorithms.
• Further, DAVA-fast performs almost as well as DAVA.

(See more results in the paper)

Experiments: Scalability

(Figure: running time (sec.); lower is better. Annotation: did not finish within 10 hours)


Conclusion

• Data-Aware Vaccination problem
  • Given: graph and infected nodes
  • Find: ‘best’ nodes for immunization
• Complexity
  • NP-hard
  • Hard to approximate within an absolute error
• DAVA-tree
  • Optimal solution on the tree
• DAVA and DAVA-fast
  • Merging infected nodes
  • Build a dominator tree, and run DAVA-tree
• Running time: subquadratic
  • DAVA: O(k(|E| + |V| log|V|))
  • DAVA-fast: O(|E| + |V| log|V|)

(Figure: graph with infected nodes, merged graph, dominator tree)

Any Questions?

Code at: http://people.cs.vt.edu/~yaozhang

Thanks for the support of NSF (Grant No. IIS-1353346).

Yao Zhang, B. Aditya Prakash