Energy and Mean-Payoff Parity Markov Decision Processes

Laurent Doyen (LSV, ENS Cachan & CNRS) and Krishnendu Chatterjee (IST Austria). MFCS 2011.


Transcript of Energy and Mean-Payoff Parity Markov Decision Processes

Page 1: Energy and Mean-Payoff Parity Markov Decision Processes

Energy and Mean-Payoff Parity Markov Decision Processes

Laurent Doyen (LSV, ENS Cachan & CNRS)

Krishnendu Chatterjee (IST Austria)

MFCS 2011

Page 2: Energy and Mean-Payoff Parity Markov Decision Processes

Games for system analysis

• Verification: check if a given system is correct

reduces to graph searching

[Diagram: a System interacting with its Environment through input and output; Spec: φ(input, output)]

Page 3: Energy and Mean-Payoff Parity Markov Decision Processes

Games for system analysis

[Diagram: the same setting with the System replaced by “?”; Spec: φ(input, output)]

• Verification: check if a given system is correct

reduces to graph searching

• Synthesis: construct a correct system

reduces to game solving – finding a winning strategy

Page 4: Energy and Mean-Payoff Parity Markov Decision Processes

Games for system analysis

Spec: φ(input,output)

• Verification: check if a given system is correct

reduces to graph searching

• Synthesis: construct a correct system

reduces to game solving – finding a winning strategy

This talk: environment is abstracted as a stochastic process

[Diagram: the unknown system “?” and its input/output interface; the Environment is modeled as a Markov decision process (MDP)]

Page 5: Energy and Mean-Payoff Parity Markov Decision Processes

Markov decision process

Page 6: Energy and Mean-Payoff Parity Markov Decision Processes

Markov decision process (MDP)

[Diagram: an MDP with nondeterministic states (player 1) and probabilistic states]


Page 13: Energy and Mean-Payoff Parity Markov Decision Processes

Markov decision process (MDP)

[Diagram: an MDP with nondeterministic states (player 1) and probabilistic states]

Strategy (policy) = recipe to extend the play prefix


Page 15: Energy and Mean-Payoff Parity Markov Decision Processes

Markov decision process (MDP)

[Diagram: an MDP with nondeterministic states (player 1) and probabilistic states (player 2)]

Strategy (policy) = recipe to extend the play prefix
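To make the model concrete, here is a minimal Python sketch (illustrative, not from the talk) of an MDP with player-1 states and probabilistic states, together with a memoryless strategy; all state names are invented.

```python
import random

# Toy MDP: player-1 states map to a list of successor choices;
# probabilistic states map to a distribution over successors.
player1_succ = {"q0": ["q1", "q2"]}
prob_succ = {"q1": {"q0": 0.5, "q2": 0.5},
             "q2": {"q0": 1.0}}

# A memoryless strategy (policy) resolves the nondeterminism.
strategy = {"q0": "q1"}

def step(state):
    """Extend the play prefix by one state: player 1 follows the
    strategy, probabilistic states are resolved by sampling."""
    if state in player1_succ:
        return strategy[state]
    dist = prob_succ[state]
    return random.choices(list(dist), weights=list(dist.values()))[0]

play = ["q0"]
for _ in range(10):
    play.append(step(play[-1]))
print(play)  # e.g. ['q0', 'q1', 'q2', 'q0', ...]
```

Fixing the strategy removes all nondeterminism, which is exactly the Markov-chain view used on the next slide.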

Page 16: Energy and Mean-Payoff Parity Markov Decision Processes

Objective

A strategy is almost-sure winning if, with probability 1:

- Büchi: visit accepting states infinitely often.

- Parity: least priority visited infinitely often is even.

Fixing a strategy yields an (infinite) Markov chain.

Page 17: Energy and Mean-Payoff Parity Markov Decision Processes

Decision problem

Given an MDP, decide whether there exists an almost-sure winning strategy for the parity objective.

Page 18: Energy and Mean-Payoff Parity Markov Decision Processes

Decision problem

End-component = set of states U such that:

• if q ∈ U is a player-1 state, then some successor of q is in U

• if q ∈ U is a probabilistic state, then all successors of q are in U

• U is strongly connected

Given an MDP, decide whether there exists an almost-sure winning strategy for the parity objective.
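As a concrete reading of the definition above, here is a small sketch (illustrative; it reuses the `player1_succ`/`prob_succ` encoding of the earlier sketch) that checks whether a set U of states is an end-component.

```python
def is_end_component(U, player1_succ, prob_succ):
    """Check the three defining conditions of an end-component."""
    U = set(U)
    for q in U:
        if q in player1_succ:
            # player-1 state: some successor must be in U
            if not any(s in U for s in player1_succ[q]):
                return False
        else:
            # probabilistic state: all successors must be in U
            if not all(s in U for s in prob_succ[q]):
                return False

    def reachable_within_U(src):
        seen, stack = {src}, [src]
        while stack:
            q = stack.pop()
            succs = player1_succ[q] if q in player1_succ else prob_succ[q]
            for s in succs:
                if s in U and s not in seen:
                    seen.add(s)
                    stack.append(s)
        return seen

    # strong connectivity: every state of U reaches all of U inside U
    return all(reachable_within_U(q) == U for q in U)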

Page 19: Energy and Mean-Payoff Parity Markov Decision Processes

Decision problem

An end-component is good if its least priority is even.

Almost-sure reachability to good end-components is decidable in PTIME.


Given an MDP, decide whether there exists an almost-sure winning strategy for the parity objective.


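To make the PTIME step concrete, here is a sketch (illustrative, same encoding as before) of the standard graph fixpoint for almost-sure reachability; `target` would be the union of the good end-components.

```python
def almost_sure_reach(states, player1_succ, prob_succ, target):
    """States from which player 1 can reach `target` with probability 1.
    Fixpoint: keep only states that can reach the target while the play
    is guaranteed to stay inside the surviving set."""
    candidate = set(states)
    while True:
        # Backward reachability to the target inside `candidate`.
        good = set(target) & candidate
        changed = True
        while changed:
            changed = False
            for q in candidate - good:
                if q in player1_succ:
                    # player 1 can pick a successor that makes progress
                    ok = any(s in good for s in player1_succ[q])
                else:
                    # probabilistic state: must never leave `candidate`,
                    # and some successor must make progress
                    succs = prob_succ[q]
                    ok = all(s in candidate for s in succs) and \
                         any(s in good for s in succs)
                if ok:
                    good.add(q)
                    changed = True
        if good == candidate:
            return candidate
        candidate = good
```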

Page 21: Energy and Mean-Payoff Parity Markov Decision Processes

Objectives

Parity condition (qualitative): ω-regular specifications (reactivity, liveness, …)

Page 22: Energy and Mean-Payoff Parity Markov Decision Processes

Objectives

Parity condition (qualitative): ω-regular specifications (reactivity, liveness, …)

Energy condition (quantitative): resource-constrained specifications

Page 23: Energy and Mean-Payoff Parity Markov Decision Processes

Energy objective

Positive and negative weights (encoded in binary)

Page 24: Energy and Mean-Payoff Parity Markov Decision Processes

Energy objective

Energy level: 20, 21, 11, 1, 2, … (running sum of the weights)

Positive and negative weights (encoded in binary)


Page 25: Energy and Mean-Payoff Parity Markov Decision Processes

Energy objective

Positive and negative weights

Energy level: 20, 21, 11, 1, 2, … (running sum of the weights); initial credit: 20


Page 26: Energy and Mean-Payoff Parity Markov Decision Processes

Energy objective

Positive and negative weights

Energy level: 20, 21, 11, 1, 2, … (running sum of the weights); initial credit: 20

A play is winning if the energy level is always nonnegative.

“Never exhaust the resource (memory, battery, …)”
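As a small worked example, the energy condition on a play prefix is just a running sum; the weights below (+1, −10, −10, +1) are read off the level sequence shown on the slide.

```python
def energy_levels(initial_credit, weights):
    """Energy levels along a play: initial credit plus running sum."""
    levels = [initial_credit]
    for w in weights:
        levels.append(levels[-1] + w)
    return levels

levels = energy_levels(20, [1, -10, -10, 1])
print(levels)                        # [20, 21, 11, 1, 2]
assert all(l >= 0 for l in levels)   # energy condition holds so far
```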

Page 27: Energy and Mean-Payoff Parity Markov Decision Processes

Decision problem

Given a weighted MDP, decide whether there exist an initial credit c0 and an almost-sure winning strategy to maintain the energy level always nonnegative.

Page 28: Energy and Mean-Payoff Parity Markov Decision Processes

Decision problem

Equivalent to a two-player game:

If player 2 can force a negative energy level on some path, then that path is finite and has positive probability in the MDP.

[Diagram: probabilistic states are treated as player-2 states]

Given a weighted MDP, decide whether there exist an initial credit c0 and an almost-sure winning strategy to maintain the energy level always nonnegative.

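A minimal sketch of this equivalence (illustrative encoding): for the energy objective alone, every probabilistic state can simply be handed to an adversarial player 2, because any energy violation is witnessed by a finite path of positive probability.

```python
def energy_mdp_to_game(player1_succ, prob_succ):
    """Turn a weighted MDP into a two-player energy game by letting
    player 2 resolve every probabilistic choice. Sound for the energy
    objective: if player 2 can force the energy below 0, the same
    finite path has positive probability in the MDP, and conversely."""
    player2_succ = {q: list(dist) for q, dist in prob_succ.items()}
    return player1_succ, player2_succ
```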

Page 30: Energy and Mean-Payoff Parity Markov Decision Processes

Energy Parity MDP

Page 31: Energy and Mean-Payoff Parity Markov Decision Processes

Objectives

Parity condition (qualitative): ω-regular specifications (reactivity, liveness, …)

Energy condition (quantitative): resource-constrained specifications

Energy parity MDP (mixed qualitative-quantitative)

Page 32: Energy and Mean-Payoff Parity Markov Decision Processes

Energy parity MDP

A strategy is almost-sure winning with initial credit c0 if, with probability 1, both the energy condition and the parity condition hold:

“never exhaust the resource” and “always eventually do something useful”

Page 33: Energy and Mean-Payoff Parity Markov Decision Processes

Algorithm for Energy Büchi MDP

For parity, the probabilistic player is my friend.

For energy, the probabilistic player is my opponent.

Page 34: Energy and Mean-Payoff Parity Markov Decision Processes

Algorithm for Energy Büchi MDP

Replace each probabilistic state by the gadget:

For parity, the probabilistic player is my friend.

For energy, the probabilistic player is my opponent.

Page 35: Energy and Mean-Payoff Parity Markov Decision Processes

Algorithm for Energy Büchi MDP

Reduction of energy Büchi MDP to energy Büchi game

Page 36: Energy and Mean-Payoff Parity Markov Decision Processes

Algorithm for Energy Büchi MDP

Reduction of energy Büchi MDP to energy Büchi game

Player 1 can guess an even priority 2i, and win in the energy Büchi MDP where:

• the Büchi states are the priority-2i states, and

• transitions to states with priority < 2i are disallowed

Reduction of energy parity MDP to energy Büchi MDP
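A sketch of this reduction (illustrative; `priority` maps states to priorities, and `solve_energy_buchi(allowed, accepting)` is an assumed solver returning the almost-sure winning states of the restricted energy Büchi MDP):

```python
def energy_parity_via_buchi(states, priority, solve_energy_buchi):
    """Player 1 guesses an even priority 2i: states with priority < 2i
    are disallowed, and the Buchi (accepting) states are exactly the
    priority-2i states. The union over all guesses is the answer."""
    winning = set()
    for p in {priority[q] for q in states}:
        if p % 2 == 0:
            allowed = {q for q in states if priority[q] >= p}
            accepting = {q for q in states if priority[q] == p}
            winning |= solve_energy_buchi(allowed, accepting)
    return winning
```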

Page 37: Energy and Mean-Payoff Parity Markov Decision Processes

Mean-payoff Parity MDP

Page 38: Energy and Mean-Payoff Parity Markov Decision Processes

Mean-payoff

Mean-payoff value of a play = limit-average of the visited weights

The optimal mean-payoff value can be achieved with a memoryless strategy.

Decision problem:

Given a rational threshold ν, decide if there exists a strategy for player 1 to ensure a mean-payoff value of at least ν with probability 1.
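For reference, the limit-average of a play π = q0 q1 q2 … over edge weights w is standardly defined as follows (the lim inf is the usual choice when the limit need not exist):

```latex
\mathrm{MP}(\pi) \;=\; \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} w(q_i, q_{i+1})
```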


Page 40: Energy and Mean-Payoff Parity Markov Decision Processes

Mean-payoff games

A memoryless strategy σ ensures nonnegative mean-payoff value

iff

all cycles are nonnegative in Gσ

iff

the memoryless strategy σ is winning in the energy game (with some finite initial credit)

Mean-payoff games with threshold 0 are equivalent to energy games.
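The middle condition is checkable with standard negative-cycle detection on Gσ, the graph obtained by fixing σ; a Bellman-Ford sketch (edges given as (source, target, weight) triples):

```python
def all_cycles_nonnegative(states, edges):
    """True iff the fixed-strategy graph G_sigma has no negative cycle,
    i.e. iff sigma is winning in the energy game (for a large enough
    initial credit). Standard Bellman-Ford negative-cycle detection."""
    states, edges = list(states), list(edges)
    dist = {q: 0 for q in states}     # virtual source at distance 0
    for _ in range(len(states) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # any further relaxation witnesses a negative cycle
    return all(dist[u] + w >= dist[v] for u, v, w in edges)
```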

Page 41: Energy and Mean-Payoff Parity Markov Decision Processes

Mean-payoff vs. Energy

        Energy      Mean-payoff
Games   NP ∩ coNP   NP ∩ coNP
MDP     NP ∩ coNP   PTIME

Page 42: Energy and Mean-Payoff Parity Markov Decision Processes

Mean-payoff parity MDPs

Find a strategy which ensures, with probability 1:

- the parity condition, and

- mean-payoff value ≥ ν

• The gadget reduction does not work: there are MDPs where player 1 almost-surely wins, but player 1 loses in the corresponding two-player game.

Page 43: Energy and Mean-Payoff Parity Markov Decision Processes

Algorithm for mean-payoff parity

• End-component analysis

• almost-surely, all states of an end-component can be visited infinitely often

• the expected mean-payoff value is the same in all states of an end-component

An end-component is good if:

- the least priority is even, and

- the expected mean-payoff value is ≥ ν

Almost-sure reachability to good end-components is decidable in PTIME.
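The goodness test itself is short; a sketch (illustrative; `expected_mean_payoff` is an assumed helper, e.g. computed by linear programming, and is well defined on an end-component since all its states have the same value; `nu` is the threshold ν):

```python
def is_good_end_component(U, priority, nu, expected_mean_payoff):
    """Good = least priority even AND expected mean-payoff >= nu."""
    least_priority = min(priority[q] for q in U)
    return least_priority % 2 == 0 and expected_mean_payoff(U) >= nu
```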


Page 45: Energy and Mean-Payoff Parity Markov Decision Processes

MDP

Qualitative: parity MDPs

Quantitative: energy MDPs, mean-payoff MDPs

Mixed qualitative-quantitative: energy parity MDPs, mean-payoff parity MDPs



Page 48: Energy and Mean-Payoff Parity Markov Decision Processes

Summary

Algorithmic complexity:

        Energy parity   Mean-payoff parity
Games   NP ∩ coNP       NP ∩ coNP
MDP     NP ∩ coNP       PTIME

Strategy complexity:

        Energy parity   Mean-payoff parity
Games   n·d·W           infinite
MDP     2·n·W           infinite

Page 49: Energy and Mean-Payoff Parity Markov Decision Processes

Thank you!

Questions?

The end



Page 52: Energy and Mean-Payoff Parity Markov Decision Processes

Energy parity

Page 53: Energy and Mean-Payoff Parity Markov Decision Processes

Mean-payoff parity

Page 54: Energy and Mean-Payoff Parity Markov Decision Processes

Complexity

                     Strategy complexity          Algorithmic
                     Player 1      Player 2       complexity
Energy               memoryless    memoryless     NP ∩ coNP
Parity               memoryless    memoryless     NP ∩ coNP
Energy parity        exponential   memoryless     NP ∩ coNP
Mean-payoff          memoryless    memoryless     NP ∩ coNP
Mean-payoff parity   infinite      memoryless     NP ∩ coNP