Energy and Mean-Payoff Parity Markov Decision Processes Laurent Doyen LSV, ENS Cachan & CNRS...

Energy and Mean-Payoff Parity Markov Decision

ProcessesLaurent Doyen

LSV, ENS Cachan & CNRS

Krishnendu ChatterjeeIST Austria

MFCS 2011

Games for system analysis

• Verification: check if a given system is correct

reduces to graph searching

Systeminput

output Spec:

φ(input,output)Environmen

?input

output Spec:

φ(input,output)Environmen

• Synthesis : construct a correct system

reduces to game solving – finding a winning strategy

Spec: φ(input,output)

• Synthesis : construct a correct system

reduces to game solving – finding a winning strategy

This talk: environment is abstracted as a stochastic process

?input

output Environmen

output = Markov

decision process (MDP)

Markov decision process

Markov decision process (MDP)

Nondeterministic (player 1)Probabilistic

Strategy (policy) = recipe to extend the play prefix

Nondeterministic (player 1)Probabilistic (player 2)

Strategy (policy) = recipe to extend the play prefix wea

Objective

Strategy is almost-sure winning, if with probability 1:

- Büchi: visit accepting states infinitely often.

- Parity: least priority visited infinitely often is even.

Fix a strategy (infinite) Markov chain

Decision problem

Given an MDP, decide whether there exists an almost-sure winning strategy for parity objective.

Decision problem

End-component = set of states U s.t.

• if then some successor of q is in U

• if then all successors of q are in U

• strongly connected stronglyconnecte

Decision problem

End-component is good if least priority is even

Almost-sure reachability to good end-components in PTIME

stronglyconnecte

• strongly connected

Decision problem

End-component is good if least priority is even

Almost-sure reachability to good end-components in PTIME

stronglyconnecte

• strongly connected

Parity condition

Qualitative

ω-regular specifications(reactivity, liveness,…)

Objectives

Parity condition

Energy condition

Qualitative Quantitative

Resource-constrainedspecifications

Objectives

Energy objective

Positive and negative weights (encoded in binary)

Energy objective

Energy level: 20, 21, 11, 1, 2,…(sum of weights)

Positive and negative weights (encoded in binary)

11 1 2

Energy objective

Positive and negative weights

Energy level: 20, 21, 11, 1, 2,…(sum of weights)Initial credit

11 1 2

Energy objective

Positive and negative weights

Energy level: 20, 21, 11, 1, 2,…(sum of weights)Initial credit

A play is winning if the energy level is always nonnegative.

“Never exhaust the resource (memory, battery, …)”

Decision problem

Given a weighted MDP, decide whether there exist an initial credit c0 and an almost-sure winning strategy to maintain the energy level always nonnegative.

Decision problem

Equivalent to a two-player game:

If player 2 can force a negative energy level on a path, then the path is finite and has positive probability in MDP

player 2 state

Given a weighted MDP, decide whether there exist an initial credit c0 and an almost-sure winning strategy to maintain the energy level always nonnegative.

Decision problem

Equivalent to a two-player game:

If player 2 can force a negative energy level on a path, then the path is finite and has positive probability in MDP

player 2 state

Energy Parity MDP

Objectives

Energy parity MDP

Mixed qualitative-quantitative

Resource-constrainedspecifications

Parity condition

Energy condition

Strategy is almost-sure winning with initial credit c0, if with probability 1:

energy condition and parity condition hold

Energy parity MDP

“never exhaust the resource”and

“always eventually do something useful”

Algorithm for Energy Büchi MDP

For parity, probabilistic player is my friend

For energy, probabilistic player is my opponent

Replace each probabilistic state by the gadget:

For parity, probabilistic player is my friend

For energy, probabilistic player is my opponent

Reduction of energy Büchi MDP to energy Büchi game

Player 1 can guess an even priority 2i,and win in the energy Büchi MDP where:

• Büchi states are 2i-states, and

• transitions to states with priority <2i are disallowed

Reduction of energy parity MDP to energy Büchi MDP

Mean-payoff Parity MDP

Mean-payoff

Mean-payoff value of a play = limit-average of the visited weights

Optimal mean-payoff value can be achieved with a memoryless strategy.

Decision problem:

Given a rational threshold , decide if there exists a strategy for player 1 to ensure mean-payoff value at least with probability 1.

Mean-payoff

Mean-payoff value of a play = limit-average of the visited weights

Optimal mean-payoff value can be achieved with a memoryless strategy.

Decision problem:

Given a rational threshold , decide if there exists a strategy for player 1 to ensure mean-payoff value at least with probability 1.

Mean-payoff games

Memoryless strategy σ ensures nonnegative mean-payoff value

all cycles are nonnegative in Gσ

memoryless strategy σ is winning in energy game

Mean-payoff games with threshold 0 are equivalent to energy games.

Mean-payoff vs. Energy

Energy Mean-Payoff

Games NP coNP NP coNP

MDP NP coNP PTIME

Mean-payoff parity MDPs

Find a strategy which ensures with probability 1:

- parity condition, and

- mean-payoff value ≥ • Gadget reduction does not work:

Player 1 almost-surely wins

Player 1 loses

Algorithm for mean-payoff parity

• End-component analysis

• almost-surely all states of end-component can be visited infinitely often

• expected mean-payoff value of all states in end-component is same

stronglyconnecte

End-component is good if - least priority is even- expected mean-payoff value ≥

Almost-sure reachability to good end-component in PTIME

Algorithm for mean-payoff parity

• End-component analysis

• almost-surely all states of end-component can be visited infinitely often

• expected mean-payoff value of all states in end-component is same

stronglyconnecte

End-component is good if - least priority is even- expected mean-payoff value ≥

Almost-sure reachability to even end-component in PTIME

Parity MDPs

Energy MDPs

Mean-payoff MDPs

Energy parity MDPs

Parity MDPs

Energy MDPs

Mean-payoff MDPs

Energy parity MDPs

Summary

Energy parity Mean-Payoff parity

MDP NP coNP PTIME

Algorithmiccomplexity

Summary

MDP NP coNP PTIME

Strategy complexity

Games n·d·W infinite

MDP 2·n·W infinite

Algorithmiccomplexity

Thank you !

Questions ?

The end

References

[CdAHS03]

A.Chakrabarti, L. de Alfaro, T.A. Henzinger, and M. Stoelinga. Resource interfaces, Proc. of EMSOFT: Embedded Software, LNCS 2855, Springer, pp.117-133, 2003

[EM79] A. Ehrenfeucht, and J. Mycielski, Positional Strategies for Mean-Payoff Games, International Journal of Game Theory, vol. 8, pp. 109-113, 1979

[BFL+08] P. Bouyer, U. Fahrenberg, K.G. Larsen, N. Markey, and J. Srba, Infinite Runs in Weighted Timed Automata with Energy Constraints, Proc. of FORMATS: Formal Modeling and Analysis of Timed Systems, LNCS 5215, Springer, pp. 33-47, 2008

[CHJ05] K. Chatterjee, T.A. Henzinger, M. Jurdzinski. Mean-payoff Parity Games, Proc. of LICS: Logic in Computer Science, IEEE, pp. 178-187, 2005.

Energy parity

Mean-payoff parity

ComplexityStrategy Algorithmic

Player 1 Player 2 complexity

Energymemoryle

ssmemoryles

sNP coNP

Paritymemoryle

ssmemoryles

sNP coNP

Energy parity

exponential

memoryless

NP coNP

Mean-payoffmemoryle

ssmemoryles

sNP coNP

Mean-payoff parity

infinitememoryles

sNP coNP

Energy and Mean-Payoff Parity Markov Decision Processes Laurent Doyen LSV, ENS Cachan & CNRS...

Documents

Transcript of Energy and Mean-Payoff Parity Markov Decision Processes Laurent Doyen LSV, ENS Cachan & CNRS...

SSD DEMA hdt 08Aug06 - Pennsylvania State University...nn H nn × × = − ⎡ ⎣ ⎢ ⎤ ... Probability (Chatterjee, Bhavana and Lin, 2006) 24 Supersaturated Designs with High Searching

A Survey of Stochastic ω-Regular Games · A Survey of Stochastic ω-Regular Games Krishnendu Chatterjee† Thomas A. Henzinger†‡ †EECS, University of California, Berkeley,

Normal forms and geometric numerical integration of ...Normal forms and geometric numerical integration of Hamiltonian PDEs Part I: Linear equations Erwan Faou INRIA & ENS Cachan Bretagne

Recent Results on Cumulants of Net-Particle Distributions ... · See A. Chatterjee, Poster #534 The 2nd-order off-diagonal cumulants STAR Preliminary 1 1.4 1.8 10 20 60 200 (GeV)

Game Playing - cs.jhu.eduphi/ai/slides/lecture-game-playing.pdfGame Playing Philipp Koehn ... = best achievable payoff against best play ... – even expert Chess players use standard

Quantitative Languages Krishnendu Chatterjee, UCSC Laurent Doyen, EPFL Tom Henzinger, EPFL CSL 2008.

ANNUAL REPORT 2017 - sssihl.edu.insssihl.edu.in/sssuniversity/Portals/0/Images/Resources and Help... · ο Dr. Sandip Chatterjee, ... Rajeshwari C Patel, Dean, Faculty of Economics

14.126 Spring 2016 Repeated Games Lecture Slides...Optimal penal code for i: SPE giving i lowest possible payoff. I. Showed existence of optimal penal codes for pure-strategy SPEs.

LOCALIZATION IN RANDOM GEOMETRIC GRAPHS - arXiv · 2019-07-04 · 2 SOURAV CHATTERJEE AND MATAN HAREL monograph have been improved and generalized by Penrose and others in the years

Energy and Mean-Payoff Parity Markov Decision Processes

Chapter 5 Sampling and Statistics Math 6203 Fall 2009 Instructor: Ayona Chatterjee.

Adaptive optics Using Ferro-fluids Mathieu De Goer-de Herve and Raphael-David Lasseri (ENS Cachan)