# «Abstract. We consider the solution concept of stochastic stability, and propose the price of stochastic anarchy as an alternative to the price of ...»

The Price of Stochastic Anarchy

Christine Chung (1), Katrina Ligett (2), Kirk Pruhs (1), and Aaron Roth (2)

(1) Department of Computer Science, University of Pittsburgh, {chung,kirk}@cs.pitt.edu

(2) Department of Computer Science, Carnegie Mellon University, {katrina,alroth}@cs.cmu.edu

Abstract. We consider the solution concept of stochastic stability, and propose the price of stochastic anarchy as an alternative to the price of (Nash) anarchy for quantifying the cost of selfishness and lack of coordination in games. As a solution concept, the Nash equilibrium has disadvantages that the set of stochastically stable states of a game avoids: unlike Nash equilibria, stochastically stable states are the result of natural dynamics of computationally bounded and decentralized agents, and are resilient to small perturbations from ideal play. The price of stochastic anarchy can be viewed as a smoothed analysis of the price of anarchy, distinguishing equilibria that are resilient to noise from those that are not.

To illustrate the utility of stochastic stability, we study the load balancing game on unrelated machines. This game has an unboundedly large price of Nash anarchy even when restricted to two players and two machines. We show that in the two player case, the price of stochastic anarchy is 2, and that even in the general case, the price of stochastic anarchy is bounded. We conjecture that the price of stochastic anarchy is O(m), matching the price of strong Nash anarchy without requiring player coordination. We expect that stochastic stability will be useful in understanding the relative stability of Nash equilibria in other games where the worst equilibria seem to be inherently brittle.

Partially supported by an AT&T Labs Graduate Fellowship and an NSF Graduate Research Fellowship.

Supported in part by NSF grants CNS-0325353, CCF-0448196, CCF-0514058 and IIS-0534531.

**1 Introduction**

Quantifying the price of (Nash) anarchy is one of the major lines of research in algorithmic game theory. Indeed, one fourth of the authoritative algorithmic game theory text edited by Nisan et al. [20] is wholly dedicated to this topic. But the Nash equilibrium solution concept has been widely criticized [15, 4, 9, 10]. First, it is a solution characterization without a road map for how players might arrive at such a solution. Second, at Nash equilibria, players are unrealistically assumed to be perfectly rational, fully informed, and infallible. Third, computing Nash equilibria is PPAD-hard even for 2-player, n-action games [6], and it is therefore considered very unlikely that there exists a polynomial time algorithm to compute a Nash equilibrium even in a centralized manner. Thus, it is unrealistic to assume that selfish agents in general games will converge precisely to the Nash equilibria of the game, or that they will necessarily converge to anything at all. In addition, the price of Nash anarchy metric comes with its own weaknesses; it blindly uses the worst case over all Nash equilibria, despite the fact that some equilibria are more resilient than others to perturbations in play.

Considering these drawbacks, computer scientists have paid relatively little attention to whether or how Nash equilibria will in fact be reached, and even less to the question of which Nash equilibria are more likely to be played in the event players do converge to Nash equilibria. To address these issues, we employ the stochastic stability framework from evolutionary game theory to study simple dynamics of computationally efficient, imperfect agents. Rather than defining a priori states such as Nash equilibria, which might not be reachable by natural dynamics, the stochastic stability framework allows us to define a natural dynamic, and from it derive the stable states. We define the price of stochastic anarchy to be the ratio of the worst stochastically stable solution to the optimal solution. The stochastically stable states of a game may, but do not necessarily, contain all Nash equilibria of the game, and so the price of stochastic anarchy may be strictly better than the price of Nash anarchy. In games for which the stochastically stable states are a subset of the Nash equilibria, studying the ratio of the worst stochastically stable state to the optimal state can be viewed as a smoothed analysis of the price of anarchy, distinguishing Nash equilibria that are brittle to small perturbations in perfect play from those that are resilient to noise.

The evolutionary game theory literature on stochastic stability studies n-player games that are played repeatedly. In each round, each player observes her action and its outcome, and then uses simple rules to select her action for the next round based only on her size-restricted memory of the past rounds. In any round, players have a small probability of deviating from their prescribed decision rules. The state of the game is the contents of the memories of all the players. The stochastically stable states in such a game are the states with non-zero probability in the limit of this random process, as the probability of error approaches zero. The play dynamics we employ in this paper are the imitation dynamics studied by Josephson and Matros [16]. Under these dynamics, each player imitates the strategy that was most successful for her in recent memory.

To illustrate the utility of stochastic stability, we study the price of stochastic anarchy of the unrelated load balancing game [2, 1, 11]. To our knowledge, we are the first to quantify the loss of efficiency in any system when the players are in stochastically stable equilibria. In the load balancing game on unrelated machines, even with only two players and two machines, there are Nash equilibria with arbitrarily high cost, and so the price of Nash anarchy is unbounded. We show that these equilibria are inherently brittle, and that for two players and two machines, the price of stochastic anarchy is 2. This result matches the strong price of anarchy [1] without requiring coordination (at strong Nash equilibria, players have the ability to coordinate by forming coalitions).
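The unboundedness claim for two players and two machines can be checked by brute force on a small instance. The sketch below is illustrative only: the cost matrix and the constant `BIG` are hypothetical choices, not an instance taken from this paper, and it assumes the standard model in which a player's cost is the total load on her chosen machine and the social cost is the makespan.

```python
from itertools import product

# Hypothetical instance (an assumption, not from the paper): cost[i][j] is the
# processing time of player i's job on machine j. BIG can be made arbitrarily large.
BIG = 1000
cost = [[1, BIG],
        [BIG, 1]]

def loads(profile):
    """Load of each machine when player i uses machine profile[i]."""
    result = [0] * len(cost[0])
    for player, machine in enumerate(profile):
        result[machine] += cost[player][machine]
    return result

def player_cost(profile, player):
    """Assumed cost model: a player pays the total load on her machine."""
    return loads(profile)[profile[player]]

def is_pure_nash(profile):
    """True if no single player can strictly lower her cost by moving."""
    for player in range(len(profile)):
        for machine in range(len(cost[0])):
            deviation = list(profile)
            deviation[player] = machine
            if player_cost(deviation, player) < player_cost(profile, player):
                return False
    return True

profiles = list(product(range(len(cost[0])), repeat=len(cost)))
optimum = min(max(loads(p)) for p in profiles)       # best makespan
nash = [p for p in profiles if is_pure_nash(p)]
worst_nash = max(max(loads(p)) for p in nash)        # worst equilibrium makespan
print(nash, worst_nash / optimum)
```

In this instance the profile in which each player uses her slow machine is a pure Nash equilibrium, because moving would place her job on top of the other player's slow job. Its makespan grows linearly with `BIG` while the optimum stays fixed, which is the kind of equilibrium the text calls brittle.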

We further show that in the general n-player, m-machine game, the price of stochastic anarchy is bounded. More precisely, the price of stochastic anarchy is upper bounded by the (nm)th n-step Fibonacci number. We also show that the price of stochastic anarchy is at least m + 1.
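For readers unfamiliar with n-step Fibonacci numbers, each term is the sum of the previous n terms. The sketch below computes them under an assumed indexing convention (F(1) = 1 and F(k) = 0 for k ≤ 0); the convention used in the formal statement of the bound may differ, so the values are meant only to convey the growth rate. The function name is hypothetical.

```python
from collections import deque

def n_step_fibonacci(n: int, k: int) -> int:
    """k-th n-step Fibonacci number: each term sums the previous n terms.

    Assumed seed values (a convention chosen for this sketch):
    F(1) = 1 and F(k) = 0 for every k <= 0.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if k <= 0:
        return 0
    # Sliding window holding F(2 - n), ..., F(1).
    window = deque([0] * (n - 1) + [1], maxlen=n)
    for _ in range(k - 1):
        window.append(sum(window))  # the oldest term drops out automatically
    return window[-1]

# Compare the upper-bound term with the m + 1 lower bound for a few sizes.
for n, m in [(2, 2), (3, 2), (3, 3)]:
    print(n, m, n_step_fibonacci(n, n * m), m + 1)
```

With n = 2 this is the ordinary Fibonacci sequence. The gap between the exponentially growing upper-bound term and the linear lower bound is what motivates the conjecture that the true value is closer to linear.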

Our work provides new insight into the equilibria of the load balancing game. Unlike some previous work on dynamics for games, our work does not seek to propose practical dynamics with fast convergence; rather, we use simple dynamics as a tool for understanding the inherent relative stability of equilibria. Instead of relying on player coordination to avoid the Nash equilibria with unbounded cost (as is done in the study of strong equilibria), we show that these bad equilibria are inherently unstable in the face of occasional uncoordinated mistakes. We conjecture that the price of stochastic anarchy is closer to the linear lower bound, paralleling the price of strong anarchy.

In light of our results, we believe the techniques in this paper will be useful for understanding the relative stability of Nash equilibria in other games for which the worst equilibria are brittle. Indeed, for a variety of games in the price of anarchy literature, the worst Nash equilibria of the lower bound instances are not stochastically stable.

**1.1 Related Work**

We give a brief survey of related work in three areas: alternatives to Nash equilibria as a solution concept, stochastic stability, and the unrelated load balancing game.

Recently, several papers have noted that the Nash equilibrium is not always a suitable solution concept for computationally bounded agents playing in a repeated game, and have proposed alternatives. Goemans et al. [15] study players who sequentially play myopic best responses, and quantify the price of sinking that results from such play. Fabrikant and Papadimitriou [9] propose a model in which agents play restricted finite automata. Blum et al. [4, 3] assume only that players’ action histories satisfy a property called no regret, and show that for many games, the resulting social costs are no worse than those guaranteed by price of anarchy results.

Although we believe this to be the first work studying stochastic stability in the computer science literature, computer scientists have recently employed other tools from evolutionary game theory. Fischer and Vöcking [13] show that under replicator dynamics in the routing game studied by Roughgarden and Tardos [22], players converge to Nash. Fischer et al. [12] went on to show that using a simultaneous adaptive sampling method, play converges quickly to a Nash equilibrium. For a thorough survey of algorithmic results that have employed or studied other evolutionary game theory techniques and concepts, see Suri [23].

Stochastic stability and its adaptive learning model as studied in this paper were first defined by Foster and Young [14], and differ from the standard game theory solution concept of evolutionarily stable strategies (ESS). ESS are a refinement of Nash equilibria, and so do not always exist, and are not necessarily associated with a natural play dynamic. In contrast, a game always has stochastically stable states that result (by construction) from natural dynamics. In addition, ESS are resilient only to single shocks, whereas stochastically stable states are resilient to persistent noise.

Stochastic stability has been widely studied in the economics literature (see, for example, [24, 17, 19, 5, 7, 21, 16]). We discuss in Sect. 2 concepts from this body of literature that are relevant to our results. We recommend Young [25] for an informative and readable introduction to stochastic stability, its adaptive learning model, and some related results. Our work differs from prior work in stochastic stability in that it is the first to quantify the social utility of stochastically stable states, the price of stochastic anarchy.

We also note a connection between the stochastically stable states of the game and the sinks of a game, recently introduced by Goemans et al. as another way of studying the dynamics of computationally bounded agents. In particular, the stochastically stable states of a game under the play dynamics we consider correspond to a subset of the sink equilibria, and so provide a framework for identifying the stable sink equilibria.

In potential games, the stochastically stable states of the play dynamics we consider correspond to a subset of the Nash equilibria, thus providing a method for identifying which of these equilibria are stable.

In this paper, we study the price of stochastic anarchy in load balancing. Even-Dar et al. [8] show that when playing the load balancing game on unrelated machines, any turn-taking improvement dynamics converge to Nash. Andelman et al. [1] observe that the price of Nash anarchy in this game is unbounded, and they show that the strong price of anarchy is linear in the number of machines. Fiat et al. [11] tighten their upper bound to match their lower bound at a strong price of anarchy of exactly m.

**2 Model and Background**

We now formalize (from Young [24]) the adaptive play model and the definition of stochastic stability. We then formalize the play dynamics that we consider. We also provide in this section the results from the stochastic stability literature that we will later use for our results.

computationally efficient agents, and so we imagine that each agent has some finite memory of size $z$, and that after time step $t$, all players remember a history consisting of a sequence of play profiles $h^t = (S^{t-z+1}, S^{t-z+2}, \ldots, S^t) \in (X)^z$.

We assume that each player $i$ has some efficiently computable function $p_i : (X)^z \times X_i \to \mathbb{R}$ that, given a particular history, induces a sampleable probability distribution over actions (for all players $i$ and histories $h$, $\sum_{a \in X_i} p_i(h, a) = 1$). We write $p$ for $\prod_i p_i$. We wish to model imperfect agents who make mistakes, and so we imagine that at time $t$ each player $i$ plays according to $p_i$ with probability $1 - \epsilon$, and with probability $\epsilon$ plays some action in $X_i$ uniformly at random. (The mistake probabilities need not be uniformly random; all that we require is that the distribution has support on all actions in $X_i$.) That is, for all players $i$ and all actions $a \in X_i$, the probability that $i$ plays $a$ after history $h$ is $(1 - \epsilon)\, p_i(h, a) + \epsilon / |X_i|$.
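The way a decision rule is mixed with uniform mistakes can be sketched as follows. This is a minimal illustration for one player and one fixed history; the function names and the two-machine rule are hypothetical, and `base` stands in for the values $p_i(h, a)$.

```python
import random

def perturbed_distribution(base, epsilon):
    """Mix a base distribution over actions with the uniform distribution.

    base maps each action a in X_i to p_i(h, a) for one fixed history h.
    """
    uniform = 1.0 / len(base)
    return {a: (1 - epsilon) * p + epsilon * uniform for a, p in base.items()}

def sample_action(base, epsilon, rng=random):
    """Follow the base rule with probability 1 - epsilon, else play uniformly."""
    actions = list(base)
    if rng.random() < epsilon:
        return rng.choice(actions)               # a uniformly random action
    weights = [base[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

# Hypothetical rule that always picks machine "m1" after some history h.
base = {"m1": 1.0, "m2": 0.0}
mixed = perturbed_distribution(base, epsilon=0.1)
print(mixed)
```

Every action receives probability at least $\epsilon / |X_i|$ under the mixed distribution, which is the property the irreducibility argument relies on.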

We will refer to $P^0$ as the unperturbed Markov process. Note that for $\epsilon > 0$, $p^\epsilon_{h,h'} > 0$ for every history $h$ and successor $h'$, and that for any two histories $h$ and $\hat{h}$, with $\hat{h}$ not necessarily a successor of $h$, there is a series of $z$ histories $h_1, \ldots, h_z$ such that $h_1 = h$, $h_z = \hat{h}$, and for all $1 < i \le z$, $h_i$ is a successor of $h_{i-1}$. Thus there is positive probability of moving between any $h$ and any $\hat{h}$ in $z$ steps, and so $P^\epsilon$ is irreducible. Similarly, there is a positive probability of moving between any $h$ and any $\hat{h}$ in $z + 1$ steps, and so $P^\epsilon$ is aperiodic. Therefore, $P^\epsilon$ has a unique stationary distribution $\mu^\epsilon$.

The stochastically stable states of a particular game and player dynamics are the states with nonzero probability in the limit of the stationary distribution.

Definition 2.2 (Foster and Young [14]). A state $h$ is stochastically stable relative to $P^\epsilon$ if $\lim_{\epsilon \to 0} \mu^\epsilon(h) > 0$.

Intuitively, we should expect a process $P^\epsilon$ to spend almost all of its time at its stochastically stable states when $\epsilon$ is small.
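A toy example may help make the limit concrete. The sketch below uses a hypothetical two-state chain, not one derived from any game in this paper: leaving state A requires two simultaneous mistakes (probability $\epsilon^2$), while leaving state B requires only one (probability $\epsilon$). The closed form follows from the balance condition for a two-state chain.

```python
def stationary_two_state(leave_a, leave_b):
    """Stationary distribution (mu_A, mu_B) of a two-state Markov chain.

    leave_a is the probability of moving A -> B, leave_b of moving B -> A.
    Balance requires mu_A * leave_a == mu_B * leave_b.
    """
    total = leave_a + leave_b
    return leave_b / total, leave_a / total

# Toy assumption: A is left with probability epsilon**2, B with epsilon.
for epsilon in (0.1, 0.01, 0.001):
    mu_a, mu_b = stationary_two_state(leave_a=epsilon ** 2, leave_b=epsilon)
    print(f"epsilon={epsilon}: mu(A)={mu_a:.4f}, mu(B)={mu_b:.4f}")
```

Here $\mu^\epsilon(A) = 1/(1+\epsilon)$ tends to 1 and $\mu^\epsilon(B)$ tends to 0 as $\epsilon \to 0$, so in this toy chain only A would be stochastically stable: the state that is harder to leave by mistake absorbs the limiting mass.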

When a player $i$ plays at random rather than according to $p_i$, we call this a mistake.

Definition 2.3 (Young [24]). Suppose $h' = (S^{t-z+1}, \ldots, S^t)$ is a successor of $h$. A mistake in the transition between $h$ and $h'$ is any element $S_i^t$ such that $p_i(h, S_i^t) = 0$.

Note that mistakes occur with probability $\le \epsilon$.

We can characterize the number of mistakes required to get from one history to another.

Definition 2.4 (Young [24]). For any two states $h, h'$, the resistance $r(h, h')$ is the minimum total number of mistakes involved in the transition $h \to h'$ if $h'$ is a successor of $h$. If $h'$ is not a successor of $h$, then $r(h, h') = \infty$.
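Counting mistakes in a single transition can be sketched as below. The sketch assumes that a successor of a history is obtained by dropping the oldest play profile and appending a new one, and it represents histories as tuples of profiles; the decision rule `repeat_last` is a hypothetical stand-in for $p_i$, not the imitation dynamics studied in this paper.

```python
import math

def is_successor(h, h_next):
    """Assumed meaning of 'successor': drop the oldest profile, append one."""
    return len(h) == len(h_next) and h[1:] == h_next[:-1]

def resistance(h, h_next, rules):
    """Number of mistakes in the one-step transition h -> h_next.

    rules[i](h, a) plays the role of p_i(h, a); player i's newest action is a
    mistake when her rule gives it probability zero.
    """
    if not is_successor(h, h_next):
        return math.inf
    newest_profile = h_next[-1]
    return sum(1 for i, action in enumerate(newest_profile)
               if rules[i](h, action) == 0)

def repeat_last(player):
    """Hypothetical rule: repeat whatever you played in the most recent round."""
    return lambda h, a: 1.0 if a == h[-1][player] else 0.0

rules = [repeat_last(0), repeat_last(1)]
h = ((0, 1), (0, 1))                     # memory size z = 2, two players
print(resistance(h, ((0, 1), (1, 1)), rules))
```

Transitions in which every player follows her rule have resistance zero, and each player who deviates adds one to the count.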

Note that the transitions of zero resistance are exactly those that occur with positive probability in the unperturbed Markov process $P^0$.

Definition 2.5. We refer to the sinks of $P^0$ as recurrent classes. In other words, a recurrent class of $P^0$ is a set of states $C \subseteq H$ such that any state in $C$ is reachable from any other state in $C$ and no state outside $C$ is accessible from any state inside $C$.
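Given the zero-resistance transitions as a directed graph, the recurrent classes are the closed sets of mutually reachable states. The sketch below finds them by plain reachability on a small hypothetical graph; the state names and edges are invented for illustration, and the quadratic approach is chosen for clarity rather than efficiency.

```python
def reachable(graph, start):
    """All states reachable from start (including start) along graph edges."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def recurrent_classes(graph):
    """Sets C in which every state reaches exactly the states of C."""
    reach = {s: reachable(graph, s) for s in graph}
    classes = []
    for s in graph:
        # s is recurrent when everything it reaches can reach the same set.
        closed = all(reach[t] == reach[s] for t in reach[s])
        if closed and reach[s] not in classes:
            classes.append(reach[s])
    return classes

# Hypothetical zero-resistance graph on five states.
graph = {
    "a": ["b"],
    "b": ["a"],        # {a, b} is closed: a recurrent class
    "c": ["a", "d"],   # c is transient
    "d": ["d"],        # absorbing state: a singleton recurrent class
    "e": ["c"],        # e is transient
}
print(recurrent_classes(graph))
```

States such as `c` and `e` can leave and never return, so they belong to no recurrent class; the unperturbed process eventually settles in one of the closed sets.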