
arXiv:1907.05391v4 [cs.DS] 6 Nov 2019

Walking Randomly, Massively, and Efficiently

Jakub Łącki

Google Research

Slobodan Mitrović

MIT

Krzysztof Onak

IBM Research

Piotr Sankowski

University of Warsaw

Abstract

We introduce a set of techniques that allow for efficiently generating many independent random walks in the Massive Parallel Computation (MPC) model with space per machine strongly sublinear in the number of vertices. In this space-per-machine regime, many natural approaches to graph problems struggle to overcome the Θ(log n) MPC round complexity barrier. Our techniques enable breaking this barrier for PageRank, one of the most important applications of random walks, even in the more challenging directed graphs, and for approximate bipartiteness and expansion testing.

In the undirected case, we start our random walks from the stationary distribution, which implies that we approximately know the empirical distribution of their next steps. This allows for preparing continuations of random walks in advance and applying a doubling approach. As a result, we can generate multiple random walks of length l in Θ(log l) rounds on MPC. Moreover, we show that, under the popular 1-vs.-2-Cycles conjecture, this round complexity is asymptotically tight.

For directed graphs, our approach stems from our treatment of the PageRank Markov chain. We first compute the PageRank for the undirected version of the input graph and then slowly transition towards the directed case, considering convex combinations of the transition matrices in the process.

For PageRank, we achieve the following round complexities for damping factor equal to 1−ǫ:

• in O(log log n + log(1/ǫ)) rounds for undirected graphs (with O(m/ǫ^2) total space),

• in O(log^2 log n + log^2(1/ǫ)) rounds for directed graphs (with O((m + n^{1+o(1)})/poly ǫ) total space).

The round complexity of our result for computing PageRank has only logarithmic dependence on 1/ǫ. We use this to show that our PageRank algorithm can be used to construct directed length-l random walks in O(log^2 log n + log^2 l) rounds (with O((m + n^{1+o(1)}) poly l) total space). Namely, by letting ǫ = Θ(1/l), a length-l PageRank walk with constant probability contains no random jump, and hence is a directed random walk.


1 Introduction

Computing random walks in graphs is a fundamental algorithmic primitive. Random walks find applications in a plethora of computer science research areas. A non-exhaustive list includes optimal PRAM algorithms for connectivity [Rei85, HZ96a, HZ96b], rating web pages [PBMW99, DMPU15], partitioning graphs [ACL06], minimizing query complexity in property testing [GR99, KKR04, CMOS19, CS10, KS11, NS10, CPS15, CKK+18], finding graph matchings [GKK13], generating random spanning trees [KM09], and counting problems [JS96].

Intuitively, computing random walks, especially independent random walks from all vertices, should be highly parallelizable, since random walks are memoryless. However, even if we start a single random walk from each vertex, after a constant number of steps many of the walks may collide in the same vertex. This is especially problematic in directed graphs, since it is not known how to precompute the vertices where many collisions would happen, other than by simulating length-l walks in l steps or computing the l-th power of the transition matrix, which takes quadratic space.

The focus of this work is on generating a large number of independent random walks in a parallel setting. For undirected graphs, we take advantage of the fact that the stationary distribution is proportional to the vertex degrees and can thus be computed in a trivial way. If the starting points of the random walks are distributed according to the stationary distribution, then after any number of steps the distribution of the endpoints is the same. We exploit this to pre-sample continuations of random walks and recursively stitch them together in order to generate length-l walks in O(log l) parallel rounds.

The situation becomes more complicated for directed graphs (or Markov chains such as the one defining PageRank), since we do not know the stationary distribution a priori and hence cannot apply this approach directly. Instead, to compute PageRank, we start from the undirected closure Ḡ of the input graph G, for which we can generate random walks using the ideas described above. We then slowly transition from Ḡ to G, and gradually update our approximation of the stationary distribution. Roughly speaking, at each step we consider a convex combination of the transition matrices of Ḡ and G. This technique, together with its analysis, is the most important and complex technical contribution of the paper.

Another challenge in computing random walks in directed graphs is the fact that the probabilities of some vertices in the stationary distribution π can be as low as O(1/2^n). Hence, computing t(v) ∼ π(v) random walks from each vertex v, where t(v) ≥ 1, would result in exponentially many walks. This challenge can be addressed by using the Markov chain that defines PageRank. In this Markov chain, each random walk ending at v is either extended with a random outedge of v (with probability 1 − ǫ), or jumps to a random vertex of the graph (with probability ǫ). This small change influences the stationary distribution π significantly, since in the modified graph we have π(v) ≥ ǫ/n for each vertex v. At the same time, by setting ǫ = O(1/l), one can guarantee that with constant probability a random walk makes no random jump, and is thus a random walk in the original graph. Altogether, our ideas lead to an algorithm for computing length-l random walks in directed graphs in O(log^2 log n + log^2 l) rounds and an algorithm for PageRank that uses O(log^2 log n) rounds. Both these algorithms require total space that is almost linear in the input size (assuming we only compute random walks of length l = poly log n).

We show that our algorithms can be implemented in the Massively Parallel Computation (MPC) model, which has been extensively studied by the theory community in recent years [LMSV11, ANOY14, CŁM+18, ABB+19, BFU18, GGK+18, GU19, Ona18, GKMS18, GLM19, HLL18, ASW19, ASS+18, ACK19, BHH19, GKU19, BDE+19]. We use the most challenging space-per-machine regime of the MPC model, in which the space available on each machine is strongly sublinear in the number of vertices of the graph, i.e., at most n^{1−Ω(1)}. This allows for handling large graphs that do not fit onto a single machine, even if they are sparse, which is the case for social networks and the webgraph.

1.1 Our Results

We give new algorithms for sampling independent random walks and show that they can be efficiently implemented in the MPC model. For the formal description of the model see Section 2.1.

The first result is an algorithm for sampling random walks in undirected graphs.

Theorem 1. Let G be an undirected graph and C ≥ 1. Let l be a positive integer such that l = o(S), where S is the available space per machine. There exists an MPC algorithm that samples deg^+(v)·⌈C ln n⌉ independent random walks of length l starting in v for each vertex v in G. The algorithm runs in O(log l) rounds and uses O(Cml log l log n) total space and strongly sublinear space per machine. If the algorithm has to return only the endpoints of each random walk, the total space complexity can be reduced to O(Cml log n) and l can be arbitrarily large. The algorithm is an imperfect sampler (see Definition 8) that does not fail with probability 1 − n^{−C/6+1}.

Our algorithm assumes that the length of each random walk is at most the space available per machine, which is n^γ for γ ∈ (0, 1). We believe this assumption is not limiting, since in applications the most interesting regime is l = n^{o(1)}, and especially l = O(poly log n).

One of the main results of this paper is an algorithm for sampling random walks in directed graphs. To the best of our knowledge, this is the first o(l)-round algorithm for sampling length-l independent random walks in any distributed or parallel model, other than the trivial algorithm based on matrix exponentiation, which requires quadratic space.

Theorem 2. Let G be a directed graph. Let D and l be positive integers such that l = o(S)/log^3 n, where S is the available space per machine. There exists an MPC algorithm that samples D independent random walks of length l starting in v for each v in G. The algorithm runs in O(log^2 log n + log^2 l) rounds and uses O(m + n^{1+o(1)}·l^{3.5} + D·n·l^{2+o(1)}) total space and strongly sublinear space per machine. The algorithm is an imperfect sampler (see Definition 8) that does not fail with probability 1 − O(n^{−1}).

We also show that the algorithm for undirected graphs is almost optimal under the 1-vs.-2-Cycles conjecture, which is the most popular conjecture for showing conditional hardness in the MPC model.

Theorem 3. Let γ ∈ (0, 1) be a constant. In the MPC model with O(n^{1−γ} poly log n) machines, each having space O(n^γ), the following holds. Any algorithm that samples Θ(log n) independent random walks of length Θ(log^4 n) starting at each vertex of the graph requires Ω(log log n) rounds, unless the 1-vs.-2-Cycles conjecture fails.

By using our random walk sampling primitive, we give an algorithm for computing PageRank in undirected graphs.

Theorem 4. Let α ∈ [1/n, 1/4] and ǫ ∈ [log n/o(S), 1], where S is the available space per machine. There exists an MPC algorithm that, with probability at least 1 − 5/n^2, computes a (1 + α)-approximate PageRank vector in undirected graphs with jumping probability ǫ in O(log log n + log(1/ǫ)) rounds using O(m log^2 n log log n/(ǫ^2·α^2)) total space.


Our next result is an algorithm for computing PageRank in directed graphs. This is by far the most technically involved part of the paper. In fact, our algorithm for sampling random walks in directed graphs is a corollary of the lemmas that we develop to obtain the following result.

Theorem 5. Let G be a directed graph. Let α ∈ [1/n, 1/4] and ǫ ∈ [log^3 n/o(S), 1], where S is the available space per machine. There exists an MPC algorithm that, with probability at least 1 − O(1/n), computes a (1 + α)-approximate ǫ-PageRank vector of G in O(log^2 log n + log^2(1/ǫ)) rounds, using O(m/α^2 + n^{1+o(1)}/(ǫ^{3.5}·α^2)) total space and strongly sublinear space per machine.

This gives an exponential improvement in the number of rounds with respect to the previously known results [DMPU15].

Recently, it was shown that computing a Θ(1)-approximate maximum matching, a Θ(1)-approximate minimum vertex cover, and a maximal independent set admits an Ω(log log n) conditional lower bound on the round complexity in the MPC model [GKU19] in the regime with strongly sublinear space per machine. Hence, obtaining O(poly(log log n))-round algorithms seems to be the new complexity benchmark to reach.

Random Walks in PRAM. We show that our algorithm for sampling random walks is generic enough to yield interesting results beyond the MPC model. In particular, we show that it can also be implemented in PRAM. Note that the following theorem gives a nontrivial result whenever l = ω(poly log n).

Theorem 6. Let G be a directed graph and 1 ≤ l ≤ n. There exists an NC algorithm that uses O((n + m)^{1+o(1)}) processors and samples one random walk of length l from each vertex of G. All sampled walks are independent. The algorithm is an imperfect sampler (see Definition 8) that does not fail with probability 1 − O(n^{−1}).

We could also formulate a similar theorem for computing PageRank, but an NC algorithm for PageRank is already known, since it can be obtained by simply using the power method.

Applications to Property Testing. We show how to use our random walk algorithm to approximately test bipartiteness and expansion. Instead of solving the exact versions of the problems, we consider relaxed versions of these problems known from property testing. Our algorithms either show that the graph is close to having a given property or far away from satisfying it. It is unlikely that the exact versions of these problems have o(log n)-round algorithms, due to the 1-vs.-2-Cycles conjecture (see Section 4.4 for the conjecture statement).

We now sketch the reductions. For testing expansion, it suffices to observe that to solve the problem from the 1-vs.-2-Cycles conjecture, it is enough to determine whether the expansion is positive. For bipartiteness testing, we observe that if we start with two cycles and independently replace each edge with a path of length 2 with probability 1/2, then with constant probability we obtain a graph that has an even number of edges overall, but is not bipartite. This can never happen in this transformation if the input is a single cycle. Hence, if we could exactly check whether a graph is bipartite, we could distinguish inputs that are a single cycle from those that are a disjoint union of two cycles.
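For intuition, the parity argument behind this reduction can be checked with a few lines of simulation (a sketch; the cycle length 10 and the trial count are arbitrary choices of ours):

    import random

    def subdivided_length(length, rng=random):
        """New length of a cycle after independently replacing each of its
        edges by a path of length 2 with probability 1/2."""
        return length + sum(rng.random() < 0.5 for _ in range(length))

    trials, hits = 100000, 0
    for _ in range(trials):
        a, b = subdivided_length(10), subdivided_length(10)
        # Even total number of edges, but one cycle is odd: not bipartite.
        if (a + b) % 2 == 0 and a % 2 == 1:
            hits += 1
    print(hits / trials)  # constant probability, about 1/4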

• Testing Bipartiteness (Section 7). We show how to use our random walk algorithm for testing bipartiteness. In this promise problem, we are given a graph G on n vertices with m edges and a parameter ǫ ∈ (0, 1). We want to distinguish the case that G is bipartite from the case that at least ǫm edges have to be removed to achieve this property. Our parallel algorithm first reduces the size of the input graph by sampling, and then combines property testing techniques for bipartiteness [GR99, KKR04] with our simulation of random walks. Similar ideas were used in [CFSV16] for the CONGEST model.

• Testing Expansion (Section 8). We say that a graph G of maximum degree d is ǫ-far from every α⋆-vertex-expander if more than ǫdn edges of G need to be changed (added/removed) so that the obtained graph is an α⋆-vertex-expander. Our algorithm gets α as its input, and returns Accept if G is an α-vertex-expander, or returns Reject if G is far from being such an expander. The idea of the algorithm is as follows. From each vertex, we run O(poly log n) pairs of “long” random walks. The collision probability of the random walks generated in this way can be used for approximately testing expansion in O(log log(n/ǫ)) rounds. Our starting point here is the analysis of random walk collision probability introduced by Czumaj and Sohler [CS10].

1.2 Previous Research

Random walks in the streaming model. The problem of generating random walks was considered in a number of streaming and parallel computation papers. The paper of Das Sarma, Gollapudi, and Panigrahy [DGP11] introduced multi-pass streaming algorithms for simulating random walks. For a single starting point, they can, for instance, simulate a single length-l random walk in O(n) space and O(√l) passes. The paper by Jin [Jin19] gives algorithms for generating a single random walk from a prespecified vertex in one pass. For directed graphs the algorithm requires roughly Θ(nl) space, and for undirected graphs Θ(n√l) space.

Parallel distributed computation. Bahmani, Chakrabarti, and Xin [BCX11] give an MPC algorithm for constructing length-l random walks in directed graphs. Their algorithm runs in O(log l) rounds, but the walks it generates are not independent.

A recent result by Assadi, Sun, and Weinstein [ASW19] gives an MPC algorithm for detecting well-connected components in small space per machine and with an exponential speed-up over direct exploration. As a subroutine, the paper presents an algorithm for generating random walks in an undirected regular graph. Note that in regular graphs the stationary distribution is uniform and thus the problem of generating random walks becomes somewhat simpler. Let us also mention that there exist random walk generation algorithms for the distributed CONGEST model [DNPT13]. These algorithms, however, require a number of rounds that is at least linear in the diameter, which can be Ω(log n) even for expanders.

Computing random walks has also been used in the PRAM model as a subroutine in algorithms for computing connected components using a near-linear number of processors [KNP99, HZ96a]. However, in both these algorithms the random walks starting at different vertices are not independent.

PageRank. Since it was introduced in [BP98, PBMW99], the computation of PageRank has been extensively studied in various settings. We refer the reader to [Ber05a, LM04, DSB09] for the early development and theoretical foundations of PageRank. [DGP11] consider PageRank approximation in the streaming setting. As their main result, they show how to compute a length-l random walk in O(√l) passes by using O(n) space. By using this primitive, [DGP11] show how to approximate PageRank for directed graphs in O(M^{3/4}) passes by using O(n·M^{−1/4}) space, where M is the mixing time of the underlying graph.

By building on [ALNO07], [DMPU15] studied PageRank in a distributed model of computation. For directed graphs, they design a PageRank approximation algorithm that runs in O(log n/ǫ) rounds, where ǫ is the jumping probability, i.e., 1 − ǫ is the damping factor. To achieve this bound, they use the fact that w.h.p. a random walk from a vertex jumps to a random vertex within O(log n/ǫ) steps. Moreover, to estimate PageRank it suffices to count the number of random walks ending at a given vertex while ignoring how those random walks reached the vertex. The authors of [DMPU15] exploited this observation to show that many PageRank random walks can be simulated in parallel while not over-congesting the network. This approach can be implemented in O(log n/ǫ) MPC rounds. The authors also show how to extend the ideas of [DGP11] and obtain an algorithm for undirected graphs that approximates PageRank in O(√(log n/ǫ)) rounds.

Another line of work considered approximate PageRank in the context of sublinear time algorithms, e.g., [CGS04, BP11, BBCT12, BPP18], just to name a few. This research culminated in the result of [BPP18], who show that the PageRank of a given vertex can be approximated by examining only O(min{m^{2/3}·∆^{1/3}·d^{−2/3}, m^{4/5}·d^{−3/5}}) many vertices/arcs, where ∆ and d are the maximum and the average degree, respectively. It is not clear how to simulate this approach in MPC efficiently for all vertices, in terms of both round and total space complexity.

Bahmani, Chakrabarti, and Xin [BCX11] show how to use their result for generating random walks to get a constant additive approximation to the Personalized PageRank of each vertex, which can be used to obtain a constant additive approximation to PageRank.¹ Note that in the case of PageRank a constant additive approximation is a weak guarantee, since an all-zero vector provides a constant additive approximation for all but O(1) vertices. In particular, the algorithm of [BCX11] provides non-zero estimates for only O(log n) vertices. In this paper, we show how to compute a multiplicative approximation of PageRank, and hence provide an estimate for each vertex.

We now comment on why using random walks generated by the algorithm of [BCX11] would require at least quadratic space to obtain a multiplicative approximation to PageRank (assuming a standard approach). To compute a length-l random walk from each vertex, the algorithm of [BCX11] samples O(l) random edges from each vertex. Then, these random edges are used to construct the desired random walks by doubling, that is, two walks of length 2^i are concatenated to form a walk of length 2^{i+1}. The same random edge sample can be used for multiple walks, which results in the following undesirable behavior.

Consider a star graph with the center being vertex s and use the algorithm of [BCX11] to construct T random walks of length O(l) from each vertex. Let W be the collection of these walks. Each of the walks in W either starts at s or visits s directly after the first step. Since the algorithm samples O(l · T) edges from s, there exists a set of vertices V′ of size O(l · T) such that, except for the starting vertices, all vertices of all random walks belong to V′. Hence, for l · T = o(n) there exists a vertex different from s that appears Ω(nT/(l · T)) = Ω(n/l) many times among the walks of W, and also a vertex different from s that appears only T times (but only as the starting vertex of some of the walks).

A standard approach to estimating the PageRank of a vertex v consists in counting the number of occurrences of v among the endpoints of the walks in W. To estimate PageRank, the value of l is set to be O(log n). So, unless T = Ω(n/log n), there exist two vertices u and v, both different from s, whose visit counts differ by a super-constant multiplicative factor. However, u and v are symmetric, and their PageRanks, both personalized and non-personalized, are equal.

¹Compared to PageRank, in the Personalized PageRank of vertex v a random jump is always performed to v.

2 Preliminaries

In this paper we consider directed multigraphs, that is, we allow multiple edges between each pair of vertices. Let G = (V, E) be a directed multigraph. We use deg^+_G(v) and deg^−_G(v) to denote the number of outgoing and incoming edges of v, respectively. We also define deg_G(v) := deg^−_G(v) + deg^+_G(v). The subscript G is often omitted if it is clear from the context.

A graph G = (V, E, w) is weighted if w : E → R is a function assigning real weights to the edges. Note that different edges between the same pair of vertices may have different weights. We extend the definitions for unweighted graphs to weighted graphs in a natural way.

For a weighted graph G = (V, E, w) and s ∈ R we define s · G (often abbreviated as sG) to be the graph G′ = (V, E, w′), where w′(e) := s · w(e) for each e ∈ E. For two weighted graphs G_1 = (V, E_1, w_1) and G_2 = (V, E_2, w_2), we define G_1 + G_2 := (V, E_1 + E_2, w_1.w_2). Here, E_1 + E_2 denotes the multiset sum of E_1 and E_2, i.e., an element of cardinality c_1 in E_1 and c_2 in E_2 has cardinality c_1 + c_2 in the sum. The weights w_1.w_2 : E_1 + E_2 → R are defined as

(w_1.w_2)(e) = w_1(e) if e ∈ E_1, and (w_1.w_2)(e) = w_2(e) if e ∈ E_2.

If G_1 = (V, E_1) and G_2 = (V, E_2) are unweighted, then G_1 + G_2 = (V, E_1 + E_2). The transpose of a graph G = (V, E) is the graph G^T = (V, E^T), where E^T = {vu | uv ∈ E}.

A graph is called undirected if it is equal to its transpose. We define the undirected closure of G, denoted by Ḡ, to be Ḡ = G + G^T. Note that deg^+_Ḡ(v) = deg_G(v).

We call a weighted graph G = (V, E, w) stochastic if all edge weights are nonnegative and for each v ∈ V we have ∑_{vx∈E} w(vx) = 1.² Observe that if G_1 = (V, E_1, w_1) and G_2 = (V, E_2, w_2) are stochastic, x, y ≥ 0, and x + y = 1, then xG_1 + yG_2 is stochastic as well.

For a stochastic graph G = (V, E, w), we define T(G) : V × V → R to be the transition matrix of G, where T(G)_{u,v} is the total weight of the edges uv. For a transition matrix T(G), a stationary distribution is a probability distribution π : V → R such that π · T(G) = π. Note that the stationary distribution may not be unique. A stationary distribution of a stochastic graph is a stationary distribution of its transition matrix.

We say that a vertex v ∈ V is dangling if deg^+_G(v) = 0. For an unweighted graph G = (V, E) with no dangling vertices, we denote by RW(G) the weighted graph obtained by assigning a weight of 1/deg^+_G(v) to each outgoing edge of v. Note that RW(G) is stochastic.

A walk in a graph G is a sequence of edges u_1v_1, u_2v_2, . . . , u_kv_k such that for 1 ≤ i < k, v_i = u_{i+1}. The length of a walk is the number of edges it consists of.

For a stochastic graph G = (V, E, w), a random walk in G of length k starting in s ∈ V is a walk W = u_1v_1, u_2v_2, . . . , u_kv_k in G, where s = u_1, which is constructed with the following algorithm. For each 1 ≤ i ≤ k, the edge u_iv_i is chosen independently at random among all outgoing edges of u_i. The probability of choosing a particular outgoing edge is equal to the edge weight.

²Note that we slightly abuse notation: each outgoing edge of v corresponds to a separate summand, even in the presence of parallel edges.
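To make the walk-sampling procedure above concrete, here is a minimal sequential sketch in Python; the representation of a stochastic graph as a dictionary mapping each vertex to a list of (head, weight) pairs is our own choice, not notation from the paper.

    import random

    # A stochastic graph: vertex -> list of (head, weight) pairs, where the
    # weights of the outgoing edges of each vertex sum to 1. Parallel edges
    # simply appear as separate pairs. Dangling vertices are not allowed.
    def random_walk(graph, start, length, rng=random):
        """Sample a random walk of the given length starting at `start`;
        each step picks an outgoing edge with probability equal to its weight."""
        walk, u = [], start
        for _ in range(length):
            heads, weights = zip(*graph[u])
            v = rng.choices(heads, weights=weights)[0]
            walk.append((u, v))
            u = v
        return walk

    # Example: RW(G_U) for G_U a triangle on vertices a, b, c.
    G = {"a": [("b", 0.5), ("c", 0.5)],
         "b": [("a", 0.5), ("c", 0.5)],
         "c": [("a", 0.5), ("b", 0.5)]}
    print(random_walk(G, "a", 4))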


2.1 The MPC Model

In this paper we will work with the computational model introduced by Karloff, Suri, and Vassilvitskii [KSV10] and refined in later works [GSZ11, BKS13, ANOY14]. We call it massively parallel computation (MPC), which is similar to the name used by Beame et al. [BKS13].

This model captures the main aspects of modern parallel systems, where we have M machines and each of them has S words of space. The total space available on all the machines should not be much higher than the size of the input. In the case of the graph algorithms studied here, the input is a collection E of edges and each machine receives approximately |E|/M of them at the very beginning.

In the MPC model the computation proceeds in rounds. During a round, each machine first processes its local data without communicating with other machines. Then the machines exchange messages. When sending a message, a machine specifies the unique recipient of this message. Moreover, we require that all messages sent and received by each machine in each round fit into the local memory of this machine. Hence, the total length of these messages is bounded by S.³ This implies that the total communication of the MPC model is bounded by M · S in each round. The messages are put into the recipients' local space and can be processed by them in the next round.

At the end of the computation, the machines collectively output the solution, i.e., the solution is formed by the union of the outputs of all the machines. The data output by each machine has to fit in its local memory. Hence again, each machine can output at most S words.

Possible Values for S and M. Typically in MPC models, one assumes that S is sublinear in the input size N. In such a case M ≥ N/S. Formally, one usually considers S = Θ(N^{1−ǫ}), for some ǫ > 0.

In this paper, the focus is on graph algorithms. By n we denote the number of vertices in the graph, and by m the number of edges. The input size is O(n + m), where m can be as large as Θ(n^2).

The algorithms presented in this paper use total space which is almost linear in the input size, that is, M · S = (m + n)^{1+o(1)}. The value of S, which is the amount of space available on each machine, is strongly sublinear in the number of vertices of the graph; that is, they use O(n^γ) space per machine for a constant γ ∈ (0, 1). This range of parameters is the most interesting one due to the popular 1-vs.-2-Cycles conjecture, which implies that many graph problems cannot be solved in o(log n) rounds (see Section 4.4). Our algorithms work for any 0 < γ < 1, so for simplicity we omit this parameter in our theorems. However, we specify only the total space that needs to be available on all the M machines.

Communication vs. Computation Complexity. The main complexity measure in this work will be the number of rounds required to finish the computation, as communication rounds are typically the most costly component of an MPC computation. Also, even though we do not make an effort to explicitly bound it, it is apparent from the design of our algorithms that every machine performs O(S polylog S) computation steps locally. This in particular implies that the overall work across all the machines is O(r(n + m) polylog n), where r is the number of rounds. The total communication during the computation is O(r(n + m)) words.

³This for instance allows a machine to send a single word to S/100 machines, or S/100 words to one machine, but not S/100 words to S/100 machines if S = ω(1), even if the messages are identical.


2.2 PageRank

Let G be a directed graph and let ǫ ∈ (0, 1). PageRank [BP98] is the stationary distribution of the following Markov chain on the vertices of G. At a given vertex v, with probability ǫ the next vertex is selected uniformly at random from the set of all vertices of G, and with probability 1 − ǫ it is selected uniformly at random from the heads of the outedges of v. Note that PageRank is unique, because the Markov chain described above is ergodic. The value 1 − ǫ is called the damping factor.
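For reference, the stationary distribution of this chain on a small instance can be computed by power iteration. The sketch below only pins down the object being defined; it is unrelated to the MPC algorithms developed later, and the function name and graph representation are our own.

    import numpy as np

    def pagerank(out_edges, eps, iters=200):
        """Stationary distribution of the PageRank chain: with probability
        eps jump to a uniformly random vertex, otherwise follow a uniformly
        random outedge. Assumes no dangling vertices."""
        n = len(out_edges)
        T = np.zeros((n, n))  # T[u][v] = transition probability u -> v
        for u, nbrs in enumerate(out_edges):
            for v in nbrs:
                T[u][v] += (1 - eps) / len(nbrs)
            T[u] += eps / n  # the random jump
        pi = np.full(n, 1.0 / n)
        for _ in range(iters):  # power method: pi <- pi * T
            pi = pi @ T
        return pi

    # Example: a directed 4-cycle with one extra chord.
    print(pagerank([[1], [2], [3, 0], [0]], eps=0.15))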

2.3 Relevant Concentration Bounds

Throughout the paper, we will use the following well-known variants of the Chernoff bound.

Theorem 7 (Chernoff bound). Let X_1, . . . , X_k be independent random variables taking values in [0, 1]. Let X := ∑_{i=1}^{k} X_i and µ := E[X]. Then,

(A) For any δ ∈ [0, 1] it holds that P[|X − µ| ≥ δµ] ≤ 2 exp(−δ^2 µ/3).

(B) For any δ ∈ [0, 1] it holds that P[X ≤ (1 − δ)µ] ≤ exp(−δ^2 µ/2).

(C) For any δ ∈ [0, 1] it holds that P[X ≥ (1 + δ)µ] ≤ exp(−δ^2 µ/3).

(D) For any δ ≥ 1 it holds that P[X ≥ (1 + δ)µ] ≤ exp(−δµ/3).
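As a quick sanity check, variant (A) can be verified empirically on Bernoulli variables (all parameters below are arbitrary):

    import math, random

    k, p, delta, trials = 1000, 0.3, 0.2, 2000
    mu = k * p
    bound = 2 * math.exp(-delta * delta * mu / 3)  # Theorem 7 (A)
    bad = 0
    for _ in range(trials):
        x = sum(random.random() < p for _ in range(k))
        bad += abs(x - mu) >= delta * mu
    print(f"empirical deviation rate {bad / trials:.4f} <= bound {bound:.4f}")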

3 Overview of Our Techniques

In order to speed up the generation of a random walk, one may be tempted to generate its different sections in parallel and then stitch them together. This approach becomes challenging with limited space if, for instance, one wants to generate a large number of random walks that all start from the same vertex. Unfortunately, until we know where the walks are after k steps, it is difficult to limit the number of continuations corresponding to the consecutive steps (k + 1, k + 2, and so on) that we have to consider. This seems to be an issue that many, if not all, attempts at generating random walks with limited space encounter [DGP11].

However, if the starting points of random walks are sampled from the stationary distribution, the distribution after any number of steps is the same. (This observation was previously used by Censor-Hillel et al. [CFSV16] to construct bipartiteness testers in constant-degree graphs for the CONGEST model.) Hence we can sample only slightly more mid-points from the same stationary distribution and recursively generate random walks from them. The key observation is that the stationary distribution in undirected graphs is known in advance, i.e., it is proportional to the vertex degrees. This approach enables generating many fully independent random walks of length l from each vertex in Θ(log l) rounds of MPC using space near-linear in the size of the input. We present this idea in detail in Section 4. We also argue that this round complexity is tight under the 1-vs.-2-Cycles conjecture (see Section 4.4).

3.1 Random Walks and PageRank in Directed Graphs

The problem of generating random walks is much more challenging in directed graphs. We do not know a stationary distribution a priori, so it is not possible to directly apply the previous approach. Namely, our sampler for undirected graphs crucially uses the fact that we know the stationary distribution in advance and that the probabilities in this distribution are not too small. This is because the number of random walks sampled from each vertex has to be both proportional to the stationary distribution and large enough to obtain concentration guarantees.

In general directed graphs these assumptions do not hold. Some vertices in the stationary distribution may have small or even zero probability. One can also show that even an exponential number of samples (i.e., 2^{Θ(n)}) may not be enough to estimate the stationary distribution for some vertices. Hence, even if we knew the stationary distribution π and wanted to compute t(v) random walks from each vertex v, such that t(v) ∼ π and t(v) ≥ 1, we might need to compute exponentially many walks altogether. In fact, in some computational models, there is a known separation between the difficulty of generating random walks for directed and undirected graphs [Jin19].

If the outdegrees in the graph are bounded by poly log n, we can use the following simple approach (see Appendix A for a more detailed description). For each vertex v, we find all vertices whose distance from v is at most ǫ log n/log log n. Once we know this set, we are able to simulate ǫ log n/log log n steps of a random walk in a single round. At the same time, the assumption on the outdegrees allows us to bound the total space usage. This approach does not generalize to the case when the outdegrees are arbitrary or we want to generate random walks of length ω(log n).

Instead of dealing with the general problem of sampling random walks in directed graphs, we first consider the specific case of computing PageRank. The starting point of our approach is the algorithm of [Bre02] that we review in Section 5. At a high level, the algorithm boils down to computing O(log n/ǫ) random walks of length O(log n/ǫ) in a graph Gǫ, which is defined by

Gǫ = (1 − ǫ)·RW(G) + (ǫ/n)·J,

where J is the complete directed graph with self-loops. In other words, we consider walks that with probability 1 − ǫ perform a single step of a random walk on G, and with probability ǫ jump to a random vertex of G. Note that PageRank is exactly the stationary distribution of Gǫ.

Note that in order to make Gǫ well defined, we need to assume that G does not contain dangling vertices, i.e., vertices without outedges. The usual approaches for handling graphs with dangling vertices are discussed in Appendix B. In particular, we show that the two most typical approaches, adding self-loops and restarting the walks, are equivalent up to a simple transformation. To our surprise this relation was not previously observed; e.g., [DCGR05] argues why one method is better than the other.

Using Gǫ instead of G for sampling random walks resolves one of the challenges. Namely, we show that in the stationary distribution of Gǫ the probability of each vertex is at least ǫ/n. Hence, by sampling Θ(n log n) random walks we are actually able to approximate the stationary distribution. This observation was already used by Das Sarma et al. [DMPU15] to obtain an O(log n)-round algorithm for directed and an O(√(log n))-round algorithm for undirected graphs.

However, we are still left with the other difficulty: we do not know the stationary distribution. We overcome it by using a novel sampling technique, which is the main technical contribution of the paper and may be of independent interest. In the remaining part of this section we give an overview of the technique.

The core part of our algorithm is a procedure for adjusting sampling probabilities. Let D_1 and D_2 be two discrete probability distributions that are, roughly speaking, similar, i.e., the elementary events are assigned similar probabilities in both distributions. The procedure, given a random sample from D_1, either produces a sample from D_2 or fails. As long as D_1 does not differ much from D_2, the failure probability is small. We give two implementations of the procedure, each of which is well suited for some specific distributions D_1 and D_2. The easier implementation is based on rejection sampling. This means that the procedure either returns the sample it gets or fails. As a result, the procedure never actually needs to sample from either of the distributions D_1 or D_2. It only needs to make a single Bernoulli trial in order to decide whether to fail. This is a very useful property, as in our case sampling a random walk is much more expensive than performing a Bernoulli trial.
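The following is a minimal sketch of the rejection-sampling implementation, under our own simplifying assumptions: a finite sample space and a known bound M ≥ max_x D_2(x)/D_1(x). A single Bernoulli trial either keeps the given sample, reinterpreted as a sample from D_2, or fails; the result is an imperfect sampler in the sense of Definition 8 with failure probability 1 − 1/M, which is small when the two distributions are similar.

    import random

    def adjust_sample(x, p1, p2, M, rng=random):
        """Given x drawn from D1, either return x (now distributed according
        to D2, conditioned on not failing) or fail. Requires
        p2(y) <= M * p1(y) for all y."""
        return x if rng.random() < p2(x) / (M * p1(x)) else "fail"

    # Toy distributions on {0, 1, 2} that differ only slightly.
    D1 = {0: 0.35, 1: 0.35, 2: 0.30}
    D2 = {0: 0.30, 1: 0.35, 2: 0.35}
    M = max(D2[y] / D1[y] for y in D1)
    out = [adjust_sample(random.choices(list(D1), weights=list(D1.values()))[0],
                         D1.get, D2.get, M) for _ in range(100000)]
    kept = [y for y in out if y != "fail"]
    print({y: round(kept.count(y) / len(kept), 3) for y in D2})  # close to D2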

Let us now describe our sampling technique using the above idea. Recall that Ḡ denotes the undirected closure of G. We note that by using Algorithm 1 we can efficiently sample PageRank walks in Ḡ. Hence we are going to first sample PageRank walks in Ḡ and then gradually shift towards PageRank walks in G by adjusting transition probabilities.

The main technical challenge is to overcome the fact that even small changes to the transition probabilities can cause large changes to the stationary distribution. Small changes can be amplified Θ(n) times by the network structure, so we cannot use the stationary distribution computed in one step for sampling walks in the following step, even if the two consecutive graphs are very similar. Instead we reinterpret walks directly using the procedure for adjusting sampling probabilities, i.e., in each step we sample slightly more walks and then reinterpret them as walks in the modified graph. In this process we lose some fraction of the walks. Finally, we use the resulting walks to estimate the stationary distribution for the next step.

Let us now describe this idea more formally. Consider a sequence Ḡ = G_1, G_2, . . . , G_k = G of graphs, where, informally speaking, each G_i is a mixture of Ḡ and G. (In the algorithm we are going to use k = Θ(log log n).) Formally, G_i = ((k − i)/(k − 1))·Ḡ + ((i − 1)/(k − 1))·G.

Our algorithm computes PageRank walks in G_i for i = 1, . . . , k. The first step is easy: as we noted above, computing PageRank walks in G_1 can be done using Algorithm 1. In each of the remaining steps the algorithm starts with PageRank walks in G_i. Then, it uses the procedure for adjusting sampling probabilities to obtain PageRank walks in G_{i+1}. However, the number of random walks in G_{i+1} obtained this way is significantly smaller than the number of walks in G_i that we had at the beginning of the step. Still, the number of walks is large enough to estimate the stationary distribution of PageRank walks in G_{i+1}. The algorithm then uses this estimated stationary distribution to compute more PageRank walks in G_{i+1}, which allows the process to continue.
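The toy numpy sketch below (the example graph is our own) illustrates the interpolation: the transition matrices of the intermediate graphs are convex combinations of T(Ḡ) and T(G), and the stationary distribution drifts gradually from the undirected one to the directed one as i grows.

    import numpy as np

    def rw_matrix(n, edges):
        """T(RW(H)) for a multigraph H given as a list of directed edges."""
        T = np.zeros((n, n))
        outdeg = [sum(1 for u, _ in edges if u == w) for w in range(n)]
        for u, v in edges:
            T[u][v] += 1.0 / outdeg[u]
        return T

    def stationary(T, iters=1000):
        pi = np.full(T.shape[0], 1.0 / T.shape[0])
        for _ in range(iters):
            pi = pi @ T
        return pi

    edges = [(0, 1), (1, 2), (2, 0), (2, 1)]      # a toy directed graph G
    closure = edges + [(v, u) for u, v in edges]  # its closure G-bar = G + G^T
    T_dir, T_undir = rw_matrix(3, edges), rw_matrix(3, closure)
    k = 5
    for i in range(1, k + 1):                     # G_1 = G-bar, ..., G_k = G
        Ti = (k - i) / (k - 1) * T_undir + (i - 1) / (k - 1) * T_dir
        print(i, np.round(stationary(Ti), 3))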

Intuitively, the first steps of this process are the most challenging, due to the potential existence of vertices whose degrees in G and Ḡ differ significantly. As an example, consider a vertex v in G with a single outgoing edge vw and n − 1 incoming edges. A random walk in Ḡ of length 1 starting at v (that is, a random edge incident to v) is the edge vw only with probability O(1/n). Hence, the rejection sampling would fail with a very large probability.

To alleviate that, we first transform our input graph into a directed graph such that for each vertex v it holds that c · deg^+(v) ≥ deg(v), where c is a constant that we set later. We call such graphs c-balanced. This transformation is explained in Section 5.2. We show how to compute the PageRank of c-balanced graphs in Section 5.1.

In the process of transforming an input into a c-balanced graph, each edge is replaced by a path of length log n. This means that a PageRank walk of length k in the input graph corresponds to a PageRank walk of length k log n in the transformed c-balanced one. As a result, informally speaking, PageRank walks in a c-balanced graph with jump factor ǫ jump to random vertices more often than the corresponding walks in the original graph. Hence, if we applied the algorithm unchanged to the transformed graph, we would effectively compute (ǫ log n)-PageRank walks.


A natural idea is to run our algorithm for ǫ′ = ǫ/log n, but this would increase the round complexity or space usage of our algorithm significantly. Instead, we reuse the idea of gradually adjusting the transition matrix that we use for sampling random walks. We start with a jump factor of 1/2 and then move towards ǫ/log n in steps, in this way ensuring that the space requirement and round complexity are as desired. This approach is presented in Section 5.3.

Once we obtain PageRank walks for a c-balanced graph with respect to the jump factor ǫ/log n, in Section 5.4 we show how to map those walks back to the input graph.

Finally, we observe that the algorithm for computing random walks in Gǫ can be used to compute random walks in G. Assume we are interested in random walks of length l. Then, each walk in Gǫ for ǫ = 1/l contains no random jumps with constant probability and is thus a random walk in G. Hence, by sampling sufficiently many random walks in G_{1/l} we are able to compute random walks in G. The key property here is that the round complexity of sampling random walks in Gǫ is O(log^2(1/ǫ) + log^2 log n), so as long as l = poly log n, the overall round complexity is O(poly log log n).

4 Sampling Random Walks

In this section we show algorithms for sampling random walks from a given stochastic graph G. The algorithms require that we know (at least approximately) the stationary distribution of G. This is easy in the case when we deal with undirected graphs, that is, G = RW(G_U), where G_U is an undirected graph. In this case the stationary distribution of G is given by

π(v) = deg^+_G(v)/(2m). (1)

Hence, if we sample the starting point of a random walk from π, then after any number of steps the endpoint will follow the distribution π as well. This allows us to use doubling: since the number of random walks ending in each vertex is (in expectation) the same as the number of random walks starting in each vertex, we can pair up these random walks and stitch together each pair of walks of length k into one walk of length 2k. The pseudocode of our algorithm is given as Algorithm 1.

Algorithm 1 Given G, l, t_0, . . . , t_⌈log l⌉, sample t_⌈log l⌉(v) random walks of length l according to T(G) from each v ∈ V(G).

1: function RandomWalks(G, l, t)
2:   for all v ∈ V(G) in parallel do
3:     Generate t_0(v) random walks of length 1 in G starting in v. Let W_0(v) be the set of these walks.
4:   for i ← 1 . . . ⌈log l⌉ do
5:     for all v ∈ V in parallel do
6:       Select t_i(v) random walks from W_{i−1}(v). Let that set be U_i(v).
7:       For each walk w ∈ U_i(v), consider its endpoint u. Ask u to extend w by a yet unused walk from W_{i−1}(u) \ U_i(u). Let W_i(v) denote the set of all these extended walks originating at v. If u does not have unused walks anymore, the algorithm fails.
8:   For each v ∈ V truncate the walks in W_⌈log l⌉(v) to length l.
9:   return W_⌈log l⌉(v) for each v ∈ V
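For intuition, here is a sequential (non-MPC) sketch of the doubling scheme of Algorithm 1. The adjacency-list representation (with uniform edge weights) and the oversampling numbers in the toy run are illustrative stand-ins for the t_i's of the analysis.

    import random

    def random_walks_by_doubling(graph, l, t, rng=random):
        """Sequential sketch of Algorithm 1. graph maps v -> list of out-
        neighbours; t[i][v] is the number of walks kept at v in iteration i.
        Returns t[-1][v] walks per vertex, truncated to length l, or raises
        an error on the failure event of the algorithm."""
        rounds = len(t) - 1
        W = {v: [[(v, rng.choice(graph[v]))] for _ in range(t[0][v])]
             for v in graph}                           # length-1 walks
        for i in range(1, rounds + 1):
            U = {v: W[v][:t[i][v]] for v in graph}     # walks to extend
            pool = {v: W[v][t[i][v]:] for v in graph}  # unused continuations
            W = {v: [] for v in graph}
            for v in graph:
                for w in U[v]:
                    u = w[-1][1]                       # endpoint of w
                    if not pool[u]:
                        raise RuntimeError(f"fails: no unused walk at {u}")
                    W[v].append(w + pool[u].pop())     # stitch two walks
        return {v: [w[:l] for w in W[v]] for v in graph}

    # Toy run: l = 4, two doubling rounds, generous oversampling.
    G = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    t = [{v: 40 for v in G}, {v: 16 for v in G}, {v: 6 for v in G}]
    print(random_walks_by_doubling(G, 4, t)[0][0])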


The algorithm takes the following parameters: G is the input graph, which has to be stochastic, l is the desired walk length, and t_i(v) controls the number of walks starting in vertex v that we would like to sample in the i-th iteration of the algorithm. Note that the algorithm requires that t_i has certain properties; e.g., for undirected graphs we show that the algorithm works for t_i(v) being proportional to deg^+_G(v). Also, for a fixed v, the sequence t_0(v), t_1(v), . . . has to fulfill certain properties.

In the ideal scenario, for each vertex the numbers of random walks that start and finish in it are equal to their expected values. In such a case, in each step we could match all walks into pairs and obtain two times fewer walks of twice the length. However, the numbers may diverge from their expected values, and thus we need to sample a few more random walks to ensure that there are enough of them with high probability. We set t_i(v) = deg^+_G(v)·⌈C ln n · k_i⌉, where the sequence k_i controls how many more walks we sample in each step. A simple solution is to set k_i = 2^{2⌈log l⌉−2i}, which implies that k_{⌈log l⌉} = 1 and k_i = k_{i−1}/4. By doing so, in each step we have, in expectation, twice as many walks as we really need, and it is easy to show that the number of walks is sufficient with high probability. For the proof we are going to use fewer walks and thus slightly reduce the space complexity.

Observe that if Algorithm 1 never failed, it would generate independent random walks. However, when many walks collide, i.e., end in the same vertex, the algorithm may be forced to fail. This means that we get a random sample from a modified distribution in which the probability of some elements is decreased. The fraction on which the algorithm fails will be very small. We formalize this notion using the following definition.

Definition 8. Let (X, p) be a discrete probability space. An imperfect sampler for (X, p) is an algorithm that returns samples from a probability space (X ∪ {fail}, p′) such that p′(x) ≤ p(x) for all x ∈ X. The failure probability of the sampler is p′(fail).

We are going to construct samplers where p′(fail) is arbitrarily small. Note that an algorithm A using a (perfect) sampler for a probability space (X, p) can be naturally translated into an algorithm A′ using an imperfect sampler. Unless the sampler fails, A′ produces the same result as A.

We now define the sequence k_i that we use:

k_0 = 2^{⌈log l⌉+6} · (⌈log l⌉ + 6), (2)

k_{i−1} = 2k_i + √k_i. (3)

The following bounds can be proven for k_i.

Lemma 9. For 0 ≤ i ≤ ⌈log l⌉ we have

(i) k_i ≥ 2^6,

(ii) k_i ≤ 2^{⌈log l⌉−i+6} · (⌈log l⌉ + 6).

Proof. In order to show (i) we will prove that

k_i ≥ 2^{⌈log l⌉−i+6} · (⌈log l⌉ − i + 6). (4)

We will show (4) by induction on i. For i = 0, (4) follows by the definition of k_0.

Assume now that k_j ≥ 2^{⌈log l⌉−j+6} · (⌈log l⌉ − j + 6) for each j ≤ i − 1, and we want to prove that the inequality holds for j = i as well. Recall that k_{i−1} = 2k_i + √k_i. Towards a contradiction, assume that k_i < 2^{⌈log l⌉−i+6} · (⌈log l⌉ − i + 6). For the sake of brevity, define t := ⌈log l⌉ − i + 6. Then, we have

k_{i−1} = 2k_i + √k_i < 2^{t+1}·t + √(2^t·t) < 2^{t+1}·t + 2^t < 2^{t+1}·(t + 1) ≤ k_{i−1},

which is a contradiction. Hence, (4) holds, and (i) follows.

As for (ii), we have k_{i−1} = 2k_i + √k_i ≥ 2k_i. Hence, k_i ≤ 2^{−1}·k_{i−1} ≤ 2^{−i}·k_0 = 2^{⌈log l⌉−i+6} · (⌈log l⌉ + 6).
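Since (3) defines k_{i−1} in terms of k_i, the sequence can be evaluated numerically by solving 2x + √x = k_{i−1} for x at each step. The following sketch does this by binary search and checks the bounds of Lemma 9 (the value of l is arbitrary):

    import math

    def k_sequence(l):
        """k_0 from (2); then each k_i solves 2*k + sqrt(k) = k_{i-1}, per (3)."""
        L = math.ceil(math.log2(l))
        ks = [2 ** (L + 6) * (L + 6)]
        for _ in range(L):
            lo, hi = 0.0, float(ks[-1])
            for _ in range(100):  # binary search: 2x + sqrt(x) is increasing
                mid = (lo + hi) / 2
                lo, hi = (mid, hi) if 2 * mid + math.sqrt(mid) < ks[-1] else (lo, mid)
            ks.append(lo)
        return ks

    l = 1000
    L = math.ceil(math.log2(l))
    for i, k in enumerate(k_sequence(l)):
        assert k >= 2 ** 6                      # Lemma 9 (i)
        assert k <= 2 ** (L - i + 6) * (L + 6)  # Lemma 9 (ii)
    print("Lemma 9 bounds hold for l =", l)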

We now show that Algorithm 1 can be used to sample random walks in undirected graphs. The proof is a relatively simple application of the Chernoff bound.

Lemma 10. Let G be a stochastic graph such that G = RW(G_U) for some undirected graph G_U, let l, C ≥ 1, l = o(S), and let t_i(v) = deg^+_G(v)·⌈C ln n · k_i⌉, where k_i is given by (2) and (3). Then RandomWalks(G, l, t) (Algorithm 1) does not fail with probability at least 1 − n^{−C/6+1}.

Proof. Let X^u_i be the number of random walks that end at vertex u in iteration i. As long as X^u_i + t_i(u) ≤ t_{i−1}(u) holds for each vertex u, the algorithm does not fail, whereas failure happens when X^u_i + t_i(u) > t_{i−1}(u). Let δ_i be such that 1 + δ_i = (t_{i−1}(u) − t_i(u))/t_i(u). Then

1 + δ_i = (t_{i−1}(u) − t_i(u))/t_i(u)
= (deg^+(u)·⌈C ln n·(2k_i + √k_i)⌉ − deg^+(u)·⌈C ln n·k_i⌉)/(deg^+(u)·⌈C ln n·k_i⌉)
= (⌈C ln n·(2k_i + √k_i)⌉ − ⌈C ln n·k_i⌉)/⌈C ln n·k_i⌉
= ⌈C ln n·(2k_i + √k_i)⌉/⌈C ln n·k_i⌉ − 1
≥ C ln n·(2k_i + √k_i)/⌈C ln n·k_i⌉ − 1
≥ 1 + (C ln n·√k_i − 2)/⌈C ln n·k_i⌉.

Now, by the Chernoff bound, we have

P[X^u_i > t_{i−1}(u) − t_i(u)] = P[X^u_i > (1 + δ_i)·E[X^u_i]]
≤ exp(−δ_i^2·t_i(u)/3)
≤ exp(−(C ln n·√k_i − 2)^2·k_i·C ln n/(3·⌈C ln n·k_i⌉^2)·deg^+(u))
≤ exp(−C ln n/(3·2)) = n^{−C/6},

where the last inequality follows from the fact that (√x − 2)^2·x/(x + 1)^2 ≥ 1/2 for x ≥ 64. By taking a union bound over all vertices and all iterations of the algorithm, the probability of failure is less than n^{−C/6+1}.


Lemma 11. Assume that G, l, t are defined as in Lemma 10. Then RandomWalks(G, l, t) (Algorithm 1) can be implemented to run in O(log l) MPC rounds using O(Cml log l log n) total space.

Proof. The space usage is bounded by max_{1≤i≤⌈log l⌉} O(∑_{v∈V} t_i(v)·2^i). We have

∑_{v∈V} t_i(v)·2^i = ∑_{v∈V} deg^+(v)·⌈C ln n·k_i⌉·2^i ≤ ∑_{v∈V} deg^+(v)·⌈C ln n⌉·⌈k_i⌉·2^i = O(Cml log l ln n),

since by Lemma 9 (ii), ⌈k_i⌉·2^i = O(2^{⌈log l⌉}·(⌈log l⌉ + 6)) = O(l log l).

The MPC implementation is described in Section 4.1.

4.1 MPC Implementation of RandomWalks

We now describe how to implement RandomWalks. First, we show how to implement every iteration of the loop on Line 4 in O(1) rounds. Then, we show how to implement Line 2, also in O(1) rounds. We begin by defining the primitives NumberingSublists and Predecessor.

• NumberingSublists: Given a list L of tuples, let L(x) be the sublist of L containing all the tuples whose first coordinate is x. NumberingSublists assigns distinct numbers 1 through |L(x)| to the tuples of L(x), for all values x. The numbers are assigned arbitrarily.

• Predecessor: Consider an ordered list of tuples such that each tuple is labeled by 0 or by 1. Then, for each tuple t labeled by 0, Predecessor associates with t the closest tuple t′ labeled by 1 such that t′ comes before t in the ordering. In [BDE+19] it was proved that Predecessor can be implemented in O(1) MPC rounds with n^δ space per machine, for any constant δ > 0.

NumberingSublists in O(1) rounds. Given a list L of tuples, sort all the tuples with respect to their first coordinate. Number by j the j-th tuple in that sorted list; let j(t) be the number of tuple t. The values j(t) can be computed by prefix sums, with each tuple having value 1. In [GSZ11] it was shown how to compute prefix sums in O(1) rounds. Observe that this numbering assigns consecutive numbers to the tuples of L(x), but not necessarily starting from 1. We now show how to shift these numbers so that the tuples of L(x) are numbered 1 through |L(x)|.

Define m(x) = argmin_{t∈L(x)} j(t). Label m(x) by 1, and label all other tuples of L(x) by 0. A tuple t ∈ L(x) is not m(x) if on its machine there is another tuple of L(x) with a smaller number, or if t has the smallest number on its machine but the previous machine also holds a tuple from L(x). So, to perform this labeling, it suffices that each machine sends to the next machine its tuple with the highest number. Use Predecessor to assign m(x) to each tuple of L(x). Finally, assign to each tuple t ∈ L(x) the value 1 + j(t) − j(m(x)). These values are the assignment that NumberingSublists is supposed to output.
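A sequential sketch of this procedure, mirroring the global sort, the prefix-sum numbering j(t), and the per-group shift by j(m(x)); in the MPC setting the sort and the prefix sums are the O(1)-round primitives of [GSZ11]:

    def numbering_sublists(tuples):
        """Assign to each tuple its 1-based index within the sublist L(x) of
        tuples sharing its first coordinate x: globally sort, number the
        sorted list, then subtract j(m(x)) - 1 within each group."""
        order = sorted(range(len(tuples)), key=lambda i: tuples[i][0])
        j = {order[r]: r + 1 for r in range(len(order))}  # global numbers
        first_j = {}                                      # j(m(x)) per group
        for i in order:
            first_j.setdefault(tuples[i][0], j[i])
        return [1 + j[i] - first_j[tuples[i][0]] for i in range(len(tuples))]

    ts = [("a", "w1"), ("b", "w2"), ("a", "w3"), ("c", "w4"), ("a", "w5")]
    print(numbering_sublists(ts))  # [1, 1, 2, 1, 3]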

Implementation of Line 4 of RandomWalks. Consider the i-th iteration of the loop on Line 4. Assume that so far we have W_{i−1} and our goal is to obtain W_i. First, invoke NumberingSublists to number all the walks from each W_{i−1}(v) with the numbers 1 through |W_{i−1}(v)|. For each walk w ∈ W_{i−1}(v), let j(w) be the number assigned to w by NumberingSublists. Create a list of tuples as follows:


• If j(w) ≤ t_i(v), create a tuple (w_last, j(w), w), where w_last is the last vertex of w. This effectively creates U_i(v).

• If j(w) > t_i(v), create a tuple (v, j(w) − t_i(v), w). This effectively creates W_{i−1}(v) \ U_i(v).

Sort all the created tuples. Now, in this sorted list, next to each other will be tuples (u, z, w_1) and (u, z, w_2) such that u is the last vertex of w_1 and the first vertex of w_2. Let v be the first vertex of w_1. We append w_2 to w_1 to obtain a random walk in W_i(v). Since each of these operations requires O(1) rounds, a single iteration of Line 4 can be implemented in O(1) rounds.
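A sequential sketch of this tuple trick for one doubling step. One detail is our own reading of the pairing: we number the selected walks by their endpoint (a second NumberingSublists-style pass), so that each selected walk meets a distinct continuation after sorting; a selected walk left without a partner corresponds to the failure event of Algorithm 1.

    from collections import defaultdict

    def stitch_round(selected, pool):
        """Pair each selected walk (keyed by its endpoint) with an unused
        continuation starting there, via sorting. Walks are lists of edges."""
        tuples, cnt = [], defaultdict(int)
        for w in selected:                    # key: (endpoint, rank, 0)
            u = w[-1][1]
            cnt[u] += 1
            tuples.append((u, cnt[u], 0, w))
        cnt = defaultdict(int)
        for w in pool:                        # key: (start, rank, 1)
            v = w[0][0]
            cnt[v] += 1
            tuples.append((v, cnt[v], 1, w))
        tuples.sort(key=lambda x: x[:3])
        out = []
        for a, b in zip(tuples, tuples[1:]):  # adjacent equal (vertex, rank)
            if a[:2] == b[:2] and a[2] == 0 and b[2] == 1:
                out.append(a[3] + b[3])       # append w2 to w1
        return out

    sel = [[(0, 1)], [(2, 1)]]                # walks ending at vertex 1
    pl = [[(1, 2)], [(1, 0)]]                 # unused walks starting at 1
    print(stitch_round(sel, pl))              # two stitched length-2 walks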

Implementation of Line 2 of RandomWalks. The loop on Line 2 generates t_0(v) random edges incident to v. We implement this step in MPC as follows. First, we create a sorted array of length ∑_{v∈V} t_0(v) such that this array contains t_0(v) copies of v. Then, each copy of v in this array will be “in charge” of choosing one random edge incident to v. To store this array across machines, we think of the array as being partitioned into subarrays of length S′ = Θ(S), where the i-th machine holds subarray i.

Next, we explain how to replicate vertex v exactly t_0(v) many times. Consider x = ⌈∑_{v∈V} t_0(v)/S′⌉ machines and assume that their IDs are 1 through x. For each vertex v compute X(v) = ∑_{u<v} t_0(u). For each v, if 1 + X(v) is not divisible by S′, then v sends the pair ((1 + X(v)) mod S′, v) to machine ⌈(1 + X(v))/S′⌉, and otherwise v sends (S′, v) to machine (1 + X(v))/S′. In addition, to machine x we send ∑_{v∈V} t_0(v). Intuitively, in our big array this effectively marks the first position where the copies of v begin. After this, each machine labels each received pair (y, v) by 1 and also creates (z, ⊥) for each z ∈ {1, . . . , S′} such that the machine did not receive a pair (z, u); the exception is machine x, in which z ranges only up to the last value representing position ∑_{v∈V} t_0(v). Pairs (z, ⊥) are labeled by 0. Within each machine, the pairs (z, ⊥) and (y, v) are sorted (they are not sorted between the machines). Next, we use Predecessor to associate the closest (y, v) with each (z, ⊥). After this step, we replace (z, ⊥) by (z, v). Then, each pair (z, v) samples a random integer j between 1 and d(v) and creates the pair (v, j). Notice that there might be pairs (v, j_1) and (v, j_2) such that j_1 ≠ j_2, or there might be multiple pairs (v, j_1). Let J denote the multiset of these pairs across all machines.

Now, for each edge {u, v} create two pairs (u, v) and (v, u). By using NumberingSublists, assign the numbers 1 through d(v) to the pairs {(v, w) | w ∈ N(v)}. Let j be the number assigned to (v, w). Create triples (v, j, w). Finally, sort all these triples together with J, where in the ordering a triple (v, j, w) comes before a pair (v, j) ∈ J. Label the triples by 1 and the pairs of J by 0. Use Predecessor to assign a triple to each pair from J. If the triple (v, j, w) is associated with a pair (v, j), then add (v, w) to W_0(v). This concludes the implementation of Line 2.

4.2 Sampling Endpoints

In this section we show that if we only need to find the endpoint of each random walk, then the space usage of Algorithm 1 can be improved. This is in fact the case for some of our applications. In order to reduce the space requirement, we observe that Algorithm 1 only looks at the endpoints of the sampled walks. Hence, we do not need to store the internal vertices of the walks in the algorithm. This alone does not suffice to improve the space usage, as the first iteration, which samples random walks of length 1, still incurs the peak space usage (and clearly does not benefit from not storing the internal vertices of the random walks). However, we can first sample random walks of length roughly log l (namely 2^{⌈log log l⌉}) using a simple step-by-step simulation, and after that continue similarly to Algorithm 1. Another benefit of the algorithm is that since it does not store the entire walks, the walks that it produces can be arbitrarily long, i.e., their length is not bounded by the space available on each machine.

For simplicity of presentation, in this section we assume that l is a power of 2. The new algorithm is given as Algorithm 2.

Algorithm 2 Given G, l, t_1, . . . , t_⌈log l⌉, sample t_⌈log l⌉(v) endpoints of random walks of length l in G for each v ∈ V(G).

1: function RandomWalkEndpoints(G, l, t)
2:   for all v ∈ V in parallel do
3:     Perform t_{⌈log log l⌉}(v) random walks of length 2^{⌈log log l⌉} in G starting at v.
4:     Let W_{⌈log log l⌉}(v) be the set of endpoints of these walks.
5:   for i ← ⌈log log l⌉ + 1 . . . log l do
6:     for all v ∈ V in parallel do
7:       Randomly select t_i(v) elements from W_{i−1}(v). Let that set be U_i(v).
8:       For each vertex u ∈ U_i(v), take a yet unused element from W_{i−1}(u) \ U_i(u). Let W_i(v) denote the set of all these elements. If u does not have unused elements anymore, the algorithm fails.
9:   return W_{log l}(v) for each v ∈ V

Lemma 12. Let G be a stochastic graph such that G = RW(G_U) for some undirected graph G_U, let l, C ≥ 1, and let t_i(v) = deg^+_G(v)·⌈C ln n⌉·⌈k_i⌉, where k_i is given by (2) and (3). Then RandomWalkEndpoints(G, l, t) (Algorithm 2) does not fail with probability at least 1 − n^{−C/6+1}.

Proof. The proof is almost identical to the proof of Lemma 10. It suffices to observe that the first loop of RandomWalkEndpoints effectively replaces the first ⌈log log l⌉ iterations of the loop in Algorithm 1 by a procedure that does not fail.

Lemma 13. Assume that G, l, t are defined as in Lemma 12. Then RandomWalkEndpoints

(Algorithm 2) requires O(log l) rounds and O(Cml log n) total space.

Proof. The space is bounded by O(∑

v∈V t⌈log log l⌉(v))

and we have

v∈Vt⌈log log l⌉(v) =

v∈Vdeg+(v) · ⌈C lnn⌉⌈k⌈log log l⌉⌉

by Lemma 9 (ii)

≤ (C + 1)m lnn · 2⌈log l⌉−⌈log log l⌉+6 · (⌈log l⌉+ 6)

≤ (C + 1)m lnn2⌈log l⌉

log l· (⌈log l⌉+ 6)

= O(Cml lnn).

By combining Lemmas 10 to 13 we obtain the first result of the paper.

Theorem 1. Let G be an undirected graph and C ≥ 1. Let l be a positive integer such thatl = o(S), where S is the available space per machine. There exists an MPC algorithm that samples

16

Page 18: Walking Randomly, Massively, and Efficiently

deg+(v)⌈C lnn⌉ independent random walks of length l starting in v for each vertex v in G. Thealgorithm runs in O(log l) rounds and uses O(Cml log l log n) total space and strongly sublinearspace per machine. If the algorithm has to return only the endpoints of each random walk, the totalspace complexity can be reduced to O(Cml log n) and l can be arbitrarily large. The algorithm is an

imperfect sampler (see Definition 8) that does not fail with probability 1− n−C6+1.

4.3 Sampling Random Walks Given Approximate π

While computing the stationary distribution π is easy for undirected graphs, the problem is signifi-cantly more involved if we consider directed graphs. In this section we show how to use Algorithm 1to sample random walks in a directed graph given an approximation π of π. Our approach would alsorequire that π = Ω(1/n). In Section 5 we show how to use this sampling procedure for computingPageRank and for sampling random walks in directed graphs.

As the first step, in the next lemma we show that if π and π are close, then taking a 1-steprandom walks starting from π results in a distribution that also is also close to π.

Lemma 14. Let G be a stochastic graph and T = T (G). Let π be the stationary distribution formatrix T and let π ∈ R

V be an arbitrary vector. Then, the following holds:

(A) If |π(v)− π(v)| ≤ απ(v) for all v ∈ V , then |(T π)(v) − π(v)| ≤ απ(v) for all v ∈ V .

(B) If |π(v)− π(v)| ≤ α for all v ∈ V , then |(T π)(v) − π(v)| ≤ α for all v ∈ V .

Proof. We prove each of the claims separately.

Claim (A). Let ∆ = π − π. The statement implies that |∆(v)| ≤ απ(v). Now we have

T π = T (π +∆) = π + T∆.

Therefore,|(T π)(v)− π(v)| = |(T∆)(v)|.

Let Tv be the v-th row-vector of T . Hence, |(T∆)(v)| = |Tv∆|. We next obtain

|Tv∆| ≤∑

u∈V|Tv,u∆(u)| ≤ α

u∈V|Tv,uπ(u)|.

Using the fact that the entries of T and of π are non-negative, from the last chain of inequalitieswe derive

|Tv∆| ≤ α∑

u∈VTv,uπ(u) = αTvπ = απ(v),

as desired.

Claim (B). Similarly as above define ∆ = π − π, so T π = π + T∆. Moreover, |∆(v)| ≤ α, so

|(T π)(v)− π(v)| = |(T∆)(v)| = |Tv∆| ≤∑

u∈V|Tv,u∆(u)| ≤ α|Tv | ≤ α.

17

Page 19: Walking Randomly, Massively, and Efficiently

Note that Algorithm 1 is parameterized by G, l, t and in particular it does not take stationarydistribution as an argument. Instead, the stationary distribution which is encoded in t. To define tin our earlier analysis of RandomWalks we used ki defined by (2) and (3). Intuitively, ki definesthe overhead of the number of random walks that we have to sample from each vertex so that thealgorithm has sufficiently many random walks for the next iteration of doubling. When we do nothave access to the exact value of π, but instead to its (1 + α)-approximation, we have to use largervalues of ki to accommodate that approximation. So, for 1 ≤ i ≤ ⌈log l⌉, we define

ki = (2 + 4α)⌈log l⌉−i. (5)

Comparison between (5) and (3). Note that for α = 0 the value of ki given by (5) does notequal to the one given by (3). We made such choice so to simplify the expression of ki when we haveaccess to only an approximation of π. As a consequence of definition (5), the space requirement inour analysis depends on 1/α, which can be very large for small values of α. Nevertheless, if π is a(1 + α)-approximation of π, then π is also a (1 + α′)-approximation for any α′ ≥ α. Hence, andwithout loss of generality, we can assume that α ≥ 1/ log n.

We are now ready to analyze properties of RandomWalks when it has access to only anapproximation of π.

Lemma 15. Let G be a stochastic graph. Let π be the stationary distribution of G, and ǫ be suchthat π(v) ≥ ǫ/n for all v. Let l, C ≥ 1 and l = o(S). Finally, let α ∈ (0, 1/4] be a parameter,and let π be a probability distribution on V such that |π(v) − π(v)| ≤ απ(v) for all v. Define kias in (5) and ti(v) = ⌈Cπ(v)n lnn · ki⌉. Then, RandomWalks (G, l, t) (Algorithm 1) has thefollowing properties:

(i) The algorithm is an imperfect sampler (see Definition 8) for generating ⌈Cπ(v)n ln n⌉ length l

random walks starting from each vertex v ∈ V . The failure probability is at most n−Cαǫ3

+2e2.

(ii) The algorithm can be executed in O(log l) MPC rounds.

(iii) The space requirement of the algorithm is O(

m+Cl1+2αn lnn)

.

Proof. Property (ii) follows by Lemma 11. Property (iii) follows from the fact that the spacerequirement is dominated by the first iteration of RandomWalks. Hence, as k0 = (2 + 4α)⌈log l⌉,the space requirement is O

(

m+ C(2 + 4α)log ln lnn)

= O(

m+ Cl1+2αn lnn)

.

Property (i). At the end of i-th step of Algorithm 1, ti(v) denotes the number of random walksof length 2i−1 that originate at vertex v. At step i, each vertex v doubles ti(v) walks arbitrarilychosen from Wi−1(v). Let Xu

i be the number of these random walks ending at vertex u. Note thatXu

i is a sum of 0/1 random variables Y vi,j, where Y v

i,j equals 1 iff the j-th selected random walk ofWi−1(v) ends at u. Let T = T (G) be the transition matrix of G. From Lemma 14, and as Tπ = π,we have

(T jti)(u) = (T j−1(T ti))(u) ≤ (T j−1(T (Cπkin log n+~1)))(u) ≤ (1 + α)Cπ(u)kin log n+ 1. (6)

18

Page 20: Walking Randomly, Massively, and Efficiently

Using this upper-bound, we further derive

|E [Xui ]− ti(u)| ≤ |E [Xu

i ]− Cπ(u)kin lnn|+ 1

= |(T (2i−1)ti)(u)− Cπ(u)kin log n|+ 1

≤ |(T (2i−1)ti)(u)− C(1− α)π(u)kin lnn|+ 1

from (6)

≤ |(1 + α)Cπ(u)kin lnn− C(1− α)π(u)kin lnn|+ 2

≤ 2αCπ(u)kin lnn+ 2. (7)

As long as Xui ≤ ti−1(u) − ti(u) holds for each vertex u, a vertex u is able to (1) extend all

the Xui random walks that ended at u, and (2) double the length of ti(u) random walks from Ui(u)

staring in u.We have

ti−1(u)− ti(u) = ⌈Cπ(u)ki−1 · n lnn⌉ − ⌈Cπ(u)ki · n lnn⌉= ⌈Cπ(u) · (2 + 2α)ki · n lnn⌉ − ⌈Cπ(u)ki · n lnn⌉= ⌈2αCπ(u)ki · n lnn− 2⌉+ 2⌈Cπ(u)ki · n lnn⌉ − ⌈Cπ(u)ki · n lnn⌉≥ 4αCπ(u)ki · n lnn− 2 + ti(u)

≥ 4α(1 − α)Cπ(u)ki · n lnn− 2 + ti(u)

from (7)

≥ (2α− 4α2)Cπ(u)ki · n lnn− 4 + E [Xui ]

from α ≤ 1/4

≥ αCπ(u)ki · n lnn− 4 + E [Xui ]

≥ αCǫki · lnn− 4 + E [Xui ] ,

where the last inequality follows from our assumption that π(u) ≥ ǫ/n. Now, by Chernoff bound,it holds

P [Xui > ti−1(u)− ti(u)] ≤ P [Xu

i > E [Xui ] + αCǫki · lnn− 4]

≤ exp

(

−αCǫki · lnn− 4

3

)

≤ exp

(

−αCǫ

3lnn

)

e2 = n−αCǫ3 e2.

By taking union bound over all vertices and all rounds of the algorithm the probability of failure is

less than n−Cαǫ3

+2e2.

4.4 Lower Bound

In this section we show that our algorithm for sampling random walks in undirected graphs is con-ditionally optimal under the popular 1-vs.-2-Cycles conjecture [BKS13, RVW18]. The conjecturestates that any algorithm in the MPC model which distinguishes between a graph being a cycle oflength n from a graph consisting of two cycles of length n/2, and uses O(nγ) space per machineand O(n1−γ) machines requires Ω(log n) rounds.

19

Page 21: Walking Randomly, Massively, and Efficiently

Theorem 3. Let γ ∈ (0, 1) be a constant. In the MPC model with O(n1−γpoly log n) machines,each having space O(nγ), the following holds. Each algorithm that can sample Θ(log n) independentrandom walks of length Θ(log4 n) starting at each vertex of the graph requires Ω(log log n) rounds,unless the 1-vs.-2-Cycles conjecture does not hold.

The proof is based on the fact that by running Θ(log n) random walks of length Θ(log4 n) froma vertex v one discovers Θ(log2 n) nearest vertices to v with high probability. This property followsfrom the following well known lemma.

Lemma 16. Let t > 1 be an even integer and X1, . . . ,Xt be a sequence of i.i.d random variables,such that P (Xi = 1) = P (Xi = −1) = 1/2. Let X =

∑ti=1 Xi. Then, P (|X| ≥

√t/2) = Ω(1).

Proof. (X−t)/2 follows a binomial distribution with p = 1/2. Hence, the mode of the distribution ofX is 0, that is for every integer i, P (X = 0) ≥ P (X = i). Moreover, P (X = 0) =

( tt/2

)

(1/2)t ≤ c/√t,

for some constant c where the inequality follows from Stirling’s approximation. Hence, P (|X| ≥√t/(2c)) ≥ 1 −∑⌈

√t/(2c)⌉

i=−⌈√t/(2c)⌉ P (X = i) ≥ 1−∑⌈

√t/(2c)⌉

i=−⌈√t/(2c)⌉ P (X = 0) ≥ 1− (

√t/(2c) + 3)c/

√t =

1− 1/2 + 3c/√t = Ω(1).

Proof of Theorem 3. Let us assume that there exists an algorithm which samples Θ(log n) indepen-dent random walks of length Θ(log4 n) starting at each vertex and takes f(n) = o(log log n) rounds.Note that we allow the algorithm to use total space that is poly log n factor more than the size ofthe graph (otherwise storing the output would not be possible). We will use it to solve the problemfrom the problem from the 1-vs.-2-Cycles conjecture more efficiently.

We begin by dealing with the following technical difficulty. The 1-vs.-2-Cycles conjectureconsiders a setting, in which the total space is linear in the graph size. Thus, we first show analgorithm that in O(poly log log n) rounds shrinks the length of each cycle by a constant factor withprobability 1 − n−d, where the constant d can be made arbitrarily large. The algorithm samplesa random bit Xv for each vertex v. Then, for each tree vertices a, b, c which are adjacent on thecycle (or cycles), if Xa = Xc = 0 and Xb = 1, we connect a and c with an edge and remove bfrom the graph. It is easy to see that this indeed shrinks each cycle by a constant factor withprobability at least 1 − n−d, as long as the length of each cycle is ω(d log n). By running thisalgorithm O(log log n) times, we can shrink each cycle by a factor of Ω(poly log n). We obtain agraph having n′ = O(n/poly log n) vertices and edges, so from now on we can afford to use analgorithm whose total space usage is O(n′poly log n′), since n′poly log n′ = O(n).

We now show how to use random walks to shrink the cycles further, namely shrink them by afactor of Ω(log n) in f(n) rounds. Let us sample each vertex independently with probability 1/ log n.With probability at least 1 − n−d among every Ω(log2 n) consecutive vertices along each cycle atleast one vertex is sampled. Our goal is to use random walks to contract the graph to the set ofsampled vertices. To that end, we run Θ(log n) random walks of length Θ(log4 n) from each vertexand show that with high probability for each pair of consecutive sampled vertices on the cycle, thereexists at least one random walks which visits both of them. Consider a sampled vertex. We showthat with high probability, random walks starting at this vertex visits the two neighboring sampledvertices along the cycle. Indeed, since we have sample Θ(log n) random walks per vertex, it sufficesto show that each random walk visits one of the neighboring vertices with constant probability.This in turn follows from the fact that with constant probability, a random walk on a line of lengthΩ(t) ends in some vertex at distance

√t with constant probability (see Lemma 16).

20

Page 22: Walking Randomly, Massively, and Efficiently

As a result, in f(n) rounds we are able to contract each cycle to the set of sampled vertices,whose size is clearly O(n/ log n) with high probability. By repeating this step Θ(log n/ log log n)times, we can reduce the total number of vertices to o(nγ), at which point the remaining graph canbe sent to one machine to check whether it is a cycle or two cycles. The total number of rounds isΘ(log n/ log log n)f(n) = o(log n).

5 PageRank

In this section we show MPC algorithms for computing PageRank both in undirected and directedgraphs, that is we prove Theorem 4 and Theorem 5. As an easy corollary we obtain an algorithmfor sampling random walks in directed graphs (Theorem 2).

We are going to use the following most basic algorithm for estimating PageRank using randomwalks. Let 0 ≤ ǫ ≤ 1 be a parameter. We are going to sample random walks form a stochasticgraph

Gǫ = (1− ǫ)G+ǫ

nJ, (8)

where J is a complete directed graph on the vertex set V (G) (containing a self loop in every vertex).In other words, with probability ǫ we jump to a random vertex, and with probability 1− ǫ we walkaccording to edges of G.

Definition 17 (Jump transition). Let Gǫ be the stochastic graph defined by (8). Then, jumptransition refers to the transition performed within the graph J .

Consider the transition matrix Tǫ = T (Gǫ). The stationary distribution πǫ of Tǫ satisfies Tǫπǫ =πǫ, which implies the following equation

(I − (1− ǫ)T (G)) πǫ =ǫ

n~1.

The crucial property of πǫ is that the probabilities of ending in a given vertex do not decreasemuch relatively to decrease of ǫ, as shown by the following lemma.

Lemma 18. For any 0 < δ ≤ 1 we have

πǫ·δ ≥ δ · πǫ,where inequality is taken over all coordinates.

This result follows from the Taylor expansion of πǫ

πǫ =ǫ

n

∞∑

i=0

((1 − ǫ)T (G))i~1. (9)

Proof. Using Eq. (9) we get

πǫ·δ =ǫ · δn

∞∑

i=0

((1 − ǫδ)T (G))i~1 = δ

(

ǫ

n

∞∑

i=0

(1− ǫδ)iT (G)i~1

)

(10)

≥ δ

(

ǫ

n

∞∑

i=0

(1− ǫ)iT (G)i~1

)

= δ

(

ǫ

n

∞∑

i=0

((1 − ǫ)T (G))i~1

)

= δ · πǫ. (11)

21

Page 23: Walking Randomly, Massively, and Efficiently

The next follows from the observations that π1(v) =1n .

Corollary 19. For any 0 < ǫ ≤ 1 and any v ∈ V we have

πǫ(v) ≥ǫ

n.

The Taylor expansion Eq. (9) suggests the following algorithm for estimating PageRank (seee.g., [Bre02]).

Algorithm 3 An algorithm for approximating the PageRank with damping factor 1− ǫ by using aset of random walks W .1: function StationaryDistribution(W, ǫ)2: for all v ∈ V in parallel do3: Remove from W all but K =

9 lnnǫα2

walks starting in v. If W does not contain enoughwalks, then “fail”.

4: Truncate each walk in W just before the first jump transition (see Definition 17).5: for all v ∈ V in parallel do6: Let nv be the number of the walks from W ending in v.7: π(v)← nv

Kn .

8: Return π

Algorithm 4 An algorithm for approximating PageRank using random walks.

1: Sample a set W of K =⌈

9 lnnǫα2

random walks starting from each vertex of Gǫ of length l =⌈

9 lnnǫ

.2: Return StationaryDistribution (W, ǫ)

Lemma 20. Let ǫ ≤ 1 and assume we mark each edge of a walk of length l = ⌈9 lnnǫ ⌉ independently

with probability ǫ. The probability that no edge is marked is less than 1n9 .

Proof. (1− ǫ)l ≤ (1− ǫ)9lnnǫ ≤ e−9 lnn = 1

n9 .

Let us prove the approximation ratio obtained by Algorithm 4. Note that the lower bounds forα and ǫ are not limiting, since the interesting values for both parameters are constant.

Lemma 21. Let α ∈ [1/n, 1/4], ǫ ∈ [1/n, 1], and 0 < α < 1. Denote by π the output of Algorithm 4.Then, |πǫ(v)−πǫ(v)| ≤ απǫ(v) for all v ∈ V (i.e., π is (1+α)-approximation of πǫ) with probabilityat least 1− 4

n2 .

Proof. The probability that some walk is not truncated by Lemma 20 is 1n9 , as jump transition

happens with probability ǫ. Hence, by union bound over nK ≤ 10n5 walks some walk is nottruncated with probability at most 10

n5 ≤ 2n2 .

Observe that nv is a sum of nK 0, 1-random variables. Moreover, its expectation by Corollary 19is E [nv] ≥ ǫ

n · nK = ǫK, so by Chernoff bound (see Theorem 7(A))

Pr [|nv − E [nv]| ≤ αE [nv]] ≤ 2 exp

(

−α2E [nv]

3

)

≤ 2 exp

(

−α2ǫK

3

)

≤ 2 exp (−3 ln(n)) = 2

n3.

22

Page 24: Walking Randomly, Massively, and Efficiently

Again, by union bound over all n vertices some estimate is incorrect with probability at most 2n2 .

Hence, the total probability of failure is 4n2 .

StationaryDistribution also has an efficient MPC implementation.

Lemma 22. StationaryDistribution (Algorithm 3) can be implemented in O(1) MPC rounds.

Proof. To implement Line 2, we invoke NumberingSublists on the walks from W ; the numberingof walks is performed with respect to their starting vertices. (See Section 4.1 for details aboutNumberingSublists.) Those walks that get number larger than K are removed. This can beimplemented in O(1) rounds.

Line 4 is performed without additional communication.To simulate the loop on Line 5, on the remaining walks we again invoke NumberingSublists,

but this time we perform numbering of the walks with respect to their ending vertices. For eachvertex v, we calculate the maximum among those numbers. That maximum value equals nv. Theseoperations can be performed in O(1) rounds.

Theorem 4. Let α ∈ [1/n, 1/4] and ǫ ∈ [log n/o(S), 1], where S is the available space per machine.There exists an MPC algorithm that, with probability at least 1− 5

n2 , computes a (1+α)-approximatePageRank vector in undirected graphs with jumping probability of ǫ in O (log log n+ log 1/ǫ) rounds

using O(

m log2 n log lognǫ2α2

)

space.

Proof. We set K =⌈

9 lnnǫδ2

, l =⌈

9 lnnǫ

and C = 6ǫα2 . We first execute Algorithm 1 with C and l.

Next, we give sampled walks to Algorithm 4 with K and l.

Space requirement. By Lemma 11 we require O(Cml log l log n) = O(

m log2 n log lognǫ2α2

)

totalspace.

Round complexity. By Lemmas 11 and 22 we require O(log l) = O(log log n+ log 1/ǫ) rounds.

Success probability. On one hand, by Theorem 1 the algorithm fails with probability at most

n−C3+3 = n− 2

ǫα2 +3 ≤ n−32+3 = n−29. On the other hand, by Lemma 21 we obtain (1 + α)-approximation of π with probability at least 1− 4

n2 . Hence, the final success probability is at least1− 5

n2 .

5.1 Directed Balanced Graphs

The first step towards our algorithm for directed graphs is considering balanced directed graphs. Adirected graph G is called c-balanced when for all v ∈ V we have cdeg+(v) ≥ deg(v). In particularin c-balanced graphs there are no dangling vertices, i.e., vertices that do not have any out edge.The idea is to first consider G – the undirected closure of G. In G we can compute long walks fastand then gradually move towards directed graph G. The c-balanced property allows as to arguethat each edge from a random walk in G is with probability at least 1/c directed according to G,enabling us to prove the following result.

23

Page 25: Walking Randomly, Massively, and Efficiently

Theorem 23. let G be a c-balanced graph. Let α ∈ [1/n, 1/4], ǫ ∈ [log n/o(S), 1], and δ ∈ (0, 1]such that δ−1 ∈ N. There exists an MPC algorithm that with probability at least 1 − 12

δn2 com-putes (1 + α)-approximate ǫ-PageRank vector of G in O

(

δ−1(log log n+ log 1/ǫ))

rounds using

O(

m ln2 n ln lnnǫ2α2 + n18 cδ

ǫn ln2.5 nǫ3.5·α2

)

total space and strongly sublinear space per machine.

For this we need few more definitions. We are going to sample random walks from the stochasticgraph defined as follows

Gǫ,σ = (1− ǫ)σRW(G) + (1− ǫ)(1− σ)RW(G) +ǫ

nJ. (12)

We denote by πǫ,σ the stationary distribution of Gǫ,σ. In our algorithms we will be using Gǫ,σ, butthe graph corresponding to J will be constructed only implicitly. That is, instead of constructinggraph J explicitly, which contains n2 edges, each vertex v of Gǫ,σ will hold a value ǫ/n implyingthat v has an edge of weight ǫ/n to each vertex.

We also note that Gǫ,σ can be constructed in O(1) MPC rounds. Namely, we first broadcastǫ and σ to each machine, which can be done in O(1) rounds as described in [GSZ11]. Then, eachedge is copied and annotated so to construct (1− ǫ)σRW(G) + (1− ǫ)(1− σ)RW(G). Finally, eachvertex is annotated by ǫ which, as described, suffices to implicitly construct ǫ

nJ .

Observation 24. A stochastic graph as defined by (12) can be implicitly constructed in O(1) MPCrounds.

We now define three transition types that capture different components of stochastic graphs.

Definition 25 (Transition types). Let Gǫ,σ be the stochastic graph as defined by (12). Each edgeof Gǫ,σ originates from one of the graphs G, G and J . We call an edge of Gǫ,σ a

• directed transition, if it originates from G, and

• undirected transition, if it originates from G, and

• jump transition, if it originates from J .

In the following we assume that each edge of each walk in Gǫ,σ has its edges labeled with thetransition types defined above.

In the main algorithm of this section (see Algorithm 5), we start from σ = 1 and then graduallydecrease the value to obtain σ = 0. This sequence is defined as

σj+1 = σj − δ,

for 1 ≤ j ≤ δ−1. (We will set δ so that δ−1 is an integer.) We now state the main algorithm of thissection. It uses RandomWalks (Algorithm 1) as a subroutine.

5.1.1 Algorithms

We now describe our algorithms used to compute approximate PageRank for c-balanced graphs.The main algorithm is PageRankOfBalancedGraphs (see Algorithm 5) that essentially repeatsAlgorithm 4 (see the steps within loop Line 4) δ−1 many times. This loop implements the gradualchange from an undirected to the corresponding directed graph.

24

Page 26: Walking Randomly, Massively, and Efficiently

Algorithm 5 An algorithm for computing (1 + α)-approximate ǫ-PageRank of a c-balanced graphG.1: function PageRankOfBalancedGraphs(G, ǫ, α, δ)2: Compute approximate stationary distribution πǫ,1 using Theorem 4.3: l←

9 lnnǫ

4: for j ← 1 . . . δ−1 do5: Implicitly compute Gǫ,σj using Eq. (12).6: Run RandomWalks(Gǫ,σj , l, t) where ti(v) = ⌈Cπǫ,σj(v)n ln n · ki⌉ and ki is defined

by (5). Let W be the set of resulting walks. ⊲ C is defined in Theorem 237: WT ← ∅8: for all w ∈W in parallel do9: if wT = TranslateWalk-σ(G, ǫ, j, w) did not “fail” then

10: Add wT to WT .

11: πǫ,σj+1 = StationaryDistribution(WT , ǫ)

As the main primitive, PageRankOfBalancedGraphs invokes TranslateWalk-σ (Algorithm 6)on Line 9, which takes a PageRank walk in Gǫ,σj and either returns a PageRank walk in Gǫ,σj+1

or fails. In the following we say that it translates the given walk. Each translation can fail withrelatively large probability, but we will take enough walks so that the whole process has a smallfailure probability.

Algorithm 6 Given a random walk w in Gǫ,σj , return a random walk in Gǫ,σj+1 or “fail”.

1: function TranslateWalk-σ(G, ǫ, j, w)2: if σj < 1/2 then3: Let d be the number of directed transitions in w.4: Let u be the number of undirected transitions in w.

5: p← rlσuj+1(1−σj+1)

d

σuj (1−σj)d

⊲ r is set in Lemma 26 to guarantee that p ≤ 1.

6: With probability p return w; otherwise, return “fail”.7: else8: Let wT equal w. We annotate the transitions of wT as follows.9: for each edge e = uv of wT do

10: if e is an undirected transition in w then11: – With probability 1− ρ(v) we keep e in wT as an undirected transition. ⊲ We

define ρ(v) later in Lemma 26.12: – With probability ρ(v) do the following: If uv is an arc in G, then we replace e

in wT with this directed edge; otherwise, return “fail”.13: else if e is a directed transition in w then14: Keep e in wT as a directed transition.15: else if e is a random jump in w then16: With probability β(v) return “fail”, and otherwise keep e in wT as a random jump.

⊲ We define β(v) later in Lemma 26, (17).

17: return wT together with its transition types

In the rest of this section we analyze Algorithm 5 and show that it satisfies the claim given in

25

Page 27: Walking Randomly, Massively, and Efficiently

Theorem 5.

5.1.2 The Success Probability of Algorithm 5

We now prove the correctness of TranslateWalk-σ, which transforms a random walk w in Gǫ,σj

to a random walk in Gǫ,σj+1 , or returns “fail”. We show that the output walk is a random walk inGǫ,σj+1 and that the probability of “failing” is relatively small.

Lemma 26. Let w be a random walk of length l in Gǫ,σj . Assume that δ ≤ 1/(2c). Then,TranslateWalk-σ(w) (Algorithm 6) does not fail and outputs a random walk wT of length lin Gǫ,σj+1 with probability at least (1− 2cδ)l.

The proof analyzes two separate cases. In one of them we use the following procedure, which isoften called rejection sampling

Lemma 27. Consider two discrete probability spaces P and Q over the same space e1, . . . , en.For 1 ≤ i ≤ n, let pi and qi give the probabilities of ei in P and Q respectively. Finally, let0 ≤ r ≤ min1≤i≤n,qi 6=0 pi/qi.

Consider an algorithm which given a random sample X = ei from P returns it with probabilityqi/pi · r, and does nothing with the remaining probability. Then, the algorithm returns a result withprobability r and each time it does so, it is an element sampled according to Q.

Proof. We first note that the probability qi/pi · r is well-defined. Indeed, it is only evaluated whenXi = ei, which implies pi > 0. Moreover, thanks to the choice of r, it does not exceed 1.

The probability of the algorithm not failing is

n∑

i=1

pi · qi/pi · r =

n∑

i=1

qi · r = r.

Moreover, the probability of returning ej is pj · qj/pj · r = qj · r. Hence, if we condition on thealgorithm not failing, the probability of returning ej is qj, as desired.

We can now prove Lemma 26.

Proof of Lemma 26. For the proof we consider a procedure that translates the input walk w in lsteps, one edge at a time. We are going to show that after i steps either the procedure has failed orthe first i edges of the walk are now sampled according from Gǫ,σj+1 . We split the proof into twocases: σj < 1/2 and σj ≥ 1/2.

Case σj < 1/2. Observe that in this case TranslateWalk-σ(w) either returns the walk w orfails. Recall that we consider a procedure that translates the consecutive edges of the walk one byone. At each step we are going to use Lemma 27. Each edge of the walk is either a jump transition(with probability ǫ), directed transition (with probability (1−ǫ)σ) or an undirected transition (withprobability (1− ǫ)(1− σ).

Fix a vertex v and let e be some transition that can be chosen by a random walk that has reachedv. Denote by f(σ, e) the probability of choosing e. In order to use Lemma 27, we need to considerthe ratios f(σj , e)/f(σj+1, e). For any jump transition ej we have f(σj, ej)/f(σj+1, ej) = 1. For adirected transition ed, f(σj, ed)/f(σj+1, ed) = ((1− ǫ)(1− σj))/((1− ǫ)(1− σj+1)) = (1− σj)/(1−

26

Page 28: Walking Randomly, Massively, and Efficiently

σj+1). Finally, for an undirected transition eu we have f(σj, eu)/f(σj+1, eu) = σj/σj+1. Note thatout of the three ratios, only the second one is smaller than 1 and hence is the smallest one. Hence,for any transition e we have

f(σj, e)

f(σj+1, e)≥ 1− σj

1− σj+1=

1− σj1− σj + δ

= 1− δ

1− σj + δ≥ 1− δ

12 + δ

≥ 1

1 + 2δ.

In order to use Lemma 27 we are going to set r = 1/(1 + 2δ). The definition of r together withthe sampling ratios define the probability that the algorithm of Lemma 27 does not fail. Altogether,if the input walk consists of l edges, has d directed transitions and u undirected transitions, thenthe probability that none of the l applications of Lemma 27 fails is

rl(1− σj+1)

dσuj+1

(1− σj)dσuj

.

Observe that this is exactly what TranslateWalk-σ does, which implies that the procedure iscorrect in the case when σj < 1/2. Finally, to bound the success probability note that 1/(1+2δ)l ≥(1− 2cδ)l.

Case σj ≥ 1/2. As in the previous case, we prove the claim separately for each edge of the walk.Differently from the previous case, though, in this case the translation procedure may actuallychange the transition types along the walk.

Let w be a random walk in Gǫ,σj . Consider an edge e belonging to the walk w, which the walktraverses after reaching vertex v. Observe that Line 15 translates e to an undirected transitiononly if e is an undirected transition and a coin toss with probability 1 − ρ(v) succeeds. Thus, thishappens with probability pU(v) = (1− ǫ)σj(1− ρ(v)).

On the other hand, e is translated to a directed transition either when e is a directed transition,or when it is an undirected transition, and coin toss with probability 1− ρ(v) fails, and there existsa corresponding directed transition. The latter happens with probability deg+G(v)/degG(v), since inthis case e is a randomly chosen undirected transition incident to v. Hence, the overall probability

of translating e to a directed transition is pD(v) = (1− ǫ)(

1− σj +deg+G(v)

degG(v)ρ(v)σj

)

.

Since the output of Line 15 should be a random walk in Gǫ,σj+1 , we must have

pU (v)

pD(v)=

(1− ǫ)(σj − δ)

(1− ǫ)(1 − σj + δ). (13)

This allows us to derive ρ(v), let r(v) = degG(v)

deg+G(v)and let y(v) = ρ(v)/r(v) =

deg+G(v)

degG(v)ρ(v). Then,

from (13) we have

σj(1− ρ(v)) (1− σj + δ) = (1− σj + y(v)σj) (σj − δ)

=⇒ y(v)σj (σj − δ + r(v)− r(v)σj + r(v)δ) = δ

=⇒ y(v)σj =δ

σj − δ + r(v)− r(v)σj + r(v)δ, (14)

which in particular gives us a formula for ρ(v). In the following we verify ρ(v) ∈ [0, 1]. To that end,we upper-bound y(v). Given that σj ≥ 1/2, (14) implies

y(v) ≤ 2δ

σj − δ + r(v)− r(v)σj + r(v)δ. (15)

27

Page 29: Walking Randomly, Massively, and Efficiently

The values r(v) and δ are fixed for a given vertex v. We upper-bound y(v) from (15) by minimizingthe denominator of (15). Since r(v) appears in the form r(v)(1−σj)+r(v)δ > 0 in the denominator,and r(v) ∈ [1, c], the denominator is minimized for r(v) = 1. For r(v) = 1, the denominator of (15)becomes 1. This implies that y(v) and ρ(v) are nonnegative. Moreover,

y(v) ≤ 2δ and ρ(v) ≤ 2r(v)δ ≤ 2cδ ≤ 1. (16)

Now we can upper bound the probability that translation of an edge sampled from Gǫ,σj toan edge sampled from Gǫ,σj+1 “fails”, conditioned on e not being a random jump. Denote thatprobability by β(v). Then, we have

β(v) = 1− pU (v)− pD(v) = 1− σj(1− ρ(v)) −(

1− σj +deg+G(v)

degG(v)ρ(v)σj

)

= σjρ(v)

(

1− deg+G(v)

degG(v)

)

≤ ρ(v)(16)

≤ 2cδ. (17)

We now comment on random jumps. From our analysis, a transition which is not a randomjump is rejected (i.e., “fails”) with probability β(v). To account for that, if a transition of w is arandom jump, TranslateWalk-σ will also “fail” with probability β(v), and with the remainingprobability keep this random jump. In this way, conditioned that TranslateWalk-σ does not“fail”, an edge eT of the output of TranslateWalk-σ(w) is a random jump with probability ǫ, adirected edge with probability (1−ǫ)(1−σj+1), and an undirected edge with probability (1−ǫ)σj+1.

Hence, TranslateWalk-σ outputs “fail” per edge with probability ǫβ(v) + (1 − ǫ)β(v) =β(v) ≤ 2cδ, and hence an invocation of this method does not output “fail” with probability at least(1− 2cδ)l.

5.1.3 Proof of Theorem 23

We now prove Theorem 23 by showing that Algorithm 5 has the properties given in the theoremstatement. To show correctness, observe that an iteration of the loop of Algorithm 5 simulatesalgorithm Algorithm 4 for computing πǫ,σj+1 .

We split the rest of the proof into three parts: the space requirements; the round complexity;and, the probability of success. Throughout the proof, we set parameters as l =

9 lnnǫ

and

K =⌈

9 lnnǫ·α2

≤ 10 lnnǫ·α2 . The proof of the success probability fixes the value of C.

Round complexity. Algorithm 5 executes O(δ−1) iterations. Each iteration implicitly constructsa stochastic graph on Line 5, which by Observation 24 can be done in O(1) rounds. Also in eachiteration is invoked Algorithm 1, which takes O(log l) = O(log log n + log 1/ǫ) rounds. Since weassume that each walk is stored entirely on a machine, TranslateWalk-σ on Line 9 can beimplemented without extra communication. To obtain WT defined by Line 10, each walk that“fails” is marked by flag fail, and otherwise marked by flag succeed. Those walks marked bysucceed define WT .

By Lemma 22, Line 11 can be implemented in O(1) rounds. Hence, the total round complexityis O

(

δ−1(log log n+ log 1/ǫ))

.

28

Page 30: Walking Randomly, Massively, and Efficiently

Success probability. By Lemma 21, the probability that any πǫ,σj is not (1 +α)-approximationof πǫ,σj is at most 3

n2 .The next place when Algorithm 5 can fail is Line 11, i.e., we need to have at least K walks

staring in each vertex v. Note that algorithm generates

⌈Cπǫ,σj(v)n ln nk⌈log l⌉⌉ ≥ Cπǫ,σj(v)n ln n

≥ Cπǫ,σj(v)(1 − α)n ln n ≥ Cǫ(1− α) ln n.

walks from each vertex. These walks are subsampled with probability at least (1−2cδ)l by Lemma 26.We will aim to have 2K walks in expectations, so that by Chernoff bound (Theorem 7 (C)) theprobability of this number being smaller than K is

exp(

−δ2µ/3)

= exp (−K/3) ≤ exp (−3 lnn) = 1

n3.

Using union bound over all vertices we will have not enough walks with probability at most 1n2 .

Hence, we need to set C so that

2K ≤ (1− 2cδ)l · Cǫ(1− α) ln n.

This gives

C ≥ 20

ǫ2α2(1− α) · (1− 2cδ)l≥ 80

3ǫ2α2(1− 2cδ)l.

Hence, this inequality is satisfied by setting C = 28α2ǫ2·(1−2cδ)l

. Now, by Lemma 15 (i) the sampling

algorithm fails with probability at most

n−Cαǫ3

+2e2 ≤ n− 283αǫ

+2e2 ≤ n−4+2e2.

The probability of any of these failures happening in each round is at most 3n2 + 1

n2 + e2

n2 < 12n2 .

Hence, over all round the failure probability is O( 1n2δ ).

Space requirement. By Lemma 15 (iii) and from 1/n ≤ α ≤ 1/4, the space required is

O(

m+ Cl1+2αn lnn)

= O

(

m+1

α2ǫ2 · (1− 2cδ)ll1+2αn lnn

)

= O

(

m+1

α2ǫ2· (1 + 2cδ)ll1+2αn lnn

)

= O

(

m+1

α2ǫ2· (1 + 2cδ)ll1.5n lnn

)

= O

(

m+1

α2ǫ2· (1 + 2cδ)9

ln nǫ

(

9lnn

ǫ

)1.5

n lnn

)

= O

(

m+1

α2ǫ3.5· (1 + 2cδ)9

ln nǫ n ln2.5 n

)

= O

(

m+1

α2ǫ3.5· n18 cδ

ǫ n ln2.5 n

)

.

Moreover, we need O(

m log2 n log lognǫ2α2

)

space in Line 2 of Algorithm 5. This completes the proof of

Theorem 23.

29

Page 31: Walking Randomly, Massively, and Efficiently

5.2 Transformation to a c-balanced Graph

In this section we will describe how to reduce a general graph G = (V,E) without dangling verticesto a 3-balanced multigraph Gc = (Vc, EC). The ways to handle dangling vertices are discussed inAppendix B. The idea is to replace each vertex by a path of length λ = ⌈log n⌉.

The graph Gc = (Vc, Ec) is defined as follows

Vc = vi : v ∈ V, i ∈ [1, . . . , λ],

Ec = uλv1 : uv ∈ V ∪ (uiui+1)j : i ∈ [1, . . . , λ− 1], j ∈ [0, . . . , ⌈deg−(u)/2i⌉].

Lemma 28. If G does not contain dangling vertices then Gc is a 3-balanced graph. Moreover, Gc

contains n⌈log n⌉ vertices and at most 2m+ n⌈log n⌉ edges.

Proof. In order to prove that Gc is c-balanced, consider three cases for vi ∈ Vc:

i = 1 then vi has deg−(v) inedges, and ⌈deg−(v)/2⌉ outedges, so

3 deg+(vi) = 3⌈deg−(v)/2⌉ ≥ ⌈deg−(v)/2⌉ + deg−(v) = deg(vi).

1 < i < λ then vi has ⌈deg−(v)/2i−1⌉ inedges, and ⌈deg−(v)/2i⌉ outedges, so

3 deg+(vi) = 3⌈deg−(v)/2i⌉ ≥ ⌈deg−(v)/2i⌉+ ⌈deg−(v)/2i−1⌉ = deg(vi).

i = λ then vi has ⌈deg−(v)/2λ−1⌉ inedges and at least 1 outedge, so

3 deg+(vi) ≥ deg+(vi)+2 ≥ deg+(vi)+⌈2 deg−(v)/n⌉ ≥ deg+(vi)+⌈deg−(v)/2λ−1⌉ = deg(vi).

By construction the number of vertices is n⌈log n⌉, whereas the number of edges added to Gc canbe accounted to inedges in G, i.e., for a vertex v with indegree deg−(v) we are adding

λ−1∑

i=1

⌈deg−(v)/2i⌉ ≤λ−1∑

i=1

deg−(v)/2i + 1 ≤ deg−(v) + λ− 1,

edges. This gives m+ nλ = m+ n⌈log n⌉ additional edges.

For each uv ∈ E we call the edge uλv1 ∈ EC core. From the construction of Gc we easily getthe following.

Observation 29. Let W = e1, e2, . . . , ek be a walk in GC . Then, there exists 1 ≤ i ≤ λ that has thefollowing property. Let WR be a subsequence of W consisting of edges ei+jλ for 0 ≤ j ≤ (k − i)/λ.Then, WR is a walk in G, which contains all core edges of W .

30

Page 32: Walking Randomly, Massively, and Efficiently

5.3 Increasing Damping Factor

In Section 5.2 we described how to transform the input graph G to a c-balanced graph Gc. Inthis process, each edge of G is replaced by a path of length ⌈log n⌉. That means that a randomwalk of length l in G corresponds to a random walk of length l · ⌈log n⌉ in Gc. In order make acorrespondence between PageRank walks in G to those in Gc, we need PageRank walks in Gc tomake jump transitions roughly log n times less frequently than in G. (We make this statementprecise in Section 5.4.) In light of this, we design method PageRankLargerDampingFactor

(Algorithm 8) that given an approximate ǫ-PageRank of G outputs an approximate ǫ′-PageRank ofG for ǫ′ < ǫ. Moreover, for a given parameter τ , it does so in O

(

log1+τǫǫ′

)

iterations each of whichis implemented by invoking RandomWalks for length O(log n/ǫ′). The parameter τ also affectsspace complexity (for details see Theorem 33), and in the final setup we let τ = o(1).

PageRankLargerDampingFactor uses TranslateWalk-ǫ (Algorithm 7) as a subroutine.Given a random walk w in Gǫj ,0 and ǫj+1 ≤ ǫj , TranslateWalk-ǫ either returns a walk wT whichis a random walk in Gǫj+1,0 or “fails”. This method is very similar to TranslateWalk-σ for thecase σj < 1/2.

Algorithm 7

1: function TranslateWalk-ǫ(G, ǫj , ǫj+1, w)2: Let g be the number of directed transitions in w.3: Let t be the number of jump transitions in w.

4: p← rǫtj+1(1−ǫj+1)g

ǫtj(1−ǫj)g⊲ r is set in Lemma 30 to guarantee that p ≤ 1.

5: With probability p return w; otherwise, return “fail”.

Algorithm 8 An algorithm that given a (1 + α)-approximate ǫ1-PageRank πǫ1 of G for ǫ1 ≤ 1/2,outputs a (1 + α)-approximate ǫ′-PageRank πǫ′ of G for ǫ′ < ǫ1. Parameter τ ∈ (0, 1/2) affectssuccess probability, round complexity and space complexity.

1: function PageRankLargerDampingFactor(G, πǫ1 , ǫ′, τ)

2: for j ← 1 . . . ⌈log1/(1−τ) ǫ1/ǫ′⌉ do

3: l← ⌈9 lnnǫj⌉

4: Implicitly compute Gǫj ,0 using Eq. (12).5: Run RandomWalks

(

Gǫj ,0, l, t)

where ti(v) = ⌈Rπǫj (v)n lnn · ki⌉, ki is defined by (5).Let W be the set of resulting walks. ⊲ R is set in Theorem 33

6: WT ← ∅7: ǫj+1 ← max ǫ′, ǫj(1− τ)8: for all w ∈W in parallel do9: if wT = TranslateWalk-ǫ(G, ǫj , ǫj+1, w) did not “fail” then

10: Add wT to WT .

11: πǫj+1 = StationaryDistribution(WT , ǫj)

Lemma 30. Given a random walk w of length l in Gǫj ,0, with probability at least(

1−ǫj1−ǫj+1

)lthe call

TranslateWalk-ǫ(w) outputs a random walk of length l in Gǫj+1,0; otherwise, TranslateWalk-ǫ(w)reports “fail”.

31

Page 33: Walking Randomly, Massively, and Efficiently

Proof. The proof of this lemma is similar to one of Lemma 26 for σj < 1/2. In this proof as well,we would like to use Lemma 27 for each edge of walk w. Each edge of w is either a jump transition(with probability ǫj) or a (directed) graph transition (with probability 1− ǫj).

As in Lemma 26, fix a vertex v and let e be some transition that can be chosen by a random walkthat has reached v. Denote by f(ǫ, e) the probability of choosing e. To use Lemma 27, consider theratios f(ǫj, e)/f(ǫj+1, e). For any jump transition ej we have f(ǫj, ej)/f(ǫj+1, ej) =

ǫjǫj+1≥ 1. For a

graph transition eg, we have f(ǫj, eg)/f(ǫj+1, eg) =1−ǫj

1−ǫj+1< 1. Hence, for any transition e we have

f(ǫj, e)

f(ǫj+1, e)≥ 1− ǫj

1− ǫj+1.

So, we set r = (1− ǫj)/(1− ǫj+1) and the proof follows by application of Lemma 27 in the sameway as in the proof of Lemma 26 for σj < 1/2.

We now want to use Lemma 30 to lower-bound the success probability of TranslateWalk-ǫinvoked within Algorithm 8. For that, we first establish the following inequality.

Lemma 31. For any y ∈ [0, 1/2] we have 1− y ≥ exp(−2y).

Proof. From Taylor expansion we have 1 − x + x2/2 ≥ exp(−x). Since for x ∈ [0, 1] it holds thatx/2 ≥ x2/2, we have 1 − x/2 ≥ 1 − x + x2/2 ≥ exp(−x). Now the lemma follows by lettingy = x/2.

Lemma 32. Let τ ≤ 1/2. If ǫj+1 = (1 − τ)ǫj , then TranslateWalk-ǫ invoked by Algorithm 8succeeds with probability at least n−36τ .

Proof. By Lemma 30, TranslateWalk-ǫ “fails” with probability at most ((1− ǫj)/(1 − ǫj+1))l for

l = c lnnǫj

. We now upper-bound this probability. We have

1− ǫj1− ǫj+1

= 1− τǫj1− (1− τ)ǫj

.

Therefore,

(

1− ǫj1− ǫj+1

)l

=

(

1− τǫj1− (1− τ)ǫj

) 9 lnnǫj Lemma 31

≥ n− 18τ

1−(1−τ)ǫj ≥ n−36τ ,

where we used the fact that ǫj, τ ≤ 1/2.

Theorem 33. Let G be a directed graph. Let ǫ1 > ǫ′ ≥ log n/o(S), and let πǫ1 be a (1 +α)-approximate ǫ1-PageRank of G. Given τ ∈ (0, 1/2], PageRankLargerDampingFactor

(Algorithm 8) outputs a (1+α)-approximate ǫ′-PageRank πǫ′ of G. Moreover, PageRankLarger-

DampingFactor can be implemented in O(τ−1 · log 1/ǫ′ ·(log log n+log 1/ǫ′)) MPC rounds and thetotal space of O

(

m+ 1ǫ′3.5α2n

36τn ln2.5 n)

with strongly sublinear space per machine. This algorithmis randomized and outputs a correct result with probability at least 1−O( 1

τn2 · log 1/ǫ′).

Proof. We split the proof into three parts: upper-bounding the success probability, deriving thespace requirement, and analyzing the round complexity.

32

Page 34: Walking Randomly, Massively, and Efficiently

Round complexity. The main loop of Algorithm 8 is executed at most⌈

log1/(1−τ) 1/ǫ′⌉

=

O(

τ−1 · log 1/ǫ′)

times. Each loop invokes RandomWalks for length O(

lnnǫ′

)

, which can be exe-cuted in O(log log n+ log 1/ǫ′) MPC rounds. Line 9 to Line 11 can be implemented in O(1) roundsin the same way as described in the proof of Theorem 23 (see Section 5.1.3).

Success probability. By Lemma 21 each πǫj computed on Line 11 is a (1+α)-approximation ofπǫ′ with probability at least 1− 3/n2.

Note that algorithm generates at least

⌈Rπǫj(v)n ln nk⌈log l⌉⌉ ≥ Rπǫj(v)n ln n

≥ Rπǫj(v)(1 − α)n ln n ≥ Rǫj(1− α) ln n,

random walks from each vertex. Next, these walks are downsampled using Lemma 32 with proba-bility n−36τ . Similarly as in Theorem 23, we want the expected number of walks to be 2Kj , so thatthe failure probability is less than 1

n2 . Hence, the following requirement on R

2Kj ≤ n−36τ · Rǫj(1− α) ln n

what gives

R ≥ 20

n−36τ ǫ2j (1− α)α2≤ 28

ǫ2jα2n36τ

We set R = 28ǫ2jα

2n36τ and, similarly, as in Theorem 23 the sampling algorithm fails with probability

at most n−2e2. This gives the failure probability 12n2 of each round, and over all rounds we get

12τn2 · log 1/ǫ′.

Space requirement. The space usage is dominated by calls to Algorithm 1, which from Lemma 15are bounded by O

(

m+Rl1+2αn lnn)

. Observe that l is the highest in the last round and equals

l = ⌈9 lnnǫ′ ⌉ = O(lnn/ǫ′). This gives the following upper-bound on space

O(

m+Rl1+2αn lnn)

= O

(

m+1

ǫ′2α2n36τ l1+2αn lnn

)

= O

(

m+1

ǫ′2α2n36τ l1.5n lnn

)

= O

(

m+1

ǫ′2α2n36τ

(

9 lnn

ǫ′

)1.5

n lnn

)

= O

(

m+1

ǫ′2α2n36τ

(

9 lnn

ǫ′

)1.5

n lnn

)

= O

(

m+1

ǫ′3.5α2n36τn ln2.5 n

)

.

33

Page 35: Walking Randomly, Massively, and Efficiently

5.4 From PageRank in c-balanced to PageRank in General Graphs

In the previous sections we developed a way for efficiently approximating PageRank of c-balancedgraphs. Here, we describe how to use that result to approximate the PageRank of a graph G,which is not necessarily c-balanced. Let Gc be a c-balanced graph obtained from G by applyingthe transformation described in Section 5.2. Recall that to approximate PageRank it suffices tosample random walks up to the point of their first random jump (see Algorithm 5). For the input

graph G and damping factor 1− ǫ such a random walk with high probability has length O(

lognǫ

)

.

Instead of generating PageRank walks in G, we will generate them in Gc, but for a dampingfactor 1 − ǫ/poly log n (see Lemma 34). This will be done using Algorithm 8. Let Wc be an( ǫpoly logn)-PageRank walk. We observe that with constant probability the first jump in Wc appears

after Ω(

log2 nǫ

)

steps. Assuming this from Wc we can obtain a random walk W in G of length

Ω(

lognǫ

)

such that no transition of W is a random jump. In order to obtain ǫ-PageRank we

reintroduce random jumps with probability ǫ and truncate walks after first such jump. This is doneby sequentially iterating over the edges of W and with probability ǫ truncating W at any givenstep. This process is given as algorithm TranslateToPageRankWalk (Algorithm 9).

Algorithm 9 Let G be a graph and Gc its c-balanced version. Given a PageRank walk w in Gc

that has no random jumps, the algorithm returns an ǫ-PageRank walk in G that has exactly onerandom jump transition or the algorithm “fails”. This random jump transition is the last one in thewalk.1: function TranslateToPageRankWalk(G, ǫ, w)2: if w contains random jump then “fail”.

3: Let wR be the walk in G consisting of all core edges of w. ⊲ See Observation 294: Mark each edge of wR independently and with probability ǫ.5: if at least one edge of wR is marked then6: Truncate wR before the first marked edge.7: Return wR.8: else9: “fail”

Now we present the main algorithm of this entire section. The algorithm outputs an approximatePageRank for the input graph.

34

Page 36: Walking Randomly, Massively, and Efficiently

Algorithm 10 An algorithm for computing a (1 + α)-approximate PageRank πǫ of a graph G.

1: function PageRankOfGeneralGraphs(G, ǫ, α)2: Let Gc be the balanced graph of G obtained as described in Section 5.2.3: Let πc

1/2 ← PageRankOfBalancedGraphs(Gc, 1/2, α).

4: ℓ← ⌈log n⌉ ·⌈

9 lognǫ

, and ǫ′ ← 1/(4ℓ).

5: Let πcǫ′ ← PageRankLargerDampingFactor(Gc, π

c1/2, ǫ

′).6: Implicitly compute (Gc)ǫ′,0 using Eq. (12).7: Run RandomWalks((Gc)ǫ′,0, ℓ, t) where ti(v) = ⌈Lπc

ǫ′(v)n ln n ·ki⌉, ki is defined by (5) andd′ is a sufficiently large constant. Let W be the set of resulting walks. ⊲ L is set in Theorem 5

8: WT ← ∅9: for all w ∈W in parallel do

10: if wT = TranslateToPageRankWalk(G, ǫ,w) did not “fail” then11: Add wT to WT .

12: πǫ = StationaryDistribution(WT , ǫ)

Note that Algorithm 10 does not directly map PageRank walks from Gc to PageRank walks inG. Instead, it takes advantage of the fact that in order to approximate PageRank one only needsto know the random walks until the first jump transition and proceeds as follows. It first computesPageRank walks in Gc, then discards all walks that have at least one jump transition and finallytruncates the resulting walks by simple coin tossing. Note that this truncation step is equivalent totruncating the walks just before the first jump transition.

Now we analyze the correctness of Algorithm 10. The following result will be useful in estab-lishing failure probability of TranslateToPageRankWalk.

Lemma 34. Let ℓ ≥ 1 be a parameter. Define ǫ′ = 1/(4ℓ). An ǫ′-PageRank walk does not make arandom jump within the first ℓ steps with probability at least 0.6.

Proof. An ǫ′-PageRank walk does not make a random jump within the first ℓ steps with probability

(1− ǫ′)lLemma 31≥ exp

(

−2ǫ′l)

= exp (−1/2) ≥ 0.6.

Lemma 34 essentially states the following. If we are given an ǫ′-PageRank walks in Gc then withprobability at 1/2 it can be turned into a ǫ-PageRank walk in G.

Lemma 35. The invocation TranslateToPageRankWalk(G, ǫ,w) on Line 10 of Algorithm 10“fails” with probability at most 1/2. If the algorithm succeeds, then it returns an ǫ-PageRank walkof G that has exactly one jump transition and that jump transition is the last one in the walk.

Proof. We first analyze the success probability of TranslateToPageRankWalk and then showthe claim for its output.

Success probability. There are two lines where TranslateToPageRankWalk can “fail” –Lines 2 and 9. By Lemma 34, it “fails” in Line 2 with probability at most 1/2.Line 9 is executed only if no edge of wR is marked on Line 4. We have

|wR| = ⌊ℓ/ ⌈log n⌉⌋ =⌊

⌈log n⌉ ·⌈

9 lognǫ

/ ⌈log n⌉⌋

=⌈

9 lognǫ

.

35

Page 37: Walking Randomly, Massively, and Efficiently

Hence, by Lemma 20 no edge of wR is marked with probability at most 1n9 ≤ 1/512. This implies

that the invocation of TranslateToPageRankWalk succeeds with probability at least 0.6 −1/512 ≥ 1/2, as desired.

Output of TranslateToPageRankWalk. As input, TranslateToPageRankWalk getsa PageRank walk w in Gc; this walk is generated on Line 7 of Algorithm 10. On Line 2 ofTranslateToPageRankWalk, w is discarded if it contains any jump transition. So, if thealgorithm does not “fail”, w is a random walk in Gc. By the transformation on Line 3 we obtain awalk wR which is a random walk in G. After that, each transition of wR is marked with probabilityǫ (that is implemented by marking on Line 4), i.e., we toss a random coin to see potential stepsfor random jumps. Then we truncate the walk at the moment of the first jump in wR occurs, andreturn this truncated walk. By construction, this walk is an ǫ-PageRank walk in G.

Now we are ready to prove the correctness of Algorithm 10, which establishes one of the mainresults of the paper.

Theorem 5. Let G be a directed graph. Let α ∈ [1/n, 1/4] and ǫ ∈ [log3 n/o(S), 1], where S is theavailable space per machine. There exists an MPC algorithm that, with probability at least 1−O( 1n ),

computes a (1 + α)-approximate ǫ-PageRank vector of G in O(

log2 log n+ log2 1/ǫ)

rounds, using

O(

mα2 + n1+o(1)

ǫ3.5α2

)

total space and strongly sublinear space per machine.

Proof. We will show that Algorithm 10 satisfies properties of this claim. As in previous proofs wesplit the proof into three parts: the space requirements; the round complexity; and, the probabilityof success. This is the moment when we set all the parameters of the algorithms. We set δ to be thelargest value not greater than (log log n)−1 such that δ−1 ∈ N. Observe that δ ≥ 1/(1 + log log n).Also, we set τ = log log log−1 n. We will use the following bounds

O(1/ǫ′) = O(ℓ) = O(log2 n/ǫ).

O(log 1/ǫ′) = O(log ℓ) = O(log log n+ log 1/ǫ),

and when cruder bound is enough we bound

O(log 1/ǫ′) = O(log log n+ log n) = O(log n).

Round complexity. By Theorem 23 the execution of PageRankOfBalancedGraphs(Gc, 1/2, α, δ)takes O(δ−1 log log n) = O(log2 log n) rounds.

Recall that ǫ′ = Θ(

ǫlog2 n

)

in Algorithm 10 (see Line 4). By Theorem 33 the execution of

PageRankLargerDampingFactor on Line 5 of Algorithm 10 takes

O(τ−1 · log 1/ǫ′ · (log log n+ log 1/ǫ′)) = O(log log log n · (log log n+ log 1/ǫ) · (log log n+ log 1/ǫ))

= O(log2 log n+ log2 1/ǫ)

rounds. Note that since ǫ ≥ log3 n/o(S), we have ǫ′ ≥ log n/o(S), which is required by Theorem 33.Line 10 to Line 12 can be implemented in O(1) rounds in the same way as described in the proof

of Theorem 23 (see Section 5.1.3).

36

Page 38: Walking Randomly, Massively, and Efficiently

Success probability. As in proof of Theorem 23 we require that there are 2K walks starting ineach vertex before the call to StationaryDistribution, so that we have K walks in each vertexwith probability at least 1− 1

n2 . Using the downsampling from Lemma 35 this gives

2K ≤ 1

2⌈Lπ(v)n ln n⌉ ≤ Lǫ′(1− α) ln n,

what bounds L as

L ≥ 40

ǫ2α2(1− α)≥ 160

3ǫ2α2.

We set L = 54α2ǫ2 . Now, by Lemma 15 (i) the sampling algorithm fails with probability at most

n−Cαǫ3

+2e2 ≤ n− 543αǫ

+2e2 ≤ n−2e2 = O

(

1

n

)

.

By Theorem 23 the execution of PageRankOfBalancedGraphs fails with probability O(

1n2δ

)

=O(

1n

)

.By Theorem 33 the execution of PageRankLargerDampingFactor fails with probability

O

(

1

τn2· log 1/ǫ′

)

= O

(

log log log n

n2(log log n+ log 1/ǫ)

)

(18)

= O

(

log log log n

n2(log log n+ log n)

)

(19)

= O

(

1

n

)

. (20)

Hence, in total the failure probability of the algorithm is O(1/n).

Space complexity. By Theorem 23 the execution of PageRankOfBalancedGraphs(Gc, 1/2, α, δ)requires space

O

(

m ln2 n ln lnn

(1/2)2α2+ n

18 3δ1/2

n ln2.5 n

(1/2)3.5 · α2

)

= O

(

m

α2+

n1+o(1)

α2

)

.

By Theorem 33 the execution of PageRankLargerDampingFactor needs space

O

(

m+1

ǫ′3.5α2n36τn ln2.5 n

)

= O

(

m+log7 n

ǫ3.5α2n36τn ln2.5 n

)

= O

(

m+n1+o(1)

ǫ3.5α2

)

.

By Lemma 15 the sampling algorithm RandomWalks requires space bounded by

O(

m+ Lℓ1+2αn lnn)

= O

(

m+1

ǫ2α2· ℓ1.5n

)

= O(

m+n

ǫ3.5α2

)

.

Hence, the final space complexity of the algorithm is O(

mα2 + n1+o(1)

ǫ3.5α2

)

.

We can now use Theorem 5 to sample directed random walks and obtain the following result.

37

Page 39: Walking Randomly, Massively, and Efficiently

Theorem 2. Let G be a directed graph. Let D and l be positive integers such that l = o(S)/ log3 n,where S is the available space per machine. There exists an MPC algorithm that samples Dindependent random walks of length l starting in v for each v in G. The algorithm runs inO(log2 log n+ log2 l) rounds and uses O

(

m+ n1+o(1)l3.5 +Dnl2+o(1))

total space and strongly sub-linear space per machine. The algorithm is an imperfect sampler (see Definition 8) that does notfail with probability 1−O(n−1).

Proof. Let α = 1/ log n and ǫ = 1/(4l). Invoke Theorem 5 to obtain a (1 + α)-approximate ǫ-PageRank. This invocation can be implemented in O

(

log2 log n+ log2 l)

rounds and the total

space of O(

m+ n1+o(1)l3.5)

.Let ti be as defined in Lemma 15. Invoke RandomWalks(G, l, t) with C = 20D/(αǫ). By

Lemma 15, and given that π(v) ≥ (1 − α)π(v) ≥ (1 − α)ǫ/n, with probability at least 1 − Θ(1/n)this invocation outputs at least Cπ(v)n lnn ≥ 20D lnn random walks from each vertex v. Let Wbe the collection of those walks. Also by Lemma 15, W can be obtained in O(log l) rounds by usingthe total space of O(m+ Cl1+2αn) ∈ O

(

m+Dnl2+o(1))

.The walks in W are PageRank walks. Nevertheless, by Lemma 34 and Chernoff bound, with

probability at least 1 − Θ(1/n) for each v there exist D PageRank walks in W that contain norandom jump. Those walks are the walks that satisfy the claim of this theorem.

6 PRAM Implementation

In this section we discuss PRAM implementation of our algorithms for computing (directed) randomwalks and hence prove the following.

Theorem 6. Let G be a directed graph and 1 ≤ l ≤ n. There exists an NC algorithm that usesO((n+m)1+o(1)) processors and samples one random walk from each vertex of G. All sampled walksare independent. The algorithm is an imperfect sampler (see Definition 8) that fails with probability1−O(n−1).

6.1 Storing and Sorting Walks

In our MPC implementation of the algorithms described in other sections, we assume that an entirerandom walk can be stored on one machine. In fact, this is the only reason why our algorithm assumethat the length of each walk l satisfies l = o(S). In this section we show a different approach, whichwould also allow us to increase the upper bound on l.

To store each walk w, we allocate l consecutive memory cells, even if |w| < l. The edges of woccupy the first |w| cells and the remaining cells are filled with ⊥.

Walk identifiers. To each created walk w we assign an integer identifier wid chosen uniformly atrandom from the interval [1, n10]. With high probability walks have distinct identifiers.

Sorting walks. We now explain how to sort the walks with respect to their first vertex. Considera walk w with the starting vertex v. Then, to i-th cell allocated for w we assign triple (v,wid, i).This labeling can be done in O(log n) time by building a binary tree of processors over the cellsallocated for w. A nice feature of this labeling, and also assigning identifiers to random walks,is that after sorting the triples lexicographically, all the cells containing information of w appearconsecutively. Sorting walks with respect to their last vertex is done in an analogous way.

38

Page 40: Walking Randomly, Massively, and Efficiently

Stitching two walks. When the algorithm has to stitch two walks w1 and w2, it copies the edgesof w2 over the first |w2| cells allocated for w1 that have value ⊥. The first cell containing value ⊥can be found in O(log n) time, e.g., by performing a binary search. After such cell x is found, wecopy w2 (that occupies consecutive memory cells) over memory cells x through x+ |w2| − 1. Thisagain can be done in O(log n) time by building a binary tree over w2.

6.2 Implementation of RandomWalks

From our discussion about MPC implementation of RandomWalks, it suffices to show that sorting,computing prefix sums, NumberingSublists and Predecessor can be implemented in O(log n)time. Sorting and computation of prefix sums can be done in O(log n) time in [Col88] and [LF80],respectively.

We now describe how to implement Predecessor. Recall that in this primitive we are givenan ordered list of tuples L, each element labeled by 0 or 1. This primitive can be implemented asfollows.

• Assign i to the i-th element of L. This can be done by computing the prefix sum on L witheach element having value 1.

• Consider the i-th element ei of L. If the label of ei equals 0, assign value/pair (0, ei) to ei.Otherwise, assign value (i, ei) to ei.

• The prefix sum approach described in [LF80] can be performed for any associative operation,including max. Compute the prefix sum with operation max over the values/pairs assignedto the elements of L. These prefix sums correspond to the output of Predecessor.

As described in Section 4.1, NumberingSublists can be implemented by using prefix sumcomputation, sorting and Predecessor, each of which can be executed in O(log n) time.

6.3 Implementation of StationaryDistribution

In Lemma 22 we described how to implement StationaryDistribution in MPC. That proof reliedon primitives NumberingSublists and Predecessor, that we explained how to implement inO(log n) time. The proof also computes maximum over certain subarrays, which can be done inO(log n) time.

To conclude the discussion, it remains to comment on the implementation of Line 4. This line truncates walks after their first random jump. This again can be done by building a binary tree over each walk w and finding the first edge corresponding to a random jump of w. All the cells allocated to w after that jump are set to ⊥.
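A minimal sketch of this truncation, assuming each stored edge carries a flag saying whether it was a random jump (the predicate is_jump below is ours); the linear scan stands in for the O(log n)-time binary tree computation.

```python
def truncate_after_first_jump(cells, is_jump):
    for i, e in enumerate(cells):
        if e is not None and is_jump(e):
            # keep the jump itself, set everything after it to ⊥ (None)
            return cells[:i + 1] + [None] * (len(cells) - i - 1)
    return cells  # no random jump: the walk is left untouched
```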

6.4 Construction of Stochastic Graphs

To construct a stochastic graph with parameters ε and σ in MPC, we had to broadcast ε and σ to all the machines containing the input graph, copy edges and vertices, and properly annotate them. See the description above Observation 24 for more details. In PRAM we perform similar steps. Namely, we build a binary tree over the memory cells that our algorithm uses, copy the corresponding edges and vertices, and annotate them with ε and σ.


6.5 Implementation of TranslateWalk

We use several algorithms that translate random walks between different graphs (Algorithms 6, 7 and 9). Their MPC implementation is straightforward, as we assume that each walk is stored on one machine. Nevertheless, their PRAM implementation is also almost direct. Each of those methods counts the number of distinct transitions a walk has, or marks edges independently of each other, or finds the first edge of a walk that has a specific property (e.g., Line 6 of Algorithm 9). Each of those operations can be performed in O(log n) rounds by building a binary tree over the corresponding walk.

7 Testing Bipartiteness

We now show how to use our random walk algorithm for testing bipartiteness. In this promise problem, we are given an undirected graph G on n vertices with m edges and a parameter ε ∈ (0, 1). We want to distinguish the case that G is bipartite from the case that at least εm edges have to be removed to achieve this property. Our parallel algorithm combines techniques developed for previous bipartiteness algorithms [GR99, KKR04] with our simulation of random walks. For simplicity, we assume that the vertices of G are not isolated. Our algorithm can be seen as the following procedure consisting of three steps, of which the first two are preprocessing steps:

1. If the graph is dense, we reduce its number of edges to O(n/ε) by independently keeping each edge with an appropriate probability. The resulting graph is very likely to still be ε/2-far from bipartiteness.

2. If the graph has high-degree vertices, we apply the idea of Kaufman, Krivelevich, and Ron [KKR04] to replace all high-degree vertices with low-degree bipartite expanders. This again preserves the distance from bipartiteness up to a constant factor and allows us to assume that the resulting graph has only vertices of small degree.

3. In the resulting graph, we run a small number of short random walks from every vertex. We show that if the graph is far from bipartiteness, then the random walks from one of the vertices are very likely to discover an odd-length cycle.

We present a more formal description of the algorithm as Algorithm 11.

Algorithm 11 BipartitenessTester(G, ε): An algorithm for testing bipartiteness of an undirected graph G = (V, E) on n vertices for a closeness parameter ε ∈ (0, 1).

1: Independently keep each edge with probability min(1, O(n/(εm))).
2: Replace high-degree vertices with bipartite expanders (more details in Section 7.2).
3: Using Algorithm 1, generate poly(ε^{-1} log n) random walks of length poly(ε^{-1} log n) from each vertex.
4: for v ∈ V do
5:   V0 ← vertices reached by the random walks from v in an even number of steps
6:   V1 ← vertices reached by the random walks from v in an odd number of steps
7:   if V0 ∩ V1 ≠ ∅ then
8:     return Reject
9: return Accept
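A hedged sketch of the collision test in lines 4–8 of Algorithm 11, assuming the walks have already been generated; here a naive simulator produces them, whereas the paper obtains them with Algorithm 1 in O(log l) MPC rounds.

```python
import random
from collections import defaultdict

def detect_odd_cycle(adj, v, num_walks, length):
    parity = defaultdict(set)  # vertex -> set of parities at which it was reached
    for _ in range(num_walks):
        u = v
        for step in range(1, length + 1):
            u = random.choice(adj[u])
            parity[u].add(step % 2)
            if parity[u] == {0, 1}:  # reached at both even and odd distance
                return True          # witnesses an odd closed walk through v
    return False
```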


7.1 Sampling a Sparse Graph

We now prove that the first step of our algorithm (with an appropriate constant selection) preserves the distance to bipartiteness.

Lemma 36. Let G be an undirected graph on n vertices with m edges. Let G′ be a graph on the same set of vertices as G, created by selecting each edge independently with probability min(1, Cn/(εm)), where C is a sufficiently large positive constant. If G is ε-far from bipartiteness, then with probability 1 − 2^{-Ω(n)}, G′ is ε/2-far from bipartite and has at most O(n/ε) edges.

Proof. Suppose that G = (V, E) is ε-far. The lemma holds trivially if Cn/(εm) ≥ 1, because then G′ = G is ε/2-far and has at most Cn/ε edges. We can therefore focus on the case that Cn/(εm) < 1. The expected number of edges in this case is Cn/ε. By the Chernoff bound, the number of edges in G′ is, with probability 1 − 2^{-Ω(n)}, at most (11/10)Cn/ε. Now consider any partition of the set V of vertices of G into two sets V1 and V2. Since G is ε-far from bipartiteness, the sum of the numbers of edges in G[V1] and G[V2] is at least εm; otherwise, we could delete them to make the graph bipartite. The expected sum of the numbers of edges in G′[V1] and G′[V2] then has to be at least εm · Cn/(εm) = Cn. Again, by the Chernoff bound, this number is at least (9/10)Cn with probability 1 − 2^{-Ω(n)}, where the constant hidden by the Ω-notation can be made arbitrarily large by making C sufficiently large. By the union bound, the probability that the total number of edges in G′ is more than (11/10)Cn/ε, or that for one of the partitions fewer than (9/10)Cn edges have to be removed to make the graph bipartite, is at most 2^{-Ω(n)} + 2^{n-1} · 2^{-Ω(n)} = 2^{-Ω(n)}. This holds because all constants hidden by the Ω-notation can be made arbitrarily large by setting C to be sufficiently large. The distance of G′ from bipartiteness is then at least ((9/10)Cn) / ((11/10)Cn/ε) ≥ ε/2.

7.2 Replacing High-Degree Vertices with Expanders

We now give more details of Step 2 of Algorithm 11, which reuses the degree reduction method of Kaufman et al. [KKR04]. More specifically, Section 4.1 of their paper shows how to take a graph G = (V, E) and turn it into a graph G′ = (V′, E′) in which the maximum degree equals roughly the average degree of G. Additionally, G′ preserves G's distance to bipartiteness.

We start by describing the transformation of the vertex set. Let d = d^+_avg(G). We copy every vertex v of G such that deg^+(v) ≤ d into G′. Vertices v of higher degree are replaced by bipartite expanders as follows. For each such v, we introduce two sets of vertices of cardinality ⌈deg^+(v)/d⌉. We refer to one of them as internal, and to the other one as external. If deg^+(v) < d², we create a full bipartite graph between the internal and external vertices with edge multiplicities that make vertices have degree almost d. Otherwise, when deg^+(v) ≥ d², we use any explicit bipartite expander construction of degree d between the two sets [Mar73, LPS86].

Now we describe the transformation of the edge set. For every edge {u1, u2} ∈ E, we adjust its endpoints as follows. Let u be one of them. If u was directly copied from G to G′, then we do nothing. Otherwise, we replace it with one of the vertices in the external set created for u. For every vertex u in the original graph that is replaced by a set of vertices, we assign the original edges involving u to the external vertices created for u such that no external vertex is assigned more than d such edges.
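A hedged sketch of this edge reassignment, assuming integer vertex ids and a symmetric adjacency-list input; only the external copies and the reassigned original edges are materialized, while the expander (or complete bipartite multigraph) between internal and external copies is elided.

```python
import math
from collections import defaultdict

def reassign_edges(adj, d):
    # external copies of each high-degree vertex, ceil(deg/d) of them
    copies = {v: [(v, j) for j in range(math.ceil(len(adj[v]) / d))]
              for v in adj if len(adj[v]) > d}
    load = defaultdict(int)  # original edges assigned to each external copy
    def port(v):
        if v not in copies:
            return v  # low-degree vertices are copied into G' verbatim
        c = next(c for c in copies[v] if load[c] < d)  # copy with spare capacity
        load[c] += 1
        return c
    # each undirected edge is processed once and rerouted to external copies
    return [(port(u), port(w)) for u in adj for w in adj[u] if u < w]
```

Since each vertex gets ⌈deg⁺(v)/d⌉ copies of capacity d, a copy with spare capacity always exists, so no external vertex receives more than d original edges.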

Kaufman et al. [KKR04] prove the following.

Lemma 37 ([KKR04, Theorem 5]). The graph G′ created as above has the following properties:


(A) |V ′| = Θ(|E|) and |E′| = Θ(|E|).

(B) If G is bipartite, so is G′.

(C) If G is ε-far from bipartite for some ε ∈ (0, 1), then G′ is Ω(ε)-far from bipartite. (Kaufman et al. in fact switch between two models and slightly different definitions of distance to bipartiteness in this statement, but up to a constant factor, they are equivalent in our setting.)

(D) d^+_max(G′) = O(d^+_avg(G)).

7.3 Detecting Odd-Length Cycles in Low-Degree Graphs

Lemma 38. Let G be an undirected graph on n vertices with m edges such that d^+_max(G) = O(m/(εn)). There is an l = poly(ε^{-1} log n) such that two random walks of length l from a random vertex detect an odd-length cycle with probability at least Ω(n^{-1} poly(ε/log n)) if G is ε-far from bipartite, where ε ∈ (0, 1).

Proof. For G such that d^+_max(G) = O(m/(εn)), Kaufman et al. [KKR04, Theorem 1] (invoked with ε² in place of ε) show that the following algorithm can be used for one-sided testing of bipartiteness:

1. Sample k = Θ(1/ε²) vertices v and run t = √n · poly(ε^{-1} log n) independent random walks of length l = poly(ε^{-1} log n) from each of them.

2. If for some v, two of the random walks reach the same vertex, one using an even number of steps and the other using an odd number of steps, then reject. Otherwise accept.

Let p be the probability that two random walks of length l from a random vertex detect an odd-length cycle. We use p to upper bound the probability of success of the tester of Kaufman et al. [KKR04] for a graph ε-far from bipartite. By the union bound, it cannot succeed with probability greater than k · (t choose 2) · p ≤ k · t² · p. Since it has to succeed with probability at least 2/3, we have k · t² · p ≥ 2/3, and therefore, p = Ω(1/(k · t²)) = Ω(n^{-1} · poly(ε/log n)).

7.4 Full Bipartiteness Tester

Theorem 39. Let α ∈ (0, 1) be a fixed constant. There is an MPC algorithm for testing bipartiteness with a proximity parameter ε ∈ (0, 1) in a graph G with n vertices and m edges that with probability at least 1 − 1/poly(n) has the following properties:

• The algorithm uses O(n^α) space per machine.

• The total space is O(m + n · poly(ε^{-1} log n)).

• The number of rounds is O(log(ε^{-1} log n)).

Proof. We combine the knowledge developed in this section and show that Algorithm 11 has the desired properties.

Let G be an input graph on n vertices, and let G′ be the graph obtained by performing Line 1. By Lemma 36, G′ has O(n/ε) edges. Let G″ = (V″, E″) be obtained by performing Line 2. Then, by Lemma 37 we have |V″|, |E″| = Θ(n/ε) (by Property A) and d^+_max(G″) = O(1/ε) (by Property D and |E(G′)| = O(n/ε)). Moreover, by Lemma 36 and Property C of Lemma 37, if G is ε-far from bipartite, then G″ is Ω(ε)-far from bipartite. Also, from Property B of Lemma 37 and since G′ is a subgraph of G, we have that if G is bipartite, then G″ is bipartite as well.

Next we apply Lemma 38 to G″. Let n″ = |V(G″)| = Θ(n/ε). Lemma 38 states that to test whether G″ is ε-far from bipartite (and consequently whether G is Θ(ε)-far from bipartite) it suffices to perform the following: choose a multiset S of Θ(n″ · poly(ε^{-1} log n″)) random vertices of G″ (chosen with repetition); for each random vertex take two random walks of length l = poly(ε^{-1} log n″); if the endpoints of any pair of random walks collide, then Reject, and otherwise Accept. Moreover, using that n″ = Θ(n/ε), this test succeeds with probability at least 1 − (1 − Ω(n^{-1} poly(ε/log n)))^{Θ(n·poly(log n/ε))} ≥ 1 − poly(1/n) for appropriately chosen constants.

Now we show how to use our algorithms from Section 4 to generate the required random walks in G″. Since the vertices in S are chosen independently, by the Chernoff bound every vertex v appears O(poly(ε^{-1} log n″)) times in S with probability 1 − 1/poly(n). This implies that from each vertex we need to generate Θ(poly(ε^{-1} log n″)) pairs of random walks. For that, we use Algorithm 1 with C = poly(ε^{-1} log n″) to obtain the desired random walks in O(log l) = O(log(ε^{-1} log n)) MPC rounds and total space O(n · poly(ε^{-1} log n)) (see Lemma 11), where we used that n″ = O(n/ε). This completes the analysis.

7.5 Additional Application: Finding Cycles in Graphs Far from Cycle-Freeness

We also note that our algorithm for bipartiteness testing can be used to find cycles in graphs that are far from being cycle-free. Czumaj, Goldreich, Ron, Seshadhri, Shapira, and Sohler [CGR+14] observe that the problem of finding such a cycle can be reduced to the problem of one-sided bipartiteness testing by replacing each edge of the graph, independently with probability 1/2, with a path of length 2. If the initial graph is far from cycle-freeness, one can show that the modified graph is far from bipartiteness. Our bipartiteness testing algorithm has one-sided error and can be used to find a pair of short random walks from the same vertex that reveal an odd-length cycle in the modified graph. This cycle can then be mapped to a cycle in the initial graph by contracting the sub-paths of length 2 back to the corresponding original edges.
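A minimal sketch of the reduction of [CGR+14], assuming an edge-list input; midpoint vertices are modeled here as fresh tuple-named vertices, a representation of our choosing.

```python
import random

def subdivide_randomly(edges):
    new_edges = []
    for u, w in edges:
        if random.random() < 0.5:
            mid = (u, w)  # fresh vertex subdividing the edge {u, w}
            new_edges += [(u, mid), (mid, w)]  # path of length 2
        else:
            new_edges.append((u, w))
    return new_edges
```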

8 Testing Expansion

In this section we show how to test vertex-expansion of graphs. Our approach (see Algorithm 12) is inspired by the work of Czumaj and Sohler [CS10] and the work of Kale and Seshadhri [KS11]. In [CS10, KS11], the algorithms simulate many, e.g., Θ(√n), random walks from a small number of randomly chosen vertices. If we applied our algorithms for sampling random walks directly, we could only bound the total space usage by O(m√n), which is prohibitive. So, instead, we design an approach in which we sample fewer random walks from each vertex (but many more random walks in total).

For X, Y ⊆ V, let N(X, Y) denote the vertex-neighborhood of X within Y. That is, N(X, Y) := {v ∈ Y : ∃u ∈ X such that {u, v} ∈ E}.

Definition 40. Let G be an undirected graph and α > 0. We say that G is an α-vertex-expander if for every subset U ⊂ V of size at most |V|/2 we have |N(U, V \ U)| ≥ α|U|.

Definition 41. Let G be a graph of maximum outdegree d and ε > 0. We say that G is ε-far from an α⋆-vertex-expander if one has to change (add/remove) more than εdn edges of G to obtain an α⋆-vertex-expander.


Algorithm 12 gets α as its input, and returns Accept if G is an α-vertex-expander, or returns Reject if G is far from being such an expander. The idea of the algorithm is as follows. From each vertex, O(poly log n) many times, we run a pair of random walks. The length of these random walks is set in such a way that if G is an α-vertex-expander, then the endpoint of any of these walks is distributed almost uniformly over V. Hence, the endpoints of a pair of random walks from the same vertex are the same with probability very close to 1/n; if they are the same, we say that these two random walks resulted in a collision. If the number of collisions over all the vertices is significantly larger than expected, then we conclude that G is not an α⋆-vertex-expander, for some α⋆ < α that we set later. Otherwise the algorithm accepts G.

Algorithm 12 ExpansionTester(G, α, ε): An algorithm that tests whether a given graph G of maximum outdegree d is an α-vertex-expander or is ε-far from any α⋆-expander, for α⋆ = cα²/(d² ln(n/ε)), where c is a large enough constant.

1: Let G′ be the graph obtained from G by adding 2d − deg⁺(v) self-loops to each vertex v.
2: T ← 20 log³ n / ε⁶
3: ℓ ← 32 d² ln(n/ε) / α²
4: for i ← 1 … T do
5:   Using Algorithm 1, generate two random walks of length ℓ for each vertex of G′.
6:   Let X^i_v = 1 if the two random walks originating at v end at the same vertex, and X^i_v = 0 otherwise.
7: if Σ_{i=1}^T Σ_{v∈V} X^i_v > T + 10 log² n / ε³ then
8:   return Reject
9: else
10:   return Accept
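A hedged sketch of the collision counting in Algorithm 12, assuming adj is an adjacency list of the self-loop-augmented graph G′ and that natural logarithms are used; the walks are simulated naively here, whereas the paper generates them with Algorithm 1.

```python
import math, random

def expansion_tester(adj, n, d, alpha, eps):
    T = int(20 * math.log(n) ** 3 / eps ** 6)
    ell = int(32 * d * d * math.log(n / eps) / alpha ** 2)
    def endpoint(v):  # endpoint of one length-ell walk from v
        for _ in range(ell):
            v = random.choice(adj[v])
        return v
    collisions = sum(
        endpoint(v) == endpoint(v)  # two independent walks from v
        for _ in range(T) for v in adj
    )
    threshold = T + 10 * math.log(n) ** 2 / eps ** 3
    return "Reject" if collisions > threshold else "Accept"
```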

In the rest of this section, we prove the following.

Theorem 42. Let G = (V, E) be a graph of maximum outdegree d. If G is an α-vertex-expander, then Algorithm 12 outputs Accept with high probability. If G is ε-far from any α⋆-vertex-expander, where α⋆ = c·α²/(d² ln(n/ε)), then the algorithm outputs Reject, also with high probability. Let ℓ := 32 d² ln(n/ε)/α². Algorithm 12 can be implemented in O(log ℓ) MPC rounds, using sublinear space per machine and O(mℓ log n) total space.

The round and space complexity follows from Lemma 13. The Accept and the Reject cases are proved separately in Lemma 47 and Lemma 49, respectively. Our analysis and the prior work we recall are tailored to regular graphs whose random walks converge to the uniform distribution. Therefore, the algorithm first transforms the given graph G into G′ by adding self-loops (see Line 1). Observe that adding self-loops does not affect vertex-cuts, and hence G′ and G have the same vertex-expansion. We will use P^l_v to denote the distribution of the endpoints of a random walk of length l originating at v.

8.1 Correctness of Acceptance

We use the notion of total variation (TV) distance, which we recall next.

Definition 43. Let p1, …, pn and q1, …, qn be two probability distributions. Then, the total variation distance (TVD) between these distributions is equal to (1/2) Σ_{i=1}^n |p_i − q_i|.


In our proof that Algorithm 12 outputs Accept correctly with probability at least 2/3, we use the following results.

Lemma 44 ([GR11]). Let X_v be a random variable that equals 1 if two random walks of length l starting from vertex v collide, and X_v = 0 otherwise. Let P^l_v denote the distribution of the endpoints of these random walks. Then, E[X_v] = ‖P^l_v‖₂².

Lemma 45 (Proposition 2.8 of [GR11] and discussion thereafter). Let P^l_v be the distribution of the endpoints of a random walk of length l starting from vertex v. Let G′ be the graph as defined on Line 1 of Algorithm 12. If G′ is an α-vertex-expander, then for ℓ as defined on Line 3 of Algorithm 12, the TVD between P^ℓ_v and the uniform distribution on n vertices is upper-bounded by ε/n.

We will use the following result to upper-bound the ℓ₂ norm of the vector P^ℓ_v defined in Lemma 45.

Lemma 46. Let Y ∈ R^n be a probability distribution vector. If the TVD between Y and the uniform distribution is at most ε/n, i.e., (1/2) Σ_{i=1}^n |Y_i − 1/n| ≤ ε/n, then

‖Y‖₂² ≤ (1 + 4ε²/n)/n.

Proof. Let Y_i = 1/n + α_i. We have that (1/2) Σ_{i=1}^n |α_i| ≤ ε/n and Σ_{i=1}^n α_i = 0. Hence,

‖Y‖₂² = Σ_{i=1}^n (1/n + α_i)² = 1/n + Σ_{i=1}^n 2·(1/n)·α_i + Σ_{i=1}^n α_i² = 1/n + Σ_{i=1}^n α_i²
      ≤ 1/n + (Σ_{i=1}^n |α_i|)² ≤ 1/n + 4ε²/n² = (1 + 4ε²/n)/n.

We are now ready to provide the main proof of this section.

Lemma 47. If G is an α-vertex-expander, then ExpansionTester(G, α, ε) returns Accept with probability at least 1 − n^{-1}.

Proof. As defined on Line 6 of ExpansionTester, let X^i_v be the 0/1 random variable that equals 1 iff the two random walks originating at v end at the same vertex. Define X := Σ_{i=1}^T Σ_{v∈V} X^i_v, which corresponds to the summation on Line 7 of Algorithm 12. From Lemmas 44 to 46 (each term ‖P^ℓ_v‖₂² is at most (1 + 4ε²/n)/n) we have

E[X] = T Σ_{v∈V} ‖P^ℓ_v‖₂² ≤ T · (1 + 4ε²/n) ≤ T + 1,   (21)

which holds whenever 4ε²T ≤ n. Also, as P^ℓ_v is an n-dimensional probability distribution vector, we have ‖P^ℓ_v‖₂² ≥ 1/n, and hence

E[X] = T Σ_{v∈V} ‖P^ℓ_v‖₂² ≥ T = 20 log³ n / ε⁶,   (22)

where we used the definition of T on Line 2 of Algorithm 12. Now we can write

P[X ≥ T + 10 log² n/ε³] ≤ P[X ≥ E[X] − 1 + 10 log² n/ε³] ≤ P[X ≥ (1 + 9 log² n/(ε³ E[X])) · E[X]],   (23)

where the first inequality follows from (21). From (22) we have that 9 log² n/(ε³ E[X]) ≤ 1. Observe that across v and i the random variables X^i_v are independent. By applying the Chernoff bound (Theorem 7 (C)) to (23) we derive

P[X ≥ T + 10 log² n/ε³] ≤ exp(−81 log⁴ n / (3 ε⁶ E[X])).

From E[X] ≤ T + 1 (see Eq. (21)) and for T = 20 log³ n/ε⁶, as defined on Line 2 of Algorithm 12, the last chain of inequalities is upper-bounded by n^{-1}. Therefore, ExpansionTester outputs Accept with high probability, as desired.

8.2 Correctness of Rejection

We use the following result to prove that our algorithm reports Reject properly.

Lemma 48 (Lemma 4.3 and Lemma 4.7 of [CS10]). Let G′ = (V, E) be a 2d-regular graph such that each vertex has at least d self-loops. Let ℓ be as defined on Line 3 of Algorithm 12, and let α⋆ = cα²/(d² ln(n/ε)), where c is a large enough constant. If G′ is ε-far from every α⋆-expander, for α⋆ ≤ 1/10, then there exists a subset of vertices U such that:

• |U| ≥ (ε/24)|V|; and,

• ‖P^ℓ_v‖₂² ≥ (1 + 9ε)/n for each v ∈ U.

We are now ready to finalize our analysis.

Lemma 49. Let ε ∈ (0, 1/5) be a parameter. If G is ε-far from every α⋆-vertex-expander, then ExpansionTester(G, α, ε) returns Reject with probability at least 1 − n^{-2}.

Proof. For any vertex v ∈ V, as P^ℓ_v is a probability distribution, it holds that ‖P^ℓ_v‖₂² ≥ 1/n. As defined on Line 6 of ExpansionTester, let X^i_v be the 0/1 random variable that equals 1 iff the two random walks originating at v end at the same vertex. Define X := Σ_{i=1}^T Σ_{v∈V} X^i_v. Then, if G is ε-far from every α⋆-vertex-expander, from Lemma 48 we have

E[X] = T (Σ_{v∈V\U} ‖P^ℓ_v‖₂² + Σ_{v∈U} ‖P^ℓ_v‖₂²) ≥ T ((1 − ε/24)·n·(1/n) + (ε/24)·n·(1 + 9ε)/n) ≥ T (1 + ε²/3),

where the middle bound corresponds to the smallest |U| that Lemma 48 allows.


By the Chernoff bound and the last inequality, it holds that

P[ X ≤ (1 − √(6 log n / (T(1 + ε²/3)))) · T(1 + ε²/3) ] ≤ P[ X ≤ (1 − √(6 log n / E[X])) · E[X] ] ≤ n^{-2}.   (24)

From the definition of T (Line 2 of Algorithm 12), we have

(1 − √(6 log n / (T(1 + ε²/3)))) · T(1 + ε²/3) ≥ T + 6 log³ n/ε⁴ − √(240 log⁴ n/ε⁶) ≥ T + 6 log³ n/ε⁴ − 16 log² n/ε³ ≥ T + 10 log² n/ε³,

for ε ≤ 1/5. This together with Eq. (24) and Line 7 of Algorithm 12 concludes the proof.

Acknowledgments

We thank Davin Choo and Julian Portmann for valuable discussions. S. Mitrović was supported by the Swiss NSF grant P2ELP2_181772 and the MIT-IBM Watson AI Lab. P. Sankowski was supported by the ERC CoG grant TUgbOAT no. 772346.

A Random Walks in Directed Bounded Degree Graphs

In this section we show how to efficiently sample short random walks from directed graphs, provided that the outdegree of each vertex is bounded.

Let dist(u, w) be the length of the shortest path from u to w. We define the ball of center v and radius d, denoted by B(v, d), to be the set of vertices x of G such that dist(v, x) ≤ d. In particular, B(v, 1) contains v and all vertices reachable from v by its outedges.

Observation 50. Let G be a directed graph and let ∆ be the maximum outdegree in G. Then, for each v ∈ V(G) and any integer d ≥ 0, we have |B(v, d)| = O(∆^d) and |G[B(v, d)]| = O(∆^{d+1}).

Let us first describe the high-level idea. Assume that the goal is to compute a single random walk of length log n. We can compute B(v, d) for all v and d = ε log n. In the next step, for each ball B(v, d) we compute G[B(v, d)]. Then, to find a random walk starting from any vertex v, we can compute ε log n steps of that random walk in a single round on a single machine that knows G[B(v, d)]. Hence, only O(1/ε) steps like this are needed to compute a random walk of length log n. At the same time, if ∆ is a constant, we only need O(n^{1+ε}) space to store all graphs G[B(v, d)]. In the remaining part of this section, we describe the details of this approach.
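A minimal sketch of the ball-doubling step used by Algorithm 13 below, assuming an adjacency-list input: B(v, 2^i) is the union of B(x, 2^{i-1}) over all x ∈ B(v, 2^{i-1}).

```python
def grow_balls(adj, rounds):
    balls = {v: {v} | set(adj[v]) for v in adj}  # B(v, 1)
    for _ in range(rounds):  # after i doublings, balls[v] == B(v, 2^i)
        balls = {v: set().union(*(balls[x] for x in balls[v])) for v in adj}
    return balls
```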


Algorithm 13 An algorithm for sampling a_v random walks of length l starting from vertex v (for each v ∈ V).

1: for all v ∈ V in parallel do
2:   B(v, 1) := {v} ∪ {x | vx ∈ E}
3: r := (ε/2) log_∆ n
4: r := 2^{⌊log₂ r⌋}   ⊲ Round down to a power of two
5: for i ← 1 … log₂ r do
6:   for all v ∈ V in parallel do
7:     B(v, 2^i) := ⋃_{x∈B(v, 2^{i−1})} B(x, 2^{i−1})
8: for v ∈ V do
9:   a⁰_{v,v} := a_v
10: for i ← 1 … l/r do   ⊲ For the pseudocode, assume that r divides l
11:   for all t such that ∃s : a^{i−1}_{s,t} ≠ 0 in parallel do
12:     Compute G[B(t, r)]
13:     for all s such that a^{i−1}_{s,t} ≠ 0 in parallel do
14:       Use G[B(t, r)] to compute a^{i−1}_{s,t} length-r random walks from t
15:       For each random walk computed in the previous step which ends in t′, increase a^i_{s,t′}

Lemma 51. Algorithm 13 is correct.

Proof. The first step is to show that the algorithm correctly computes the sets B(v, 2^i) for all v and i = 1, …, log₂ r. This follows directly from the fact that B(v, 2^i) = ⋃_{x∈B(v, 2^{i−1})} B(x, 2^{i−1}).

The random walks themselves are computed by the loop in lines 10–15. Each iteration of the loop extends all random walks by r edges. The algorithm uses the variables a^i_{s,t} to represent its state, as follows. After the i-th iteration, the algorithm has computed a^i_{s,t} random walks of length i · r which start in s and end in t. For each v ∈ V and i, the algorithm maintains the invariant that Σ_{t∈V} a^i_{v,t} = a_v.

To extend the random walks by r steps, we use the basic fact that a length-r random walk from v is fully contained in G[B(v, r)]. Hence, having G[B(v, r)] is enough to compute r steps of a random walk. It follows easily that the variables a^i_{s,t} are updated correctly. Finally, we note that the pseudocode assumes that r divides l, but this assumption can easily be dropped by computing only l mod r steps of the random walks in the last iteration of the main loop.

Lemma 52. Assume that Σ_{v∈V} a_v = O(n^{1+ε}). For any ε > 0, Algorithm 13 can be implemented in the MPC model to run in O(log log n + l log ∆/(ε log n)) rounds, using O(m + n^{1+ε}) total space and O(n^ε) space per machine, where ∆ is the maximum degree in the graph and log ∆/log n = o(1).

Proof. Lines 1–9 can be implemented in the MPC model in a straightforward way, so in the proof we focus on the remaining part of the algorithm. Let us first bound the space needed to store G[B(v, r)] for all v. Note that r ≤ ε log n/(2 log ∆). By Observation 50 we have

|G[B(v, r)]| = O(∆^{r+1}) = O(n^{ε/2 + log ∆/log n}) = O(n^ε).

Hence, storing G[B(v, r)] for all v ∈ V requires O(n^{1+ε}) total space.

In the i-th iteration of the algorithm there are c_t := Σ_{s∈V} a^{i−1}_{s,t} random walks that end in t and need to be extended by using G[B(t, r)]. For each such walk that ends in t′ we increase a_{s,t′}. In order to make sure that each machine ends up increasing at most O(n^ε) counters a_{s,t′}, we use the following batching strategy. Recall that for each vertex t, we need to compute c_t random walks starting in t. In order to do that we use ⌈c_t/n^ε⌉ machines, each of which computes G[B(t, r)] independently. The total space used by all the machines is then

Σ_{t∈V} ⌈c_t/n^ε⌉ · O(n^ε) ≤ Σ_{t∈V} (c_t/n^ε + 1) · O(n^ε) = O(n^{1+ε}) + Σ_{t∈V} c_t = O(n^{1+ε}) + Σ_{v∈V} a_v = O(n^{1+ε}).

By combining the lemmas from this section we obtain the following.

Theorem 53. Let l > 0, let G be a directed graph with maximum outdegree bounded by ∆, and let {a_v}_{v∈V} be a sequence such that Σ_{v∈V} a_v = O(n^{1+ε}). There exists an MPC algorithm that for each v ∈ V computes the endpoints of a_v random walks of length l starting from v. The algorithm uses O(log log n + l log ∆/(ε log n)) rounds, O(m + n^{1+ε}) total space and O(n^ε) space per machine.

Note that with l = O(log n) and ∆ = poly log n we get O(log log n) rounds. Moreover, if ∆ = O(1), we can set ε = 1/log log n and use only O(m + n^{1+o(1)}) total space.

B Handling Dangling Vertices

From a theoretical point of view, when dangling vertices are present, the transition matrix for the PageRank walks is not stochastic. In particular, the largest eigenvalue is no longer equal to 1. Hence, many ways to handle dangling vertices have been proposed, see e.g. [Ber05b]. For example, we can delete them, we can lump them into one vertex and add a self-loop, we can add a self-loop to each of them, each dangling vertex can be linked to an artificial vertex with a self-loop (a sink), or we can connect each dangling vertex to every other vertex. The first solution is mentioned already in the original PageRank paper; however, it is infeasible in our case, as it requires finding strongly connected components first. The last solution seems to be the most accepted and the most widely used one, as it can be interpreted as restarting the random walk from a random state whenever it reaches a dangling vertex. In this paper, we assume that one of the above solutions has already been applied to our graph and, therefore, every vertex has at least one outgoing edge. Still, in this section, we give a novel relation between the two most widely applied methods of handling dangling vertices. We have made a thorough literature study and, to the best of our knowledge, these relations have not been observed before. Usually, one argues heuristically that these approaches give similar results. We give formal proofs that this is indeed the case. We show that adding self-loops and restarting the walks are equivalent up to a simple transformation.

In the case when dangling vertices are present, the transition matrix of our graph T = T(G) can be decomposed into the following blocks

T = [ T_1  0 ]
    [ T_2  0 ],

where the bottom k rows correspond to the dangling vertices, which have no outedges. Adding self-loops gives us the following transition matrix

T^s = [ T_1  0 ]
      [ T_2  I ].

Note that the way we defined the transition matrices T and T^s means that the entry at row u and column v corresponds to an incoming arc to u from v. Hence, a stationary distribution of T is a right eigenvector of T.

The PageRank matrix of T^s is given by

(1 − ε)T^s + (ε/n)J = [ (1 − ε)T_1 + (ε/n)J_{n−k,n−k}    (ε/n)J_{n−k,k}           ]
                      [ (1 − ε)T_2 + (ε/n)J_{k,n−k}      (1 − ε)I + (ε/n)J_{k,k}  ],   (25)

where J_{i,j} refers to the all-ones matrix of size i × j. We denote by π^s = [π^s_1; π^s_2] the stationary distribution of the above PageRank matrix. When we consider restarting the walks, we obtain the following matrix

T^r = [ T_1  (1/n)J_{n−k,k} ]
      [ T_2  (1/n)J_{k,k}   ].

The PageRank matrix of T^r is given by

(1 − ε)T^r + (ε/n)J = [ (1 − ε)T_1 + (ε/n)J_{n−k,n−k}    (1/n)J_{n−k,k} ]
                      [ (1 − ε)T_2 + (ε/n)J_{k,n−k}      (1/n)J_{k,k}   ],   (26)

with π^r = [π^r_1; π^r_2] being its stationary distribution. In the rest of this section, given a vector x ∈ R^n, we use |x| to denote the ℓ₁ norm of x.

Theorem 54. Let π^r and π^s be the stationary distribution vectors as defined above. Then, it holds that

π^r_1 = 1/(ε − ε|π^s_1| + |π^s_1|) · π^s_1

and

π^r_2 = 1/(1/ε − |π^s_2|/ε + |π^s_2|) · π^s_2.

Proof. Let us first consider the upper blocks of (25) when used in the stationary equation. We obtain

((1 − ε)T_1 + (ε/n)J) π^s_1 + (ε/n)|π^s_2| · 1⃗ = π^s_1,
(I − (1 − ε)T_1 − (ε/n)J) π^s_1 = (ε/n)|π^s_2| · 1⃗,
π^s_1 = (ε/n)|π^s_2| (I − (1 − ε)T_1 − (ε/n)J)^{−1} 1⃗ = (ε/n)|π^s_2| g,   (27)

where g = (I − (1 − ε)T_1 − (ε/n)J)^{−1} 1⃗.

Now, consider the upper blocks of (26). Also from the stationary equation we derive

((1 − ε)T_1 + (ε/n)J) π^r_1 + (1/n)|π^r_2| · 1⃗ = π^r_1,
(I − (1 − ε)T_1 − (ε/n)J) π^r_1 = (1/n)|π^r_2| · 1⃗,
π^r_1 = (1/n)|π^r_2| (I − (1 − ε)T_1 − (ε/n)J)^{−1} 1⃗ = (1/n)|π^r_2| g.   (28)

From (27) and (28) we conclude that both π^r_1 and π^s_1 are parallel to g, and hence π^r_1 = x π^s_1, for some x ∈ R.

Next we consider the lower blocks of (25) and (26). From the lower blocks of (25) we establish

((1 − ε)T_2 + (ε/n)J) π^s_1 + (1 − ε)π^s_2 + (ε/n)|π^s_2| · 1⃗ = π^s_2,
((1 − ε)T_2 + (ε/n)J) π^s_1 + (ε/n)|π^s_2| · 1⃗ = ε π^s_2.

By plugging (27) into the last equality we get

((1 − ε)T_2 + (ε/n)J) (ε/n)|π^s_2| g + (ε/n)|π^s_2| · 1⃗ = ε π^s_2,
((1 − ε)T_2 + (ε/n)J) (1/n)|π^s_2| g + (1/n)|π^s_2| · 1⃗ = π^s_2,
|π^s_2| ( ((1 − ε)T_2 + (ε/n)J) (1/n) g + (1/n) 1⃗ ) = π^s_2.   (29)

The lower blocks of (26) give

((1 − ε)T_2 + (ε/n)J) π^r_1 + (1/n)|π^r_2| · 1⃗ = π^r_2.

Plugging (28) into the last equality leads to

((1 − ε)T_2 + (ε/n)J) (1/n)|π^r_2| g + (1/n)|π^r_2| · 1⃗ = π^r_2,
|π^r_2| ( ((1 − ε)T_2 + (ε/n)J) (1/n) g + (1/n) 1⃗ ) = π^r_2.   (30)

Again by (29) and (30) we see that π^r_2 = y π^s_2, for some y ∈ R. We have that |π^s_1| = 1 − |π^s_2| and |π^r_1| = 1 − |π^r_2|, so from (27) and (28) we obtain

1 − |π^s_2| = (ε/n)|π^s_2| |g|   and   1 − |π^r_2| = (1/n)|π^r_2| |g|,

which, using π^r_2 = y π^s_2, implies

1 − |π^s_2| = (ε/n)|π^s_2| |g|   and   1 − y|π^s_2| = (1/n) y |π^s_2| |g|.

By solving these two equations for y we obtain

y = 1/(1/ε − |π^s_2|/ε + |π^s_2|).

Using the same approach for x we get

x = 1/(ε − ε|π^s_1| + |π^s_1|),

which finishes the proof.


Theorem 54 has a few consequences. First of all, one can easily see that the scores of vertices in π^r_1 are all higher than the corresponding scores in π^s_1. More importantly, we can easily obtain π^r from π^s in O(1) MPC rounds. We note that π^s is much easier to compute, as obtaining it requires only a minor graph modification, i.e., adding self-loops to dangling vertices, whereas the scores π^r are the most widely accepted ones.

References

[ABB+19] Sepehr Assadi, MohammadHossein Bateni, Aaron Bernstein, Vahab Mirrokni, and Cliff Stein. Coresets meet EDCS: algorithms for matching and vertex cover on massive graphs. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1616–1635. SIAM, 2019.

[ACK19] Sepehr Assadi, Yu Chen, and Sanjeev Khanna. Sublinear algorithms for (∆+1) vertex coloring. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 767–786. Society for Industrial and Applied Mathematics, 2019.

[ACL06] Reid Andersen, Fan R. K. Chung, and Kevin J. Lang. Local graph partitioning using PageRank vectors. In 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2006), 21–24 October 2006, Berkeley, California, USA, Proceedings, pages 475–486, 2006.

[ALNO07] Konstantin Avrachenkov, Nelly Litvak, Danil Nemirovsky, and Natalia Osipova. Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM Journal on Numerical Analysis, 45(2):890–904, 2007.

[ANOY14] Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. Parallel algorithms for geometric graph problems. In Proceedings of the 46th ACM Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31–June 3, 2014, pages 574–583, 2014.

[ASS+18] Alexandr Andoni, Zhao Song, Clifford Stein, Zhengyu Wang, and Peilin Zhong. Parallel graph connectivity in log diameter rounds. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 674–685. IEEE, 2018.

[ASW19] Sepehr Assadi, Xiaorui Sun, and Omri Weinstein. Massively parallel algorithms for finding well-connected components in sparse graphs. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, Toronto, ON, Canada, July 29 – August 2, 2019, pages 461–470, 2019.

[BBCT12] Christian Borgs, Michael Brautbar, Jennifer Chayes, and Shang-Hua Teng. A sublinear time algorithm for PageRank computations. In International Workshop on Algorithms and Models for the Web-Graph, pages 41–53. Springer, 2012.

[BCX11] Bahman Bahmani, Kaushik Chakrabarti, and Dong Xin. Fast personalized PageRank on MapReduce. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12–16, 2011, pages 973–984, 2011.


[BDE+19] Soheil Behnezhad, Laxman Dhulipala, Hossein Esfandiari, Jakub Łącki, and Vahab Mirrokni. Near-optimal massively parallel graph connectivity. FOCS, 2019.

[Ber05a] Pavel Berkhin. A survey on PageRank computing. Internet Mathematics, 2(1):73–120, 2005.

[Ber05b] Pavel Berkhin. A survey on PageRank computing. Internet Math., 2(1):73–120, 2005.

[BFU18] Sebastian Brandt, Manuela Fischer, and Jara Uitto. Matching and MIS for uniformly sparse graphs in the low-memory MPC model. arXiv preprint arXiv:1807.05374, 2018.

[BHH19] Soheil Behnezhad, MohammadTaghi Hajiaghayi, and David G. Harris. Exponentially faster massively parallel maximal matching. FOCS, 2019.

[BKS13] Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA, June 22–27, 2013, pages 273–284, 2013.

[BP98] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7):107–117, 1998.

[BP11] Marco Bressan and Luca Pretto. Local computation of PageRank: the ranking side. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pages 631–640. ACM, 2011.

[BPP18] Marco Bressan, Enoch Peserico, and Luca Pretto. Sublinear algorithms for local graph centrality estimation. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7–9, 2018, pages 709–718, 2018.

[Bre02] L. A. Breyer. Markovian page ranking distributions: some theory and simulations. 2002.

[CFSV16] Keren Censor-Hillel, Eldar Fischer, Gregory Schwartzman, and Yadu Vasudev. Fast distributed algorithms for testing graph properties. In Distributed Computing – 30th International Symposium, DISC 2016, Paris, France, September 27–29, 2016, Proceedings, pages 43–56, 2016.

[CGR+14] Artur Czumaj, Oded Goldreich, Dana Ron, C. Seshadhri, Asaf Shapira, and Christian Sohler. Finding cycles and trees in sublinear time. Random Struct. Algorithms, 45(2):139–184, 2014.

[CGS04] Yen-Yu Chen, Qingqing Gan, and Torsten Suel. Local methods for estimating PageRank values. In Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pages 381–389. ACM, 2004.

[CKK+18] A. Chiplunkar, M. Kapralov, S. Khanna, A. Mousavifar, and Y. Peres. Testing graph clusterability: Algorithms and lower bounds. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 497–508, Oct 2018.


[CŁM+18] Artur Czumaj, Jakub Łącki, Aleksander Mądry, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski. Round compression for parallel matching algorithms. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 471–484. ACM, 2018.

[CMOS19] Artur Czumaj, Morteza Monemizadeh, Krzysztof Onak, and Christian Sohler. Planar graphs: Random walks and bipartiteness testing. Random Structures & Algorithms, 2019.

[Col88] Richard Cole. Parallel merge sort. SIAM Journal on Computing, 17(4):770–785, 1988.

[CPS15] Artur Czumaj, Pan Peng, and Christian Sohler. Testing cluster structure of graphs. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC '15, pages 723–732, New York, NY, USA, 2015. ACM.

[CS10] Artur Czumaj and Christian Sohler. Testing expansion in bounded-degree graphs. Combinatorics, Probability and Computing, 19(5-6):693–709, 2010.

[DCGR05] Gianna M. Del Corso, Antonio Gullí, and Francesco Romani. Fast PageRank computation via a sparse linear system. Internet Math., 2(3):251–273, 2005.

[DGP11] Atish Das Sarma, Sreenivas Gollapudi, and Rina Panigrahy. Estimating PageRank on graph streams. J. ACM, 58(3):13:1–13:19, 2011.

[DMPU15] Atish Das Sarma, Anisur Rahaman Molla, Gopal Pandurangan, and Eli Upfal. Fast distributed PageRank computation. Theor. Comput. Sci., 561(PB):113–121, January 2015.

[DNPT13] Atish Das Sarma, Danupon Nanongkai, Gopal Pandurangan, and Prasad Tetali. Distributed random walks. J. ACM, 60(1):2:1–2:31, 2013.

[DSB09] Neelam Duhan, A. K. Sharma, and Komal Kumar Bhatia. Page ranking algorithms: a survey. In 2009 IEEE International Advance Computing Conference, pages 1530–1537. IEEE, 2009.

[GGK+18] Mohsen Ghaffari, Themis Gouleakis, Christian Konrad, Slobodan Mitrović, and Ronitt Rubinfeld. Improved massively parallel computation algorithms for MIS, matching, and vertex cover. Proceedings of the 37th ACM Principles of Distributed Computing (PODC 2018), 2018.

[GKK13] Ashish Goel, Michael Kapralov, and Sanjeev Khanna. Perfect matchings in O(n log n) time in regular bipartite graphs. SIAM J. Comput., 42(3):1392–1404, 2013.

[GKMS18] Buddhima Gamlath, Sagar Kale, Slobodan Mitrović, and Ola Svensson. Weighted matchings via unweighted augmentations. arXiv preprint arXiv:1811.02760, 2018.

[GKU19] Mohsen Ghaffari, Fabian Kuhn, and Jara Uitto. Conditional hardness results for massively parallel computation from distributed lower bounds. FOCS, 2019.


[GLM19] Mohsen Ghaffari, Silvio Lattanzi, and Slobodan Mitrović. Improved parallel algorithms for density-based network clustering. In International Conference on Machine Learning, pages 2201–2210, 2019.

[GR99] Oded Goldreich and Dana Ron. A sublinear bipartiteness tester for bounded degree graphs. Combinatorica, 19(3):335–373, 1999.

[GR11] Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. In Studies in Complexity and Cryptography. Miscellanea on the Interplay between Randomness and Computation, pages 68–75. Springer, 2011.

[GSZ11] Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. Sorting, searching, and simulation in the MapReduce framework. In International Symposium on Algorithms and Computation, pages 374–383. Springer, 2011.

[GU19] Mohsen Ghaffari and Jara Uitto. Sparsifying distributed algorithms with ramifications in massively parallel computation and centralized local computation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1636–1653. SIAM, 2019.

[HLL18] Nicholas J. A. Harvey, Christopher Liaw, and Paul Liu. Greedy and local ratio algorithms in the MapReduce model. In Proceedings of the 30th Symposium on Parallelism in Algorithms and Architectures, pages 43–52. ACM, 2018.

[HZ96a] Shay Halperin and Uri Zwick. An optimal randomised logarithmic time connectivity algorithm for the EREW PRAM. J. Comput. Syst. Sci., 53(3):395–416, 1996.

[HZ96b] Shay Halperin and Uri Zwick. Optimal randomized EREW PRAM algorithms for finding spanning forests and for other basic graph connectivity problems. In Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '96, pages 438–447, Philadelphia, PA, USA, 1996. Society for Industrial and Applied Mathematics.

[Jin19] Ce Jin. Simulating random walks on graphs in the streaming model. In 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, January 10–12, 2019, San Diego, California, USA, pages 46:1–46:15, 2019.

[JS96] Mark Jerrum and Alistair Sinclair. The Markov chain Monte Carlo method: an approach to approximate counting and integration. Approximation Algorithms for NP-hard Problems, pages 482–520, 1996.

[KKR04] Tali Kaufman, Michael Krivelevich, and Dana Ron. Tight bounds for testing bipartiteness in general graphs. SIAM J. Comput., 33(6):1441–1483, 2004.

[KM09] Jonathan A. Kelner and Aleksander Mądry. Faster generation of random spanning trees. In 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, October 25–27, 2009, Atlanta, Georgia, USA, pages 13–21, 2009.

[KNP99] David R. Karger, Noam Nisan, and Michal Parnas. Fast connected components algorithms for the EREW PRAM. SIAM J. Comput., 28(3):1021–1034, 1999.


[KS11] Satyen Kale and Comandur Seshadhri. An expansion tester for bounded degree graphs. SIAM Journal on Computing, 40(3):709–720, 2011.

[KSV10] Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17–19, 2010, pages 938–948, 2010.

[LF80] Richard E. Ladner and Michael J. Fischer. Parallel prefix computation. Journal of the ACM (JACM), 27(4):831–838, 1980.

[LM04] Amy N. Langville and Carl D. Meyer. Deeper inside PageRank. Internet Mathematics, 1(3):335–380, 2004.

[LMSV11] Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. Filtering: a method for solving graph problems in MapReduce. In Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures, pages 85–94. ACM, 2011.

[LPS86] Alexander Lubotzky, Ralph Phillips, and Peter Sarnak. Explicit expanders and the Ramanujan conjectures. In Proceedings of the 18th Annual ACM Symposium on Theory of Computing, May 28–30, 1986, Berkeley, California, USA, pages 240–246, 1986.

[Mar73] Gregory A. Margulis. Explicit constructions of expanders. Problemy Peredachi Informatsii, 9(4):71–80, 1973.

[NS10] Asaf Nachmias and Asaf Shapira. Testing the expansion of a graph. Inf. Comput., 208(4):309–314, 2010.

[Ona18] Krzysztof Onak. Round compression for parallel graph algorithms in strongly sublinear space. arXiv preprint arXiv:1807.08745, 2018.

[PBMW99] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab, November 1999.

[Rei85] John H. Reif. An optimal parallel algorithm for integer sorting. In Proceedings of the 26th Annual Symposium on Foundations of Computer Science, SFCS '85, pages 496–504, Washington, DC, USA, 1985. IEEE Computer Society.

[RVW18] Tim Roughgarden, Sergei Vassilvitskii, and Joshua R. Wang. Shuffles and circuits (on lower bounds for modern parallel computation). Journal of the ACM (JACM), 65(6):41, 2018.
