
Statistics of a random graph in the Bollobas-Borgs-Chayes-Riordan model

Evgeny Grechnikov

Yandex

October 25, 2013

Directed preferential attachment model [1]

Parameters: $\alpha \ge 0$, $\beta \ge 0$, $\gamma \ge 0$, $\delta_{in} \ge 0$, $\delta_{out} \ge 0$, $\alpha + \beta + \gamma = 1$

Start with some graph $G(t_0)$ with $t_0$ edges at time $t_0$

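[Figure: the graph $G(t)$ and the three possible steps. With probability $\alpha$: an edge from a new vertex to an existing vertex $w$. With probability $\beta$: an edge from an existing vertex $v$ to an existing vertex $w$. With probability $\gamma$: an edge from an existing vertex $v$ to a new vertex.]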

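$$\Pr(v) = \frac{\deg_{out}(v) + \delta_{out}}{t + \delta_{out} n}, \qquad \Pr(w) = \frac{\deg_{in}(w) + \delta_{in}}{t + \delta_{in} n},$$

where $n$ is the number of vertices of $G(t)$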

[new-in] With probability $\alpha$, add a new edge from a new vertex to $w$

[old] With probability $\beta$, add a new edge from $v$ to $w$, where $v, w \in G(t)$

[new-out] With probability $\gamma$, add a new edge from $v$ to a new vertex

[1] B. Bollobas, Ch. Borgs, J. Chayes, O. Riordan, Directed scale-free graphs, ACM-SIAM Symposium on Discrete Algorithms, 2003
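The three steps above can be sketched as a small simulation. This is only an illustrative sketch: the function name `simulate_bbcr`, the choice of a single edge 0 -> 1 as the initial graph $G(t_0)$, and the linear-time vertex sampling are assumptions made here, not part of the talk.

```python
import random


def simulate_bbcr(steps, alpha, beta, gamma, delta_in, delta_out, seed=0):
    """Grow a directed graph for `steps` steps; return (in_deg, out_deg, edges)."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-12
    rng = random.Random(seed)
    # Assumed initial graph G(t0): a single edge 0 -> 1, so t0 = 1.
    in_deg, out_deg = [0, 1], [1, 0]
    edges = [(0, 1)]

    def pick(deg, delta):
        # Choose an existing vertex with probability proportional to deg + delta.
        # The weights sum to t + delta * n, as in the formulas for Pr(v), Pr(w).
        weights = [d + delta for d in deg]
        return rng.choices(range(len(deg)), weights=weights)[0]

    for _ in range(steps):
        u = rng.random()
        if u < alpha:
            # [new-in]: edge from a new vertex to an existing vertex w.
            w = pick(in_deg, delta_in)
            v = len(in_deg)
            in_deg.append(0)
            out_deg.append(0)
        elif u < alpha + beta:
            # [old]: edge from an existing vertex v to an existing vertex w.
            v = pick(out_deg, delta_out)
            w = pick(in_deg, delta_in)
        else:
            # [new-out]: edge from an existing vertex v to a new vertex.
            v = pick(out_deg, delta_out)
            w = len(in_deg)
            in_deg.append(0)
            out_deg.append(0)
        out_deg[v] += 1
        in_deg[w] += 1
        edges.append((v, w))
    return in_deg, out_deg, edges
```

Each step costs time linear in the current number of vertices, which is fine for small experiments; a faster sampler would be needed for large $t$.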

Number of vertices

I At every step, a new edge is created

I So the number of edges at time $t$ is equal to $t$

I At every step, a new vertex is created with probability $\alpha + \gamma$

I Assume α + γ > 0; otherwise the model is not interesting

I The total number of vertices at time $t$ is whp $(\alpha + \gamma)t + O(\sqrt{t}\,\varphi(t))$ for any function $\varphi(t) \to \infty$ as $t \to \infty$
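One way to see the last claim: each step creates a vertex independently with probability $\alpha + \gamma$, so the number of vertices added in $T$ steps is binomial, and Chebyshev's inequality bounds the probability of a deviation larger than $\sqrt{T}\,\varphi$. The sketch below computes these quantities; the function names are illustrative and not from the talk.

```python
import math


def vertex_count_stats(T, alpha, gamma):
    """Mean and standard deviation of the number of vertices added in T steps."""
    x = alpha + gamma  # probability that a single step creates a vertex
    return x * T, math.sqrt(T * x * (1.0 - x))


def chebyshev_tail_bound(alpha, gamma, phi):
    """Upper bound on Pr(|N - xT| >= sqrt(T) * phi); it does not depend on T."""
    x = alpha + gamma
    return x * (1.0 - x) / phi ** 2
```

Since the bound is $x(1-x)/\varphi^2$, it tends to 0 for any $\varphi(t) \to \infty$, however slowly $\varphi$ grows.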

In-degree/out-degree sequence

Preliminaries

I The out-degree sequence is symmetrical to the in-degree sequence with $\alpha \leftrightarrow \gamma$, $\delta_{in} \leftrightarrow \delta_{out}$

I We study the in-degree sequence: let $n_{in}(t, d)$ be the number of nodes with in-degree $d$ at time $t$

I Special case: $\gamma = 1$ (always use the [new-out] procedure):

    I the in-degree of every vertex not from $G(t_0)$ is 1

    I $n_{in}(t, d) = [d = 1]\,t + O(1)$

I Special case: $\gamma = 0$, $\delta_{in} = 0$ (never use the [new-out] procedure, never select a zero-degree vertex for preferential attachment):

    I the in-degree of every vertex not from $G(t_0)$ is 0

    I $n_{in}(t, d) = [d = 0]\,t + O(1)$

Previous work

I Bollobas-Riordan-Chayes-Borgs: d is fixed, t grows

I $c_{in} = \frac{\alpha + \beta}{1 + \delta_{in}(\alpha + \gamma)}$

I $n_{in}(t, d) = p_d t + o_d(t)$ whp

I $p_d \sim A d^{-1 - 1/c_{in}}$, where $A > 0$ is some constant

I The proof actually contains explicit formulas for pd
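As a quick illustration of the quantities on this slide, the following sketch evaluates $c_{in}$ and the resulting power-law exponent $1 + 1/c_{in}$ for given parameters. The function names are chosen here for illustration only.

```python
def c_in(alpha, beta, gamma, delta_in):
    """The constant c_in = (alpha + beta) / (1 + delta_in * (alpha + gamma))."""
    return (alpha + beta) / (1.0 + delta_in * (alpha + gamma))


def in_degree_exponent(alpha, beta, gamma, delta_in):
    """Exponent e in the asymptotics p_d ~ A * d**(-e), i.e. e = 1 + 1 / c_in."""
    return 1.0 + 1.0 / c_in(alpha, beta, gamma, delta_in)
```

Because $\alpha + \beta \le 1$, we always have $c_{in} \le 1$, so the exponent is at least 2.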

Previous work

I Cooper [2]: more general model

I If $c_{in} < \frac{1}{2}$, let $d_{max} = t^{c_{in}/3}$. Otherwise, let $d_{max} = \frac{t^{1/6}}{\log^2 t}$

I $n_{in}(t, d) = p_d t \left(1 + O\left(\frac{1}{\sqrt{\log t}}\right)\right)$ whp for $d \le d_{max}$

[2] C. Cooper, Distribution of vertex degree in web-graphs, Journal of Combinatorics, Probability and Computing, Volume 15 Issue 5, September 2006

Results

Theorem

$\mathbb{E}\,n_{in}(t, d) = p_d t + O(1)$ for any $d \ge 0$

I The $O(1)$ term does not depend on $d$. It follows that when $d$ grows from $0$ to $t$, $\mathbb{E}\,n_{in}(t, d)$ decays from $\Theta(t)$ to $O(1)$

I Degrees of some vertices can be extremely large for certain values of the parameters. In the special case $\gamma = \delta_{in} = 0$ it is possible that $n_{in}(t, t) \ne 0$: when $G(t_0)$ is a star, all new edges are directed to the center of that star. When $\gamma$ and $\delta_{in}$ are close to 0, $\mathbb{E}\,n_{in}(t, d)$ can decrease very slowly after $O(1)$ is reached

I The theorem does not say anything about concentration. Preliminary calculations show that whp $|n_{in}(t, d) - \mathbb{E}\,n_{in}(t, d)| \le \sqrt{p_d t + 1}\,\varphi(t)$ for any function $\varphi$ such that $\varphi(t) \to \infty$ as $t \to \infty$, but this is not a rigorous theorem yet

Stage 1 of the proof: recurrent equation

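I What is $\mathbb{E}(n_{in}(t + 1, d) \mid G(t), \text{[new-in]})$?

[new-in] adds one vertex with in-degree 0

[new-in] increments the in-degree of some vertex $w$ selected with probability $\Pr(w) = \frac{\deg_{in}(w) + \delta_{in}}{t + \delta_{in} n}$

$$\mathbb{E}(n_{in}(t + 1, d) \mid G(t), \text{[new-in]}) = [d = 0] + n_{in}(t, d) - n_{in}(t, d)\,\frac{d + \delta_{in}}{t + \delta_{in} n} + [d \ge 1]\,n_{in}(t, d - 1)\,\frac{d - 1 + \delta_{in}}{t + \delta_{in} n}$$

I The two other cases, $\mathbb{E}(n_{in}(t + 1, d) \mid G(t), \text{[old]})$ and $\mathbb{E}(n_{in}(t + 1, d) \mid G(t), \text{[new-out]})$, are similar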
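The conditional expectation for the [new-in] case can be evaluated directly from the in-degree counts. The sketch below is an illustration under the assumption that the counts are stored as a list indexed by in-degree; the function name is hypothetical.

```python
def expected_next_new_in(n_in, t, n, delta_in):
    """Return E(n_in(t+1, d) | G(t), [new-in]) for d = 0 .. len(n_in).

    n_in[d] is the number of vertices with in-degree d at time t,
    t is the number of edges and n is the number of vertices of G(t).
    """
    denom = t + delta_in * n
    result = []
    for d in range(len(n_in) + 1):
        cur = n_in[d] if d < len(n_in) else 0
        prev = n_in[d - 1] if d >= 1 else 0
        value = (1.0 if d == 0 else 0.0) + cur     # new vertex has in-degree 0
        value -= cur * (d + delta_in) / denom      # a degree-d vertex is hit
        value += prev * (d - 1 + delta_in) / denom  # a degree-(d-1) vertex is hit
        result.append(value)
    return result
```

A simple sanity check is that the returned values sum to $n + 1$: the step adds exactly one vertex and only moves one existing vertex from in-degree $d - 1$ to $d$.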

Stage 1 of the proof: recurrent equation

I So we have

$$\mathbb{E}(n_{in}(t + 1, d) \mid G(t), \text{[new-in]}) = [d = 0] + n_{in}(t, d) - n_{in}(t, d)\,\frac{d + \delta_{in}}{t + \delta_{in} n} + [d \ge 1]\,n_{in}(t, d - 1)\,\frac{d - 1 + \delta_{in}}{t + \delta_{in} n}$$

I $n$ is concentrated around its mean, but its relative fluctuations can be of order $\Theta\left(\frac{1}{\sqrt{t}}\right)$, which is too large

I Consider $\mathbb{E}(n_{in}(t, d) \mid \#G(t) = n) = E_d(T, N)$ as a function of $T = t - t_0$, $d$, $N = n - n_0$

I Taking the expectation yields a linear recurrent equation for $E_d(T, N)$

Stage 2 of the proof: solving the equation

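I There are two arguments growing linearly with time, $T$ and $N$. The recurrent equation binds $E_{\cdot}(T + 1, N)$ to $E_{\cdot}(T, N)$ and $E_{\cdot}(T, N - 1)$

I The main term has the form $T p_d\left(\frac{N}{T}\right)$, where $p_d(x)$ is some analytic function on $[0, 1]$. If $x = \frac{N}{T + 1}$, then

$$T p_d\left(\frac{N}{T}\right) = T p_d\left(x + \frac{x}{T}\right) = T p_d(x) + x p_d'(x) + \frac{x^2}{2T} p_d''(\xi),$$

$$T p_d\left(\frac{N - 1}{T}\right) = T p_d\left(x - \frac{1 - x}{T}\right) = T p_d(x) - (1 - x) p_d'(x) + \frac{(1 - x)^2}{2T} p_d''(\xi)$$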
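The two changes of argument used in these expansions, $\frac{N}{T} = x + \frac{x}{T}$ and $\frac{N - 1}{T} = x - \frac{1 - x}{T}$ for $x = \frac{N}{T + 1}$, can be checked with exact rational arithmetic. This is a small sanity check with an illustrative function name, not part of the proof.

```python
from fractions import Fraction


def shifted_arguments(N, T):
    """Return (x, x + x/T, x - (1 - x)/T) for x = N / (T + 1), exactly."""
    x = Fraction(N, T + 1)
    return x, x + x / T, x - (1 - x) / T


# The second and third values coincide with N/T and (N-1)/T respectively.
x, first, second = shifted_arguments(7, 20)
assert first == Fraction(7, 20)
assert second == Fraction(6, 20)
```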

Finalizing the proof

I The functions $p_d(x)$ are selected so that the main term satisfies the recurrent equation "up to $O(d/T)$"

I The bound $O(1)$ for the remainder term follows by induction (this requires some amount of calculation)

I This gives $E_d(T, N)$. It remains to calculate $\mathbb{E}\,n_{in}(t, d) = \mathbb{E}(E_d(T, N))$

I $N$ has a binomial distribution with parameters $T$ and $\alpha + \gamma$

I If $f(x) \in C^2[0, 1]$, then

$$\mathbb{E} f\left(\frac{N}{T}\right) = f(\alpha + \gamma) + O\left(\frac{\max_{0 \le x \le 1} |f''(x)|}{T}\right),$$

this follows easily from $f(N/T) = f(\alpha + \gamma) + f'(\alpha + \gamma)(N/T - \alpha - \gamma) + f''(\xi)(N/T - \alpha - \gamma)^2/2$.
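The last estimate can be illustrated numerically. Taking expectations in the Taylor formula, the linear term vanishes and the quadratic term is at most $\max|f''| \cdot \mathrm{Var}(N/T)/2 = \max|f''| \cdot p(1 - p)/(2T)$ with $p = \alpha + \gamma$. The sketch below computes $\mathbb{E} f(N/T)$ exactly from the binomial distribution; the function name and the test function $f(x) = x^3$ are chosen here for illustration.

```python
import math


def binomial_expectation(f, T, p):
    """Exact E f(N / T) for N ~ Binomial(T, p); intended for moderate T."""
    total = 0.0
    for k in range(T + 1):
        prob = math.comb(T, k) * p ** k * (1.0 - p) ** (T - k)
        total += prob * f(k / T)
    return total


# Example: f(x) = x**3, so max |f''| on [0, 1] equals 6.
T, p = 200, 0.5
deviation = abs(binomial_expectation(lambda x: x ** 3, T, p) - p ** 3)
assert deviation <= 6 * p * (1 - p) / (2 * T)
```

The direct use of `math.comb` keeps the sketch short but limits it to moderate $T$, since very large binomial coefficients cannot be converted to floating point.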

Concentration

I The Azuma-Hoeffding inequality yields the bound $|n_{in}(t, d) - \mathbb{E}\,n_{in}(t, d)| \le \sqrt{t}\,\varphi(t)$ whp; this is ideal for constant $d$, but useless when $p_d t < \sqrt{t}$

I Main idea: $\mathbb{E}\,n_{in}^2(t, d)$ can be calculated using the same process as $\mathbb{E}\,n_{in}(t, d)$

I More precisely, there is a linear recurrence for $\mathbb{E}(n_{in}(t, d_1)\,n_{in}(t, d_2) \mid \#G(t) = n)$ in terms of the same expectation for the pairs $(d_1 - 1, d_2)$ and $(d_1, d_2 - 1)$

I Knowing $\mathbb{E}\,n_{in}^2(t, d)$, we can calculate the variance $\mathbb{D}\,n_{in}(t, d)$ and apply Chebyshev's inequality

$$\Pr\left(|n_{in}(t, d) - \mathbb{E}\,n_{in}(t, d)| \ge \sigma \sqrt{\mathbb{D}\,n_{in}(t, d)}\right) \le \frac{1}{\sigma^2}$$

I Preliminary calculations show that $\mathbb{D}\,n_{in}(t, d)$ has the same order as $\mathbb{E}\,n_{in}(t, d)$. However, a rigorous proof is yet to be developed

Number of edges between vertices of given degrees

Problem

I Motivation: we want to calculate the expected number of edges (or the probability of an edge) between two given vertices

I Suppose that we know their in- and out-degrees, but nothing else

I For a fixed vertex, in-degree and out-degree are essentially independent due to the construction (although they both tend to grow with age). Thus, we use only the out-degree $d_1$ of the potential source and the in-degree $d_2$ of the potential target

I From the previous part we know $n_{in}(t, d_2)$ and $n_{out}(t, d_1)$ whp (at least, when $d_1$ and $d_2$ are fixed and $t$ grows). Thus, it is sufficient to calculate the total number of edges from some vertex of out-degree $d_1$ to some vertex of in-degree $d_2$

I The problem: let $X(t, d_1, d_2)$ be the total number of edges in $G(t)$ satisfying the following condition: the out-degree of the source is $d_1$, the in-degree of the target is $d_2$. Estimate $X(t, d_1, d_2)$

Results

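Theorem

Assume the common case: $\alpha > 0$, $\beta > 0$, $\gamma > 0$, $\delta_{in} > 0$, $\delta_{out} > 0$. Let $c_{in} = \frac{\alpha + \beta}{1 + \delta_{in}(\alpha + \gamma)}$, $c_{out} = \frac{\beta + \gamma}{1 + \delta_{out}(\alpha + \gamma)}$, $x = \alpha + \gamma$. Then

$$\mathbb{E}\,X(t, d_1, d_2) = t\,c_X(x, d_1, d_2)\,(1 + o_{d_1, d_2}(1)),$$

where

$$\begin{aligned}
c_X(x, d_1, d_2) = {} & c_{00f}(x, d_1, d_2) + c_{00b}(x, d_2, d_1) + c_{01f}(x, d_1, d_2) + c_{01b}(x, d_2, d_1) \\
& + c_{10f}(x, d_1, d_2) + c_{10b}(x, d_2, d_1) + c_{11f}(x, d_1, d_2) + c_{11b}(x, d_2, d_1),
\end{aligned}$$

$$\begin{aligned}
c_{00f}(x, d_1, d_2) = {} & x^2\, \frac{\Gamma(d_1 + \delta_{out})}{\Gamma(d_1)\Gamma(\delta_{out} + 1)}\, \frac{\Gamma(d_2 + \delta_{in})}{\Gamma(d_2)\Gamma(\delta_{in} + 1)}\, \frac{\gamma}{\alpha + \gamma}\, \frac{\delta_{out}}{1 + \delta_{out} x}\, \frac{1}{c_{in} c_{out}} \\
& \times \iint_{0 \le v^{1/c_{out}} \le w^{1/c_{in}} \le 1} v^{\delta_{out} + 1/c_{out} - 1} (1 - v)^{d_1 - 1} w^{\delta_{in} + 1/c_{in} - 1} (1 - w)^{d_2 - 1} \\
& \times \left( \frac{1 - x}{1 + \delta_{in} x}\, \frac{\alpha}{\alpha + \gamma}\, \delta_{in} \int_{w^{1/c_{in}}}^{1} t^{c_{in} + c_{out} - 2}\, dt + \frac{\gamma}{\alpha + \gamma}\, w^{\frac{c_{in} + c_{out} - 1}{c_{in}}} \right) dv\, dw,
\end{aligned}$$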

Results

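$$\begin{aligned}
c_{01f}(x, d_1, d_2) = {} & [d_2 \ge 2]\, x^2\, \frac{\Gamma(d_1 + \delta_{out})}{\Gamma(d_1)\Gamma(\delta_{out} + 1)}\, \frac{\Gamma(d_2 + \delta_{in})}{\Gamma(d_2 - 1)\Gamma(\delta_{in} + 1)}\, \frac{\gamma}{\alpha + \gamma}\, \frac{\delta_{out}}{1 + \delta_{out} x}\, \frac{1}{c_{in} c_{out}} \\
& \times \iint_{0 \le v^{1/c_{out}} \le w^{1/c_{in}} \le 1} v^{\delta_{out} + 1/c_{out} - 1} (1 - v)^{d_1 - 1} w^{\delta_{in} + 1/c_{in}} (1 - w)^{d_2 - 2} \\
& \times \frac{1 - x}{1 + \delta_{in} x}\, \frac{\gamma}{\alpha + \gamma} \int_{w^{1/c_{in}}}^{1} t^{c_{in} + c_{out} - 2}\, dt\, dv\, dw,
\end{aligned}$$

$$\begin{aligned}
c_{10f}(x, d_1, d_2) = {} & [d_1 \ge 2]\, x^2\, \frac{\Gamma(d_1 + \delta_{out})}{\Gamma(d_1 - 1)\Gamma(\delta_{out} + 1)}\, \frac{\Gamma(d_2 + \delta_{in})}{\Gamma(d_2)\Gamma(\delta_{in} + 1)}\, \frac{\alpha}{\alpha + \gamma}\, \frac{1}{1 + \delta_{out} x}\, \frac{1}{c_{in} c_{out}} \\
& \times \iint_{0 \le v^{1/c_{out}} \le w^{1/c_{in}} \le 1} v^{\delta_{out} + 1/c_{out}} (1 - v)^{d_1 - 2} w^{\delta_{in} + 1/c_{in} - 1} (1 - w)^{d_2 - 1} \\
& \times \left( \frac{1 - x}{1 + \delta_{in} x}\, \frac{\alpha}{\alpha + \gamma}\, \delta_{in} \int_{w^{1/c_{in}}}^{1} t^{c_{in} + c_{out} - 2}\, dt + \frac{\gamma}{\alpha + \gamma}\, w^{\frac{c_{in} + c_{out} - 1}{c_{in}}} \right) dv\, dw,
\end{aligned}$$

$$\begin{aligned}
c_{11f}(x, d_1, d_2) = {} & [d_1 \ge 2,\ d_2 \ge 2]\, x^2\, \frac{\Gamma(d_1 + \delta_{out})}{\Gamma(d_1 - 1)\Gamma(\delta_{out} + 1)}\, \frac{\Gamma(d_2 + \delta_{in})}{\Gamma(d_2 - 1)\Gamma(\delta_{in} + 1)}\, \frac{\alpha}{\alpha + \gamma}\, \frac{1}{1 + \delta_{out} x}\, \frac{1}{c_{in} c_{out}} \\
& \times \iint_{0 \le v^{1/c_{out}} \le w^{1/c_{in}} \le 1} v^{\delta_{out} + 1/c_{out}} (1 - v)^{d_1 - 2} w^{\delta_{in} + 1/c_{in}} (1 - w)^{d_2 - 2} \\
& \times \frac{1 - x}{1 + \delta_{in} x}\, \frac{\gamma}{\alpha + \gamma} \int_{w^{1/c_{in}}}^{1} t^{c_{in} + c_{out} - 2}\, dt\, dv\, dw,
\end{aligned}$$

$c_{\cdot\cdot b}$ is the same as $c_{\cdot\cdot f}$ with the roles of incoming and outgoing edges exchanged: $\alpha \leftrightarrow \gamma$, $\delta_{in} \leftrightarrow \delta_{out}$, $c_{in} \leftrightarrow c_{out}$.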

Overview

I The key component of the formulas above is

$$\iint_{0 \le v^{1/c_{out}} \le w^{1/c_{in}} \le 1} v^{a_1 - 1} (1 - v)^{d_1 - 1} w^{a_2 - 1} (1 - w)^{d_2 - 1}\, dv\, dw$$

I Formal proof: integrate by parts with respect to $v$, then integrate by parts with respect to $w$; this gives a recurrent equation binding the values for the pairs $(d_1, d_2)$, $(d_1 - 1, d_2)$, $(d_1, d_2 - 1)$. This recurrent equation matches the recurrent equation for $\mathbb{E}\,X(t, d_1, d_2)$, and the remainder term can be bounded

I Physical sense: if a vertex $v$ is not too close to the beginning or to the end of $G(t)$ and $t' < t$, then

$$\Pr(\deg_{in,t}(v) = d \mid \deg_{in,t'}(v) = d') \approx \frac{\Gamma(d + \delta_{in})}{\Gamma(d' + \delta_{in})\,\Gamma(d - d' + 1)} \left(\frac{t'}{t}\right)^{(d' + \delta_{in}) c_{in}} \left(1 - \left(\frac{t'}{t}\right)^{c_{in}}\right)^{d - d'}$$
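The approximation above has the form of a negative binomial distribution in $d - d'$ with parameters $r = d' + \delta_{in}$ and $q = (t'/t)^{c_{in}}$, so it sums to 1 over $d \ge d'$. The sketch below evaluates it through log-gamma functions; the function name is illustrative, and the code assumes $d' + \delta_{in} > 0$ and $0 < t' < t$.

```python
import math


def in_degree_transition_prob(d, d_prev, t_prev, t, delta_in, c_in):
    """Approximate Pr(deg_in at time t = d | deg_in at time t_prev = d_prev)."""
    if d < d_prev:
        return 0.0  # in-degrees never decrease
    q = (t_prev / t) ** c_in
    r = d_prev + delta_in  # assumed to be positive
    k = d - d_prev
    log_coeff = math.lgamma(d + delta_in) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(q) + k * math.log1p(-q))


# The probabilities over d >= d_prev should sum to approximately 1.
total = sum(in_degree_transition_prob(d, 2, 500.0, 1000.0, 1.0, 0.5)
            for d in range(2, 400))
assert abs(total - 1.0) < 1e-9
```

Under this approximation the mean of $d + \delta_{in}$ equals $(d' + \delta_{in})(t/t')^{c_{in}}$, which matches the intuition that degrees grow like $t^{c_{in}}$.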

Physical sense

I $v^{1/c_{out}}$ and $w^{1/c_{in}}$ in the formulas for $c_{ijk}$ denote the two ends of a possible edge

I $c_{ijf}$ corresponds to edges oriented forward in time ($v^{1/c_{out}}$ is the source), $c_{ijb}$ to edges oriented backward in time

I Indices 0 and 1 encode whether one of the vertices was created by the [new-in] or the [new-out] process

I $c_{00f}$ and $c_{10f}$ have two terms inside the integral: one of them covers the situation when the first edge connects the two given vertices, the other covers edges generated by subsequent [old] processes. $c_{01f}$ and $c_{11f}$ have only the latter term

Concentration

Theorem

If $\varphi$ is any function such that $\varphi(t) \to \infty$ as $t \to \infty$, then $|X(t, d_1, d_2) - \mathbb{E}\,X(t, d_1, d_2)| \le (d_1 + d_2)\sqrt{t}\,\varphi(t)$ whp

I This is just an application of the Azuma-Hoeffding inequality

Thank you
