Statistics of a random graph in the Bollobas-Borgs-Chayes-Riordan model
Evgeny Grechnikov
Yandex
October 25, 2013
Directed preferential attachment model [1]
Parameters: $\alpha, \beta, \gamma, \delta_{in}, \delta_{out} \ge 0$ with $\alpha + \beta + \gamma = 1$. Start with some graph $G(t_0)$ with $t_0$ edges at time $t_0$.
[Figure: the three operations applied to $G(t)$: new-in (probability $\alpha$), old (probability $\beta$), new-out (probability $\gamma$)]
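$\Pr(v) = \frac{\deg_{out}(v) + \delta_{out}}{t + \delta_{out} n}$, $\Pr(w) = \frac{\deg_{in}(w) + \delta_{in}}{t + \delta_{in} n}$, where $n$ is the number of vertices of $G(t)$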
- [new-in] With probability $\alpha$, add a new vertex and an edge from it to an existing vertex $w$
- [old] With probability $\beta$, add a new edge from $v$ to $w$, where $v, w \in G(t)$
- [new-out] With probability $\gamma$, add a new vertex and an edge from an existing vertex $v$ to it

[1] B. Bollobas, Ch. Borgs, J. Chayes, O. Riordan, Directed scale-free graphs, ACM-SIAM Symposium on Discrete Algorithms, 2003
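The three growth rules can be simulated directly. The following Python sketch is not the authors' code: it assumes, purely for illustration, that $G(t_0)$ is a single loop at vertex 0 (so $t_0 = 1$), and the name `simulate_bbcr` is hypothetical. Vertices are selected by a linear scan, which is slow but keeps the weights $\deg + \delta$ and the normalization $t + \delta n$ explicit.

```python
import random


def simulate_bbcr(t_final, alpha, beta, gamma, delta_in, delta_out, seed=0):
    """Grow a directed multigraph with the [new-in]/[old]/[new-out] rules.

    Assumption (illustration only): G(t0) is one loop 0 -> 0, so t0 = 1.
    Returns (out_deg, in_deg, edges) once the graph has t_final edges.
    """
    assert abs(alpha + beta + gamma - 1.0) < 1e-12
    rng = random.Random(seed)
    out_deg, in_deg, edges = [1], [1], [(0, 0)]

    def pick(deg, delta):
        # Pr(u) = (deg[u] + delta) / (t + delta * n), with t = len(edges), n = len(deg)
        r = rng.random() * (len(edges) + delta * len(deg))
        for u, k in enumerate(deg):
            r -= k + delta
            if r < 0:
                return u
        return len(deg) - 1  # only reached through floating-point round-off

    while len(edges) < t_final:
        r = rng.random()
        if r < alpha:                    # [new-in]: new vertex -> existing w
            w = pick(in_deg, delta_in)
            v = len(out_deg)
            out_deg.append(0)
            in_deg.append(0)
        elif r < alpha + beta:           # [old]: existing v -> existing w
            v = pick(out_deg, delta_out)
            w = pick(in_deg, delta_in)
        else:                            # [new-out]: existing v -> new vertex
            v = pick(out_deg, delta_out)
            w = len(in_deg)
            out_deg.append(0)
            in_deg.append(0)
        edges.append((v, w))
        out_deg[v] += 1
        in_deg[w] += 1
    return out_deg, in_deg, edges
```

For example, with $\gamma = 1$ every vertex other than 0 ends with in-degree 1, and with $\gamma = 0$, $\delta_{in} = 0$ every vertex other than 0 keeps in-degree 0, matching the special cases discussed under Preliminaries.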
Number of vertices
- At every step, a new edge is created
- So the number of edges at time $t$ equals $t$
- At every step, a new vertex is created with probability $\alpha + \gamma$
- Assume $\alpha + \gamma > 0$; otherwise the model is not interesting
- Whp, the total number of vertices at time $t$ is $(\alpha + \gamma)t + O(\sqrt{t}\,\varphi(t))$ for any $\varphi$ with $\varphi(t) \to \infty$ as $t \to \infty$
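Because each step creates a vertex independently with probability $\alpha + \gamma$, the number of vertices added after $t$ steps is $\mathrm{Binomial}(t, \alpha + \gamma)$, with standard deviation at most $\sqrt{t}/2$. A short Python check (the function name and seeds are hypothetical) shows the deviation from $(\alpha + \gamma)t$ staying of order $\sqrt{t}$:

```python
import math
import random


def vertex_count_deviation(t, alpha, gamma, seed=0):
    """(vertices created in t steps - (alpha + gamma) * t) / sqrt(t).

    Each step creates a vertex with probability alpha + gamma ([new-in] or
    [new-out]) independently of the graph, so only those coin flips are simulated.
    """
    rng = random.Random(seed)
    p = alpha + gamma
    created = sum(1 for _ in range(t) if rng.random() < p)
    return (created - p * t) / math.sqrt(t)


for s in range(5):
    print(vertex_count_deviation(10000, 0.3, 0.3, seed=s))  # values of order 1
```

Dividing by $\sqrt{t}\,\varphi(t)$ instead, with any $\varphi(t) \to \infty$, makes the ratio tend to 0 in probability, which is the whp statement.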
Preliminaries
- The out-degree sequence is symmetric to the in-degree sequence under the exchange $\alpha \leftrightarrow \gamma$, $\delta_{in} \leftrightarrow \delta_{out}$
- We study the in-degree sequence: let $n_{in}(t, d)$ be the number of nodes with in-degree $d$ at time $t$
- Special case: $\gamma = 1$ (always use the [new-out] procedure):
  - the in-degree of every vertex not from $G(t_0)$ is 1
  - $n_{in}(t, d) = [d = 1]\, t + O(1)$
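- Special case: $\gamma = 0$, $\delta_{in} = 0$ (never use the [new-out] procedure, never select a vertex of in-degree 0 for preferential attachment):
  - the in-degree of every vertex not from $G(t_0)$ is 0
  - $\mathbb{E}\, n_{in}(t, d) = [d = 0]\, \alpha t + O(1)$, since each step creates a new vertex with probability $\alpha$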
Previous work
- Bollobas-Riordan-Chayes-Borgs: $d$ is fixed, $t$ grows
- $c_{in} = \frac{\alpha + \beta}{1 + \delta_{in}(\alpha + \gamma)}$
- $n_{in}(t, d) = p_d t + o_d(t)$ whp
- $p_d \sim A d^{-1 - 1/c_{in}}$, where $A > 0$ is some constant
- The proof actually contains explicit formulas for $p_d$
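The constant $c_{in}$ and the resulting tail exponent $1 + 1/c_{in}$ are plain arithmetic in the parameters. A minimal Python sketch (function names are hypothetical), evaluated for $\alpha = 0.2$, $\beta = 0.6$, $\gamma = 0.2$, $\delta_{in} = 1$, where $c_{in} = 0.8/1.4 = 4/7$:

```python
def c_in(alpha, beta, gamma, delta_in):
    """c_in = (alpha + beta) / (1 + delta_in * (alpha + gamma))."""
    return (alpha + beta) / (1 + delta_in * (alpha + gamma))


def in_degree_tail_exponent(alpha, beta, gamma, delta_in):
    """Exponent of the power law p_d ~ A * d ** (-1 - 1 / c_in).

    Requires alpha + beta > 0; for gamma = 1 there is no tail at all.
    """
    return 1 + 1 / c_in(alpha, beta, gamma, delta_in)


print(in_degree_tail_exponent(0.2, 0.6, 0.2, 1.0))  # about 2.75
```

Increasing $\delta_{in}$ lowers $c_{in}$ and therefore raises the exponent, i.e. makes the in-degree tail lighter.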
- Cooper [2]: a more general model
- If $c_{in} < \frac{1}{2}$, let $d_{max} = t^{c_{in}/3}$. Otherwise, let $d_{max} = t^{1/6} / \log^2 t$
- $n_{in}(t, d) = p_d t \left(1 + O\left(\frac{1}{\sqrt{\log t}}\right)\right)$ whp for $d \le d_{max}$

[2] C. Cooper, Distribution of vertex degree in web-graphs, Journal of Combinatorics, Probability and Computing, Volume 15, Issue 5, September 2006
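To see how restrictive the range $d \le d_{max}$ is, the threshold can be evaluated numerically. This Python sketch assumes natural logarithms (the base is not stated on the slide) and uses a hypothetical function name:

```python
import math


def cooper_d_max(t, c_in):
    """Largest d covered by Cooper's bound, as stated on the slide (natural log assumed)."""
    if c_in < 0.5:
        return t ** (c_in / 3)
    return t ** (1 / 6) / math.log(t) ** 2


# Even for t = 10**12 edges, the second regime covers no d >= 1.
print(cooper_d_max(10**12, 0.9))  # roughly 0.13
```

The Results below, in contrast, hold for every $d \ge 0$.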
Results
Theorem. $\mathbb{E}\, n_{in}(t, d) = p_d t + O(1)$ for any $d \ge 0$
- The $O(1)$ term does not depend on $d$. It follows that as $d$ grows from 0 to $t$, $\mathbb{E}\, n_{in}(t, d)$ decays from $\Theta(t)$ to $O(1)$
- Degrees of some vertices can be extremely large for certain parameter values. In the special case $\gamma = \delta_{in} = 0$ it is possible that $n_{in}(t, t) \ne 0$: when $G(t_0)$ is a star with all edges directed to its center, all new edges are directed to the center as well. When $\gamma$ and $\delta_{in}$ are close to 0, $\mathbb{E}\, n_{in}(t, d)$ can decrease very slowly after it reaches $O(1)$
- The theorem says nothing about concentration. Preliminary calculations show that whp $|n_{in}(t, d) - \mathbb{E}\, n_{in}(t, d)| \le \sqrt{p_d t + 1}\,\varphi(t)$ for any function $\varphi$ such that $\varphi(t) \to \infty$ as $t \to \infty$, but this is not yet a rigorous theorem
Stage 1 of the proof: recurrent equation
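- What is $\mathbb{E}(n_{in}(t+1, d) \mid G(t), \text{[new-in]})$?
- [new-in] adds one vertex with in-degree 0
- [new-in] increments the in-degree of some vertex $w$ selected with probability $\Pr(w) = \frac{\deg_{in}(w) + \delta_{in}}{t + \delta_{in} n}$
$$\mathbb{E}(n_{in}(t+1, d) \mid G(t), \text{[new-in]}) = [d = 0] + n_{in}(t, d) - n_{in}(t, d)\,\frac{d + \delta_{in}}{t + \delta_{in} n} + [d \ge 1]\, n_{in}(t, d-1)\,\frac{d - 1 + \delta_{in}}{t + \delta_{in} n}$$
- The other two cases, $\mathbb{E}(n_{in}(t+1, d) \mid G(t), \text{[old]})$ and $\mathbb{E}(n_{in}(t+1, d) \mid G(t), \text{[new-out]})$, are similar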
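The [new-in] expectation can be evaluated for a concrete in-degree histogram. In this Python sketch (hypothetical function name), `n_in[d]` holds $n_{in}(t, d)$, $n$ is recovered as the sum of the histogram, and $t$ must equal the total in-degree. Two sanity checks follow from the formula: the expected number of vertices grows by exactly 1, and so does the expected total in-degree.

```python
def expected_after_new_in(n_in, t, delta_in):
    """E(n_in(t+1, d) | G(t), [new-in]) for d = 0 .. len(n_in).

    n_in[d] = n_in(t, d); n = sum(n_in) is the number of vertices of G(t),
    and t, the number of edges, equals the total in-degree.
    """
    assert t == sum(d * c for d, c in enumerate(n_in)), "t must equal the total in-degree"
    n = sum(n_in)
    norm = t + delta_in * n          # normalization of Pr(w)
    top = len(n_in)                  # in-degree `top` becomes reachable after one step
    result = []
    for d in range(top + 1):
        cur = n_in[d] if d < top else 0
        # new vertex enters class 0; a selected vertex leaves class d
        val = (1 if d == 0 else 0) + cur - cur * (d + delta_in) / norm
        if d >= 1:
            # a selected vertex of in-degree d - 1 moves up into class d
            val += n_in[d - 1] * (d - 1 + delta_in) / norm
        result.append(val)
    return result


res = expected_after_new_in([2, 1, 1], t=3, delta_in=1.0)
print(sum(res))  # 5 = n + 1, up to rounding
```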
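- $n$ is concentrated around its mean, but its relative fluctuations can be of order $\Theta(1/\sqrt{t})$, which is too large for an $O(1)$ error bound
- Consider $\mathbb{E}(n_{in}(t, d) \mid \#G(t) = n) = E_d(T, N)$ as a function of $T = t - t_0$, $d$, and $N = n - n_0$, where $n_0$ is the number of vertices of $G(t_0)$
- Taking expectations yields a linear recurrence for $E_d(T, N)$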
Stage 2 of the proof: solving the equation
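- There are two arguments growing linearly with time, $T$ and $N$. The recurrence binds $E_\cdot(T+1, N)$ to $E_\cdot(T, N)$ and $E_\cdot(T, N-1)$
- The main term has the form $T p_d\left(\frac{N}{T}\right)$, where $p_d(x)$ is some analytic function on $[0, 1]$. If $x = \frac{N}{T+1}$, then, for some intermediate points $\xi_1, \xi_2$,
$$T p_d\left(\frac{N}{T}\right) = T p_d\left(x + \frac{x}{T}\right) = T p_d(x) + x p_d'(x) + \frac{x^2}{2T} p_d''(\xi_1),$$
$$T p_d\left(\frac{N-1}{T}\right) = T p_d\left(x - \frac{1-x}{T}\right) = T p_d(x) - (1-x) p_d'(x) + \frac{(1-x)^2}{2T} p_d''(\xi_2)$$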
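The two expansions can be checked numerically for a sample function. This Python sketch uses the hypothetical choice $p(x) = x^3$ (not one of the actual $p_d$) and shows that both remainders multiplied by $T$ stay bounded, i.e. the remainders are $O(1/T)$:

```python
def taylor_remainders(p, dp, N, T):
    """Remainders of T*p(N/T) and T*p((N-1)/T) after the first-order terms."""
    x = N / (T + 1)
    r_plus = T * p(N / T) - (T * p(x) + x * dp(x))
    r_minus = T * p((N - 1) / T) - (T * p(x) - (1 - x) * dp(x))
    return r_plus, r_minus


def cube(x):  # hypothetical test function p(x)
    return x ** 3


def dcube(x):  # its derivative p'(x)
    return 3 * x ** 2


for T in (100, 1000, 10000):
    r_plus, r_minus = taylor_remainders(cube, dcube, int(0.4 * T), T)
    # |p''| <= 6 on [0, 1], so both products are bounded by 3
    print(T, T * r_plus, T * r_minus)
```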
Finalizing the proof
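- $p_d(x)$ are chosen so that the main term satisfies the recurrence "up to $O(d/T)$"
- The $O(1)$ bound for the remainder term follows by induction (this requires a fair amount of calculation)
- This gives $E_d(T, N)$. It remains to calculate $\mathbb{E}\, n_{in}(t, d) = \mathbb{E}(E_d(T, N))$
- $N$ has a binomial distribution with parameters $T$ and $\alpha + \gamma$
- If $f \in C^2[0, 1]$, then
$$\mathbb{E} f\left(\frac{N}{T}\right) = f(\alpha + \gamma) + O\left(\frac{\max_{0 \le x \le 1} |f''(x)|}{T}\right);$$
this follows easily from $f(N/T) = f(\alpha+\gamma) + f'(\alpha+\gamma)(N/T - \alpha - \gamma) + f''(\xi)(N/T - \alpha - \gamma)^2/2$: the linear term has zero mean, and $\mathbb{E}(N/T - \alpha - \gamma)^2 = (\alpha+\gamma)(1-\alpha-\gamma)/T \le 1/(4T)$.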
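For moderate $T$ the expectation $\mathbb{E} f(N/T)$ can be computed exactly by summing over the binomial distribution, which makes the $O(1/T)$ error visible. In this Python sketch the test function $f(x) = x^2$ is a hypothetical choice; for it the error is exactly $(\alpha+\gamma)(1-\alpha-\gamma)/T$.

```python
import math


def expected_f_binomial(f, T, p):
    """E f(N / T) for N ~ Binomial(T, p), by exact summation.

    Keep T <= 1000 so that math.comb(T, k) still converts to a finite float.
    """
    return sum(math.comb(T, k) * p ** k * (1 - p) ** (T - k) * f(k / T)
               for k in range(T + 1))


def square(x):
    return x * x


p, T = 0.6, 200  # p plays the role of alpha + gamma
error = expected_f_binomial(square, T, p) - square(p)
print(T * error)  # p * (1 - p) = 0.24, up to rounding
```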
Concentration
- The Azuma-Hoeffding inequality yields the bound $|n_{in}(t, d) - \mathbb{E}\, n_{in}(t, d)| \le \sqrt{t}\,\varphi(t)$ whp; this is ideal for constant $d$, but useless when $p_d t < \sqrt{t}$
- Main idea: $\mathbb{E}\, n_{in}^2(t, d)$ can be calculated by the same process as $\mathbb{E}\, n_{in}(t, d)$
- More precisely, there is a linear recurrence for $\mathbb{E}(n_{in}(t, d_1)\, n_{in}(t, d_2) \mid \#G(t) = n)$ in terms of the same expectation for the pairs $(d_1 - 1, d_2)$ and $(d_1, d_2 - 1)$
- Knowing $\mathbb{E}\, n_{in}^2(t, d)$, we can calculate the variance $D n_{in}(t, d)$ and apply Chebyshev's inequality
$$\Pr\left(|n_{in}(t, d) - \mathbb{E}\, n_{in}(t, d)| \ge \sigma \sqrt{D n_{in}(t, d)}\right) \le \frac{1}{\sigma^2}$$
- Preliminary calculations show that $D n_{in}(t, d)$ has the same order as $\mathbb{E}\, n_{in}(t, d)$. However, a rigorous proof is yet to be developed
Problem
- Motivation: we want to calculate the expected number of edges (or the probability of an edge) between two given vertices
- Suppose that we know their in- and out-degrees, but nothing else
- For a fixed vertex, in-degree and out-degree are essentially independent by construction (although both tend to grow with age). Thus we use only the out-degree $d_1$ of the potential source and the in-degree $d_2$ of the potential target
- From the previous part we know $n_{in}(t, d_2)$ and $n_{out}(t, d_1)$ whp (at least when $d_1$ and $d_2$ are fixed and $t$ grows). Thus it suffices to calculate the total number of edges from some vertex of out-degree $d_1$ to some vertex of in-degree $d_2$
- The problem: let $X(t, d_1, d_2)$ be the total number of edges in $G(t)$ such that the out-degree of the source is $d_1$ and the in-degree of the target is $d_2$. Estimate $X(t, d_1, d_2)$
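Given an explicit edge list, $X(t, d_1, d_2)$ for all pairs $(d_1, d_2)$ at once takes two passes over the edges. This Python sketch (hypothetical function name) counts degrees in the final graph $G(t)$ and then tallies each edge by (out-degree of its source, in-degree of its target); parallel edges are counted with multiplicity.

```python
from collections import Counter


def edge_degree_counts(edges):
    """Return X with X[(d1, d2)] = number of edges whose source has out-degree d1
    and whose target has in-degree d2, degrees taken in the final graph G(t).
    Parallel edges and loops are counted with multiplicity."""
    out_deg, in_deg = Counter(), Counter()
    for v, w in edges:
        out_deg[v] += 1
        in_deg[w] += 1
    return Counter((out_deg[v], in_deg[w]) for v, w in edges)


X = edge_degree_counts([(0, 1), (0, 2), (1, 2), (2, 0)])
print(X[(2, 2)])  # 1: the edge 0 -> 2
```

Dividing $X(t, d_1, d_2)$ by $n_{out}(t, d_1)\, n_{in}(t, d_2)$ gives the average number of edges per (source, target) pair with these degrees, which is roughly the quantity asked for in the motivation.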
Results
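Theorem. Assume the common case: $\alpha, \beta, \gamma, \delta_{in}, \delta_{out} > 0$. Let $c_{in} = \frac{\alpha+\beta}{1+\delta_{in}(\alpha+\gamma)}$, $c_{out} = \frac{\beta+\gamma}{1+\delta_{out}(\alpha+\gamma)}$, $x = \alpha + \gamma$. Then $\mathbb{E} X(t, d_1, d_2) = t\, c_X(x, d_1, d_2)(1 + o_{d_1,d_2}(1))$, where
$$c_X(x, d_1, d_2) = c_{00f}(x, d_1, d_2) + c_{00b}(x, d_2, d_1) + c_{01f}(x, d_1, d_2) + c_{01b}(x, d_2, d_1) + c_{10f}(x, d_1, d_2) + c_{10b}(x, d_2, d_1) + c_{11f}(x, d_1, d_2) + c_{11b}(x, d_2, d_1),$$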
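$$c_{00f}(x, d_1, d_2) = x^2 \frac{\Gamma(d_1 + \delta_{out})}{\Gamma(d_1)\Gamma(\delta_{out} + 1)} \frac{\Gamma(d_2 + \delta_{in})}{\Gamma(d_2)\Gamma(\delta_{in} + 1)} \frac{\gamma}{\alpha + \gamma} \frac{\delta_{out}}{1 + \delta_{out} x} \frac{1}{c_{in} c_{out}}
\times \iint_{0 \le v^{1/c_{out}} \le w^{1/c_{in}} \le 1} v^{\delta_{out} + 1/c_{out} - 1} (1-v)^{d_1 - 1} w^{\delta_{in} + 1/c_{in} - 1} (1-w)^{d_2 - 1}
\times \left[\frac{1-x}{1+\delta_{in} x} \frac{\alpha}{\alpha+\gamma} \delta_{in} \int_{w^{1/c_{in}}}^{1} t^{c_{in} + c_{out} - 2}\, dt + \frac{\gamma}{\alpha+\gamma} w^{\frac{c_{in} + c_{out} - 1}{c_{in}}}\right] dv\, dw,$$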
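$$c_{01f}(x, d_1, d_2) = [d_2 \ge 2]\, x^2 \frac{\Gamma(d_1 + \delta_{out})}{\Gamma(d_1)\Gamma(\delta_{out} + 1)} \frac{\Gamma(d_2 + \delta_{in})}{\Gamma(d_2 - 1)\Gamma(\delta_{in} + 1)} \frac{\gamma}{\alpha + \gamma} \frac{\delta_{out}}{1 + \delta_{out} x} \frac{1}{c_{in} c_{out}}
\times \iint_{0 \le v^{1/c_{out}} \le w^{1/c_{in}} \le 1} v^{\delta_{out} + 1/c_{out} - 1} (1-v)^{d_1 - 1} w^{\delta_{in} + 1/c_{in}} (1-w)^{d_2 - 2}
\times \frac{1-x}{1+\delta_{in} x} \frac{\gamma}{\alpha+\gamma} \int_{w^{1/c_{in}}}^{1} t^{c_{in} + c_{out} - 2}\, dt\, dv\, dw,$$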
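$$c_{10f}(x, d_1, d_2) = [d_1 \ge 2]\, x^2 \frac{\Gamma(d_1 + \delta_{out})}{\Gamma(d_1 - 1)\Gamma(\delta_{out} + 1)} \frac{\Gamma(d_2 + \delta_{in})}{\Gamma(d_2)\Gamma(\delta_{in} + 1)} \frac{\alpha}{\alpha + \gamma} \frac{1}{1 + \delta_{out} x} \frac{1}{c_{in} c_{out}}
\times \iint_{0 \le v^{1/c_{out}} \le w^{1/c_{in}} \le 1} v^{\delta_{out} + 1/c_{out}} (1-v)^{d_1 - 2} w^{\delta_{in} + 1/c_{in} - 1} (1-w)^{d_2 - 1}
\times \left[\frac{1-x}{1+\delta_{in} x} \frac{\alpha}{\alpha+\gamma} \delta_{in} \int_{w^{1/c_{in}}}^{1} t^{c_{in} + c_{out} - 2}\, dt + \frac{\gamma}{\alpha+\gamma} w^{\frac{c_{in} + c_{out} - 1}{c_{in}}}\right] dv\, dw,$$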
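$$c_{11f}(x, d_1, d_2) = [d_1 \ge 2, d_2 \ge 2]\, x^2 \frac{\Gamma(d_1 + \delta_{out})}{\Gamma(d_1 - 1)\Gamma(\delta_{out} + 1)} \frac{\Gamma(d_2 + \delta_{in})}{\Gamma(d_2 - 1)\Gamma(\delta_{in} + 1)} \frac{\alpha}{\alpha + \gamma} \frac{1}{1 + \delta_{out} x} \frac{1}{c_{in} c_{out}}
\times \iint_{0 \le v^{1/c_{out}} \le w^{1/c_{in}} \le 1} v^{\delta_{out} + 1/c_{out}} (1-v)^{d_1 - 2} w^{\delta_{in} + 1/c_{in}} (1-w)^{d_2 - 2}
\times \frac{1-x}{1+\delta_{in} x} \frac{\gamma}{\alpha+\gamma} \int_{w^{1/c_{in}}}^{1} t^{c_{in} + c_{out} - 2}\, dt\, dv\, dw.$$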
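$c_{\cdot\cdot b}$ is the same as $c_{\cdot\cdot f}$ with the properties of incoming and outgoing edges exchanged: $\alpha \leftrightarrow \gamma$, $\delta_{in} \leftrightarrow \delta_{out}$, $c_{in} \leftrightarrow c_{out}$.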
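The double integrals in $c_{ijk}$ can be evaluated numerically once the region $0 \le v^{1/c_{out}} \le w^{1/c_{in}} \le 1$ is rewritten as $0 \le w \le 1$, $0 \le v \le w^{c_{out}/c_{in}}$. The Python sketch below handles integrands of the form $v^{a_1-1}(1-v)^{d_1-1}w^{a_2-1}(1-w)^{d_2-1}$ (the bracketed factors are omitted) with a plain midpoint rule on the unit square after substituting $v = u\, w^{c_{out}/c_{in}}$. The grid size `m` and the function name are assumptions; a careful evaluation would more likely use a quadrature library or the incomplete beta function. In the formulas above the $v$ and $w$ exponents are nonnegative, so the integrand stays bounded and the midpoint rule is adequate.

```python
def key_integral(a1, a2, d1, d2, c_in, c_out, m=400):
    """Midpoint-rule approximation of the integral over 0 <= v^(1/c_out) <= w^(1/c_in) <= 1
    of v^(a1-1) (1-v)^(d1-1) w^(a2-1) (1-w)^(d2-1) dv dw.

    The region is rewritten as 0 <= w <= 1, 0 <= v <= s(w) = w^(c_out/c_in),
    and the inner integral uses v = u * s(w), dv = s(w) du.
    """
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        w = (i + 0.5) * h
        s = w ** (c_out / c_in)          # upper limit for v
        inner = 0.0
        for j in range(m):
            v = (j + 0.5) * h * s
            inner += v ** (a1 - 1) * (1 - v) ** (d1 - 1)
        # inner * s * h approximates the integral over v
        total += w ** (a2 - 1) * (1 - w) ** (d2 - 1) * inner * s * h
    return total * h


print(key_integral(1, 1, 1, 1, 1.0, 1.0))  # area of 0 <= v <= w <= 1, i.e. 1/2
```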
Overview
- The key component of the formulas above is
$$\iint_{0 \le v^{1/c_{out}} \le w^{1/c_{in}} \le 1} v^{a_1 - 1}(1-v)^{d_1 - 1} w^{a_2 - 1}(1-w)^{d_2 - 1}\, dv\, dw$$
- Formal proof: integrate by parts with respect to $v$, then with respect to $w$; this gives a recurrence binding the values for the pairs $(d_1, d_2)$, $(d_1 - 1, d_2)$, $(d_1, d_2 - 1)$. This recurrence matches the recurrence for $\mathbb{E} X(t, d_1, d_2)$, and the remainder term can be bounded
- Physical sense: if a vertex $v$ is not too close to the beginning or to the end of $G(t)$, and $t' < t$, then
$$\Pr(\deg_{in,t}(v) = d \mid \deg_{in,t'}(v) = d') \approx \frac{\Gamma(d + \delta_{in})}{\Gamma(d' + \delta_{in})\,\Gamma(d - d' + 1)} \left(\frac{t'}{t}\right)^{(d' + \delta_{in}) c_{in}} \left(1 - \left(\frac{t'}{t}\right)^{c_{in}}\right)^{d - d'},$$
i.e. $d - d'$ has approximately a negative binomial distribution
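The approximate transition probability is a negative binomial law for the number $d - d'$ of in-edges gained between $t'$ and $t$, with shape $d' + \delta_{in}$ and success probability $(t'/t)^{c_{in}}$. The Python sketch below (hypothetical function name and sample parameters) evaluates it through `math.lgamma` to avoid overflow in the Gamma functions and checks that it sums to 1; it assumes $0 < t' < t$ and $d' + \delta_{in} > 0$.

```python
import math


def in_degree_transition(d, d_prev, t_prev, t, delta_in, c_in):
    """Approximate Pr(deg_in at time t = d | deg_in at time t_prev = d_prev).

    Assumes 0 < t_prev < t and d_prev + delta_in > 0.
    """
    if d < d_prev:
        return 0.0
    q = (t_prev / t) ** c_in                     # (t'/t)^{c_in}
    log_coef = (math.lgamma(d + delta_in) - math.lgamma(d_prev + delta_in)
                - math.lgamma(d - d_prev + 1))
    return math.exp(log_coef + (d_prev + delta_in) * math.log(q)
                    + (d - d_prev) * math.log1p(-q))


probs = [in_degree_transition(d, 2, 500, 1000, 0.5, 0.6) for d in range(2, 300)]
print(sum(probs))  # close to 1
```

Its mean, $(d' + \delta_{in})\left((t/t')^{c_{in}} - 1\right)$, says that $\deg_{in} + \delta_{in}$ grows on average like $t^{c_{in}}$.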
Physical sense
- $v^{1/c_{out}}$ and $w^{1/c_{in}}$ in the formulas for $c_{ijk}$ denote the two ends of a possible edge
- $c_{ijf}$ corresponds to edges oriented forward in time ($v^{1/c_{out}}$ is the source), $c_{ijb}$ to edges oriented backward in time
- Indices 0 and 1 encode whether one of the vertices was created by the [new-in] or the [new-out] process
- $c_{00f}$ and $c_{10f}$ have two terms inside the integral: one covers the situation when the first edge connects the two given vertices, the other covers edges generated by subsequent [old] processes. $c_{01f}$ and $c_{11f}$ have only the latter term
Concentration
Theorem. If $\varphi$ is any function such that $\varphi(t) \to \infty$ as $t \to \infty$, then whp $|X(t, d_1, d_2) - \mathbb{E} X(t, d_1, d_2)| \le (d_1 + d_2)\sqrt{t}\,\varphi(t)$
- This is just an application of the Azuma-Hoeffding inequality