Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

Song Mei, Theodor Misiakiewicz, and Andrea Montanari

Stanford University

June 26, 2019

COLT 2019

Song Mei (Stanford University) Mean Field Dynamics for Neural Network June 26, 2019 1 / 12


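Gradient dynamics of two-layers neural network

- Two-layers neural network:

  $$\Theta = (\theta_1, \dots, \theta_N), \qquad \theta_i = (a_i, w_i) \in \mathbb{R}^D,$$

  $$y(x; \Theta) = \frac{1}{N} \sum_{i=1}^{N} a_i \sigma(\langle w_i, x \rangle).$$

- Risk function:

  $$R_N(\Theta) = \mathbb{E}_{x,y}\Big[\Big(y - \frac{1}{N} \sum_{i=1}^{N} a_i \sigma(\langle w_i, x \rangle)\Big)^2\Big].$$

- SGD/gradient flow:

  $$\Theta^{k+1} = \Theta^k - \eta_k \nabla \ell_N(\Theta^k; x_k, y_k), \qquad \frac{d}{dt} \Theta^t = -\nabla R_N(\Theta^t).$$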
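The model and the SGD update on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the activation `tanh`, the per-sample loss `(y - prediction)^2`, the step-size name `eta`, and all function names are assumptions made for the example.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sigma(z):
    # Assumed activation; the slide leaves sigma generic.
    return math.tanh(z)

def sigma_prime(z):
    return 1.0 - math.tanh(z) ** 2

def predict(theta, x):
    """(1/N) * sum_i a_i * sigma(<w_i, x>) for theta = [(a_i, w_i), ...]."""
    return sum(a * sigma(dot(w, x)) for a, w in theta) / len(theta)

def sgd_step(theta, x, y, eta):
    """One SGD step on the assumed per-sample loss (y - predict(theta, x))^2."""
    n = len(theta)
    resid = y - predict(theta, x)
    new_theta = []
    for a, w in theta:
        pre = dot(w, x)
        grad_a = -2.0 * resid * sigma(pre) / n
        grad_w = [-2.0 * resid * a * sigma_prime(pre) * xj / n for xj in x]
        new_a = a - eta * grad_a
        new_w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        new_theta.append((new_a, new_w))
    return new_theta

if __name__ == "__main__":
    theta = [(1.0, [0.5, -0.5]), (-1.0, [0.2, 0.1])]  # N = 2 neurons, inputs in R^2
    x, y = [1.0, 2.0], 1.0
    print("squared error before:", (y - predict(theta, x)) ** 2)
    theta = sgd_step(theta, x, y, eta=0.1)
    print("squared error after: ", (y - predict(theta, x)) ** 2)
```

Each neuron is stored as a pair `(a_i, w_i)`, mirroring the slide's notation for the parameters.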


Two-layers neural networks

[Diagram: input layer, hidden layer, output layer; edges labeled $w_1, \dots, w_4$ and $a_1, \dots, a_4$.]

Figure: Architecture for $N = 4$. $\theta_i = (a_i, w_i)$.


Related literature

- Mean field distributional dynamics:

  $$\partial_t \rho_t(\theta) = \nabla \cdot \big(\nabla \Psi(\theta; \rho_t)\, \rho_t\big).$$

  - Non-linear dynamics. Converges in some cases.
  - [Mei, Montanari, Nguyen, 2018], [Rotskoff and Vanden-Eijnden, 2018], [Chizat and Bach, 2018a], [Sirignano and Spiliopoulos, 2018].

- Neural tangent kernel (NTK) dynamics:

  $$\partial_t \|u_t\|_2^2 = -\langle u_t, H u_t \rangle.$$

  - Linear dynamics. Always converges to 0 empirical risk.
  - [Jacot, Gabriel, and Hongler, 2018], [Li and Liang, 2018], [Du, Zhai, Poczos, Singh, 2018].


This work

(a) Improved bound for SGD - PDE interpolation.

(b) Relationship of the mean field limit and the kernel limit.


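SGD and distributional dynamics (DD)

- SGD for $\Theta^k$, with $(x_k, y_k) \sim \mathbb{P}_{x,y}$, $i \in [N]$:

  $$\theta_i^{k+1} = \theta_i^k - 2 s_k N \nabla_{\theta_i} \ell_N(\Theta^k; x_k, y_k). \qquad \text{(SGD)}$$

- [MMN18]: $s_k = \varepsilon \xi(k\varepsilon)$, $k = t/\varepsilon$, $N \to \infty$, $\varepsilon \to 0$:

  $$\rho^{(N)}_k \equiv \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_i^k} \Rightarrow \rho_t \in \mathscr{P}(\mathbb{R}^D) \times [0, \infty).$$

- Distributional dynamics (DD) for $\rho_t$:

  $$\partial_t \rho_t(\theta) = 2 \xi(t) \nabla_\theta \cdot \big(\rho_t(\theta) \nabla_\theta \Psi(\theta; \rho_t)\big), \qquad \text{(DD)}$$

  where

  $$\Psi(\theta; \rho) = \frac{\delta R(\rho)}{\delta \rho(\theta)} = V(\theta) + \int U(\theta, \theta')\, \rho(d\theta').$$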
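To make the potential in (DD) concrete, the sketch below evaluates it at the empirical measure of N particles. It assumes the square-loss forms used in [MMN18], V(theta) = -E[y sigma_*(x; theta)] and U(theta, theta') = E[sigma_*(x; theta) sigma_*(x; theta')] with sigma_*(x; theta) = a sigma(<w, x>), and replaces the expectation by an average over a finite data sample. The `tanh` activation and all function names are illustrative assumptions.

```python
import math
import random

def sigma_star(x, theta):
    # Assumed unit: sigma_*(x; theta) = a * tanh(<w, x>) with theta = (a, w).
    a, w = theta
    return a * math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def V(theta, data):
    """Sample average of -y * sigma_*(x; theta)."""
    return -sum(y * sigma_star(x, theta) for x, y in data) / len(data)

def U(theta1, theta2, data):
    """Sample average of sigma_*(x; theta1) * sigma_*(x; theta2)."""
    return sum(sigma_star(x, theta1) * sigma_star(x, theta2) for x, _ in data) / len(data)

def psi(theta, particles, data):
    """V(theta) + integral of U(theta, theta') against the empirical measure."""
    interaction = sum(U(theta, tj, data) for tj in particles) / len(particles)
    return V(theta, data) + interaction

def psi_residual_form(theta, particles, data):
    """Same quantity written as minus the average of residual * sigma_*(x; theta)."""
    total = 0.0
    for x, y in data:
        y_hat = sum(sigma_star(x, tj) for tj in particles) / len(particles)
        total += (y - y_hat) * sigma_star(x, theta)
    return -total / len(data)

if __name__ == "__main__":
    random.seed(0)
    data = [([random.gauss(0, 1) for _ in range(3)], random.gauss(0, 1)) for _ in range(20)]
    particles = [(random.gauss(0, 1), [random.gauss(0, 1) for _ in range(3)]) for _ in range(5)]
    theta = (0.7, [0.1, -0.2, 0.3])
    print(psi(theta, particles, data), psi_residual_form(theta, particles, data))
```

The two functions agree up to rounding, which shows that under these assumptions each particle is driven by the correlation between its own unit and the current residual.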


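An improved bound

Assumption

(i) $\sigma$ bounded; (ii) $\nabla_w \sigma(\langle x, w \rangle)$ sub-Gaussian; (iii) $\nabla \Psi$ bounded Lipschitz.

Theorem (M., Misiakiewicz, Montanari, 2019)

Let $(\theta_i^0)_{i \le N} \sim_{iid} \rho_0$. Then, for all bounded Lipschitz $f$, w.h.p.,

$$\sup_{t \le T} \Big| \frac{1}{N} \sum_{i=1}^{N} f\big(\theta_i^{\lfloor t/\varepsilon \rfloor}\big) - \int f(\theta)\, \rho_t(d\theta) \Big| \le \mathrm{Func}(T) \cdot \sqrt{\frac{1}{N} \vee D\varepsilon}.$$

An example: learning a spherically symmetric Lipschitz function using $N = O_d(1)$ neurons and $n = O_d(d)$ samples.

Caveat: this improved bound is not strong. In other cases the factor $\mathrm{Func}(T)$ could potentially be huge.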
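The rate in the theorem is easy to tabulate. The helper below evaluates only the factor sqrt(1/N v D*eps); the prefactor Func(T) is left out because the slide does not specify it, and the function name `dd_rate` is made up for this sketch.

```python
import math

def dd_rate(n_neurons, dim, eps):
    """sqrt(max(1/N, D * eps)): the N-, D- and eps-dependent factor of the bound."""
    return math.sqrt(max(1.0 / n_neurons, dim * eps))

if __name__ == "__main__":
    # With eps fixed, adding neurons beyond N = 1 / (D * eps) no longer helps.
    for n in (10, 100, 1000, 10000):
        print(n, dd_rate(n, dim=10, eps=1e-3))
```

The two terms balance when eps is of order 1 / (N * D), so for a fixed number of neurons the step size only needs to shrink linearly with the dimension.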




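Recovering the kernel limit

The same idea appeared in [Chizat and Bach, 2018b], where the kernel limit was called "lazy training".

Setup:

Prediction function:

$$f_{\alpha,N}(x; \theta) = \frac{\alpha}{N} \sum_{j=1}^{N} \sigma_\star(x; \theta_j),$$

Risk function:

$$R_{\alpha,N}(\theta) = \mathbb{E}_x\big[\big(f(x) - f_{\alpha,N}(x; \theta)\big)^2\big],$$

Gradient flow:

$$\frac{d\theta_j^t}{dt} = -\frac{N}{2\alpha^2} \nabla_{\theta_j} R_{\alpha,N}(\theta^t).$$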
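A forward-Euler discretization of this gradient flow makes the role of alpha visible. Differentiating the risk gives d theta_j / dt = (1/alpha) E_x[u(x) grad sigma_*(x; theta_j)], where u = f - f_{alpha,N} is the residual, so each particle moves at speed of order 1/alpha. The sketch assumes a scalar parameter per neuron, sigma_*(x; theta) = tanh(theta * x), and a finite sample in place of E_x; none of these choices come from the slides.

```python
import math

def sigma_star(x, theta):
    # Assumed unit with a scalar parameter.
    return math.tanh(theta * x)

def grad_sigma_star(x, theta):
    return x * (1.0 - math.tanh(theta * x) ** 2)

def predict(x, thetas, alpha):
    """f_{alpha,N}(x; theta) = (alpha / N) * sum_j sigma_*(x; theta_j)."""
    return alpha / len(thetas) * sum(sigma_star(x, th) for th in thetas)

def risk(thetas, alpha, xs, target):
    return sum((target(x) - predict(x, thetas, alpha)) ** 2 for x in xs) / len(xs)

def euler_step(thetas, alpha, xs, target, dt):
    """One Euler step of d theta_j/dt = -(N / (2 alpha^2)) * grad_j R."""
    resid = [target(x) - predict(x, thetas, alpha) for x in xs]
    new_thetas = []
    for th in thetas:
        # -(N / (2 alpha^2)) * grad_j R simplifies to (1/alpha) * mean(u * grad sigma_*).
        drift = sum(u * grad_sigma_star(x, th) for u, x in zip(resid, xs)) / (alpha * len(xs))
        new_thetas.append(th + dt * drift)
    return new_thetas

if __name__ == "__main__":
    xs = [-1.0, -0.5, 0.5, 1.0]
    target = lambda x: x
    thetas = [0.5, -0.5]  # symmetric pair, so the initial prediction is zero
    for alpha in (1.0, 10.0, 100.0):
        moved = euler_step(thetas, alpha, xs, target, dt=0.1)
        print(alpha, abs(moved[0] - thetas[0]))
```

With the symmetric initialization the initial residual does not depend on alpha, so the printed displacement shrinks exactly like 1/alpha.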


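The coupled dynamics

Denote $\rho_t^{\alpha,N} = (1/N) \sum_{j=1}^{N} \delta_{\theta_j^t}$. Distributional dynamics:

$$\partial_t \rho_t^{\alpha,N} = (1/\alpha)\, \nabla_\theta \cdot \big(\rho_t^{\alpha,N} \nabla_\theta \Psi_\alpha(\theta; \rho_t^{\alpha,N})\big).$$

Denote $u_t^{\alpha,N}(z) = f(z) - f_{\alpha,N}(z; \theta^t)$. Residual dynamics:

$$\partial_t \|u_t^{\alpha,N}\|_{L^2}^2 = -\big\langle u_t^{\alpha,N}, H_{\rho_t^{\alpha,N}} u_t^{\alpha,N} \big\rangle.$$

Here

$$H_\rho(x, z) \equiv \int \langle \nabla_\theta \sigma_\star(x; \theta), \nabla_\theta \sigma_\star(z; \theta) \rangle\, \rho(d\theta),$$

$$\Psi_\alpha(\theta; \rho_t^{\alpha,N}) = -\mathbb{E}_x\big[u_t^{\alpha,N}(x)\, \sigma_\star(x; \theta)\big].$$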
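The kernel in the residual dynamics can be evaluated directly for an empirical measure. The sketch below assumes theta = (a, w) with scalar w and scalar input x, sigma_*(x; theta) = a * tanh(w * x), and an average over a finite sample in place of the L2 inner product; the function names are hypothetical. It also checks that the quadratic form is an average of squared norms, which is why the right-hand side of the residual equation is never positive.

```python
import math

def grad_sigma_star(x, theta):
    """Gradient in theta = (a, w) of the assumed unit a * tanh(w * x)."""
    a, w = theta
    t = math.tanh(w * x)
    return (t, a * x * (1.0 - t * t))

def kernel(x, z, particles):
    """H_rho(x, z) for rho = (1/N) * sum_j delta_{theta_j}."""
    total = 0.0
    for theta in particles:
        gx = grad_sigma_star(x, theta)
        gz = grad_sigma_star(z, theta)
        total += gx[0] * gz[0] + gx[1] * gz[1]
    return total / len(particles)

def quadratic_form(u, xs, particles):
    """<u, H_rho u>, with both expectations replaced by averages over xs."""
    n = len(xs)
    return sum(u[i] * kernel(xs[i], xs[j], particles) * u[j]
               for i in range(n) for j in range(n)) / (n * n)

def quadratic_form_as_squares(u, xs, particles):
    """Same value as (1/N) * sum_j |mean_x u(x) grad sigma_*(x; theta_j)|^2."""
    n = len(xs)
    total = 0.0
    for theta in particles:
        m0 = sum(u[i] * grad_sigma_star(xs[i], theta)[0] for i in range(n)) / n
        m1 = sum(u[i] * grad_sigma_star(xs[i], theta)[1] for i in range(n)) / n
        total += m0 * m0 + m1 * m1
    return total / len(particles)

if __name__ == "__main__":
    xs = [-1.0, -0.3, 0.4, 1.2]
    u = [0.5, -0.2, 0.1, 0.9]
    particles = [(1.0, 0.5), (-0.7, 1.1), (0.3, -0.4)]
    print(quadratic_form(u, xs, particles), quadratic_form_as_squares(u, xs, particles))
```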


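The mean field limit and kernel limit

$$\partial_t \rho_t^{\alpha,N} = (1/\alpha)\, \nabla_\theta \cdot \big(\rho_t^{\alpha,N} [\nabla_\theta \Psi_\alpha(\theta; \rho_t^{\alpha,N})]\big),$$

$$\partial_t \|u_t^{\alpha,N}\|_{L^2}^2 = -\big\langle u_t^{\alpha,N}, H_{\rho_t^{\alpha,N}} u_t^{\alpha,N} \big\rangle.$$

- The mean field limit: fix $\alpha = O(1)$ and let $N \to \infty$.
- The kernel limit: let $\alpha \to \infty$ after $N \to \infty$.
- The benefit of the kernel limit: the kernel will not change, and the residual dynamics becomes self-contained. The empirical risk will converge to 0. For the full derivation, see Appendix H of [Mei, Misiakiewicz, Montanari, 2019].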
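Why a frozen kernel gives convergence can be seen from a one-line Grönwall argument. The display below is a sketch under two assumptions that are not stated on the slide: the kernel stays equal to its initial value $H_0 := H_{\rho_0}$, and its smallest eigenvalue $\lambda_{\min}(H_0)$ is strictly positive on the relevant $L^2$ space.

```latex
% Assumptions (not on the slide): H_{\rho_t} is frozen at H_0 = H_{\rho_0},
% and \lambda_{\min}(H_0) > 0.
\begin{align*}
\partial_t \|u_t\|_{L^2}^2
  &= -\langle u_t, H_0 u_t \rangle
   \le -\lambda_{\min}(H_0)\, \|u_t\|_{L^2}^2, \\
\|u_t\|_{L^2}^2
  &\le e^{-\lambda_{\min}(H_0)\, t}\, \|u_0\|_{L^2}^2
   \qquad \text{(Gr\"onwall's inequality)}.
\end{align*}
```

Under these assumptions the residual decays exponentially, which matches the statement that the risk converges to 0 in the kernel limit.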


Summary

- Gave an interpretation of the neural tangent kernel from the mean field point of view. (Also in [Chizat and Bach, 2018b].)

- Whether NTK can explain the success of neural networks is still an open problem. [Arora, Du, Hu, Li, Salakhutdinov, Wang, 2019], [Ghorbani, Mei, Misiakiewicz, and Montanari, 2019a, 2019b], [Allen-Zhu and Li, 2019].
