Representation Power of Feedforward Neural Networks

• Representation Power of Feedforward Neural Networks

Based on work by Barron (1993), Cybenko (1989), Kolmogorov (1957)

Matus Telgarsky

• Feedforward Neural Networks

I Two node types:

  I Linear combinations: $x \mapsto \sum_i w_i x_i + w_0$.

  I Sigmoid thresholded linear combinations: $x \mapsto \sigma(\langle w, x \rangle + w_0)$.

I What can a network of these nodes represent?

$\sum_{i=1}^{n} w_i x_i$ (one layer),

$\sum_{i=1}^{n} w_i \, \sigma\Big( \sum_{j=1}^{n_i} w_{ji} x_j + w_{j0} \Big)$ (two layers),

...
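The two node types and the two-layer form can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the slides: the function names are hypothetical, and the logistic function is assumed as the concrete choice of sigmoid.

```python
import math


def linear_node(w, w0, x):
    """Linear combination: x -> sum_i w_i * x_i + w_0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def sigmoid(z):
    """Logistic sigmoid (assumed choice), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_node(w, w0, x):
    """Sigmoid thresholded linear combination: x -> sigmoid(<w, x> + w_0)."""
    return sigmoid(linear_node(w, w0, x))


def two_layer(outer_w, hidden, x):
    """Two-layer network: sum_i outer_w[i] * sigmoid(<w_i, x> + w_i0).

    `hidden` is a list of (w_i, w_i0) pairs, one per hidden node.
    """
    return sum(a * sigmoid_node(w, w0, x) for a, (w, w0) in zip(outer_w, hidden))
```

For example, `two_layer([2.0, -1.0], [([0.0, 0.0], 0.0), ([0.0, 0.0], 0.0)], [1.0, 2.0])` combines two hidden nodes that each output 0.5, since both have zero weights and zero bias.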

I Target set $[0, 3]$; target function $\mathbb{1}[x \in [1, 2]]$.

I Standard sigmoid $\sigma_s(x) := 1/(1 + e^{-x})$.

I Consider sigmoid output at $x = 2$.

I $w \ge 0$ and $\sigma_s(2w + w_0) \ge 1/2$: mess up on right side.

[Figure: plot over $[0, 3]$; vertical ticks at 0.5 and 1.]

I $w \ge 0$ and $\sigma_s(2w + w_0) < 1/2$: mess up on middle bump.

[Figure: plot over $[0, 3]$; vertical ticks at 0.5 and 1.]

I Can symmetrize ($w < 0$); no matter what, error $\ge 1/2$.
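The case analysis above can be checked numerically. The sketch below samples random single sigmoid nodes and measures their worst-case error against the indicator of $[1, 2]$ on a grid over $[0, 3]$. The grid resolution, sampling ranges, and function names are assumptions chosen for illustration, and a grid maximum only lower-bounds the true supremum.

```python
import math
import random


def sigmoid(z):
    """Standard sigmoid 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def target(x):
    """Indicator of the interval [1, 2]."""
    return 1.0 if 1.0 <= x <= 2.0 else 0.0


# Grid over [0, 3] with step 0.01; it contains x = 1 and x = 2 exactly.
GRID = [i / 100 for i in range(301)]


def sup_error(w, w0):
    """Largest grid error of the single node x -> sigmoid(w * x + w0)."""
    return max(abs(sigmoid(w * x + w0) - target(x)) for x in GRID)


def best_random_node(trials=500, seed=0):
    """Smallest sup error found among randomly sampled (w, w0) pairs."""
    rng = random.Random(seed)
    return min(
        sup_error(rng.uniform(-20.0, 20.0), rng.uniform(-40.0, 40.0))
        for _ in range(trials)
    )
```

Whichever `(w, w0)` is drawn, the node is monotone in `x`, so it either stays at or above 1/2 just outside the bump or falls below 1/2 at an endpoint of it. `best_random_node()` therefore never drops below 1/2.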

• Meaning of Universal Approximation

Target set $[0, 1]^n$; target function $f \in C([0, 1]^n)$.

I For any $\epsilon > 0$, exists NN $\hat f$,

$x \in [0, 1]^n \implies |f(x) - \hat f(x)| < \epsilon$.

I This gives NNs $\hat f_i \to f$ pointwise.

I For any $\epsilon > 0$, exists NN $\hat f$ and $S \subseteq [0, 1]^n$, $m(S) \ge 1 - \epsilon$,

$x \in S \implies |f(x) - \hat f(x)| < \epsilon$.

I If (for instance) bounded on $S^c$, gives NNs $\hat f_i \to f$ $m$-a.e.

Goal: 2-NNs approximate continuous functions over $[0, 1]^n$.
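The difference between the two notions can be made concrete in one dimension. In the sketch below, `f_hat` is a hypothetical stand-in for an approximant, not an actual network: it is deliberately wrong on a small interval. It fails the everywhere criterion for $\epsilon = 0.05$ but satisfies the second criterion, because the bad set has measure about 0.01. The target `f`, the grid size, and all names are assumptions for illustration.

```python
import math


def f(x):
    """A continuous target on [0, 1] (assumed example)."""
    return math.sin(2.0 * math.pi * x)


def f_hat(x):
    """Hypothetical approximant: off by 1 on [0, 0.01), off by 0.001 elsewhere."""
    return f(x) + (1.0 if x < 0.01 else 0.001)


# Midpoint grid on [0, 1]; each point stands for a cell of measure 1 / N.
N = 10_000
GRID = [(k + 0.5) / N for k in range(N)]


def sup_error():
    """Grid estimate of sup_x |f(x) - f_hat(x)|."""
    return max(abs(f(x) - f_hat(x)) for x in GRID)


def bad_set_measure(eps):
    """Grid estimate of m({x : |f(x) - f_hat(x)| >= eps})."""
    return sum(1 for x in GRID if abs(f(x) - f_hat(x)) >= eps) / N
```

With $\epsilon = 0.05$, `sup_error()` is close to 1, so the first criterion fails, while `1 - bad_set_measure(0.05)` is about 0.99, which is at least $1 - \epsilon$, so the second holds.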

• Outline

I 2-nn via functional analysis (Cybenko, 1989).

I 2-nn via greedy approx (Barron, 1993).

I 3-nn via histograms (Folklore).

I 3-nn via wizardry (Kolmogorov, 1957).

• Overview of Functional Analysis proof (Cybenko, 1989)

I Hidden layer as a basis:

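$B := \{\sigma(\langle w, x \rangle + w_0) : w \in \mathbb{R}^n,\ w_0 \in \mathbb{R}\}$.

I Want to show $\operatorname{cl}(\operatorname{span}(B)) = C([0, 1]^n)$.

I Work via contradiction: if $f \in C([0, 1]^n)$ far from $\operatorname{cl}(\operatorname{span}(B))$, can bridge the gap with a sigmoid.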
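Spelled out, an element of $\operatorname{span}(B)$ is a finite weighted sum of sigmoid nodes, that is, a two-layer network, and the density claim says that such sums come uniformly close to any continuous target. The symbols $N$, $a_i$, $w_i$, $w_{i,0}$ are notation introduced here for illustration, and the closure is assumed to be taken in the sup norm.

```latex
g(x) = \sum_{i=1}^{N} a_i \, \sigma(\langle w_i, x \rangle + w_{i,0}) \in \operatorname{span}(B),
\qquad
\operatorname{cl}(\operatorname{span}(B)) = C([0,1]^n)
\iff
\forall f \in C([0,1]^n),\ \forall \epsilon > 0,\ \exists g \in \operatorname{span}(B) :\
\sup_{x \in [0,1]^n} |f(x) - g(x)| < \epsilon .
```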

• Proof of Cybenko (1989)

I Consider the closed subspace

$S := \operatorname{cl}\big(\operatorname{span}(\{\sigma(\langle w, x \rangle + w_0) : w \in \mathbb{R}^n,\ w_0 \in \mathbb{R}\})\big)$.

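I Suppose (contradictorily) exists $f \in C([0, 1]^n) \setminus S$.

I Exists bounded linear $L \neq 0$ with $L|_S = 0$.

I Exists $\mu \neq 0$ so that $L(h) = \int h(x)\, d\mu(x)$.

I $L|_S = 0$ implies $\forall w, w_0$: $\int \sigma(\langle w, x \rangle + w_0)\, d\mu(x) = 0$.

I But "$\sigma$ discriminatory" means $\mu = 0 \Longleftarrow \forall w, w_0 \ldots$.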
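The chain of implications can be written out in one display. This is a sketch of the standard argument, assuming that $L$ comes from the Hahn-Banach theorem and $\mu$ from the Riesz representation theorem (the slides name neither), with the closure taken in the sup norm.

```latex
\begin{align*}
f \notin S
  &\implies \exists L \in C([0,1]^n)^*,\ L \neq 0,\ L|_S = 0
  && \text{(Hahn-Banach, assumed)} \\
  &\implies \exists \mu \neq 0 :\ L(h) = \int h(x)\, d\mu(x)
  && \text{(Riesz representation, assumed)} \\
  &\implies \forall w, w_0 :\ \int \sigma(\langle w, x \rangle + w_0)\, d\mu(x) = 0
  && \text{(since } \sigma(\langle w, \cdot \rangle + w_0) \in S \text{)} \\
  &\implies \mu = 0
  && \text{(} \sigma \text{ discriminatory)}
\end{align*}
```

The last line contradicts $\mu \neq 0$, so no such $f$ exists and $S = C([0, 1]^n)$.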

