New degree bounds for polynomials with prescribed signs Ryan ODonnell (MIT) Rocco Servedio...

New degree bounds for polynomials with prescribed signs

Ryan O’Donnell (MIT)

Rocco Servedio (Harvard/Columbia)

Polynomials with prescribed signs

Suppose m disjoint regions $R_1, \dots, R_m$ are given in $\mathbb{R}^n$, along with associated signs $\sigma_1, \dots, \sigma_m \in \{+1,-1\}$. What is the lowest degree of a polynomial $p : \mathbb{R}^n \to \mathbb{R}$ which has the prescribed signs on the regions?

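In one dimension the problem is trivial: if the regions are intervals, the minimum degree is exactly the number of sign alternations between consecutive intervals. It is necessary, since each alternation forces a root in the gap, and sufficient, since one root per alternation works.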
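A minimal sketch of the one-dimensional argument, assuming the regions are disjoint closed intervals separated by gaps and passed as `(left, right, sign)` triples (a hypothetical encoding). It puts one root in each gap where the sign flips, so the degree equals the number of sign alternations.

```python
from typing import List, Tuple

# A region is a closed interval (left, right) with a prescribed sign in {+1, -1}.
Interval = Tuple[float, float, int]


def min_degree_1d(intervals: List[Interval]) -> int:
    """Number of sign alternations between consecutive intervals."""
    ordered = sorted(intervals)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if a[2] != b[2])


def build_poly_1d(intervals: List[Interval]) -> Tuple[int, List[float]]:
    """Return (lead, roots) describing p(x) = lead * prod(x - r for r in roots).

    One root is placed in the middle of each gap where the sign changes.
    """
    ordered = sorted(intervals)
    roots = [(a[1] + b[0]) / 2 for a, b in zip(ordered, ordered[1:]) if a[2] != b[2]]
    # Every root lies left of the last interval, so all factors (x - r) are
    # positive there and `lead` alone fixes the sign on that interval.
    lead = ordered[-1][2]
    return lead, roots


def evaluate(lead: int, roots: List[float], x: float) -> float:
    value = float(lead)
    for r in roots:
        value *= x - r
    return value


regions = [(0, 1, +1), (2, 3, -1), (4, 5, -1), (6, 7, +1)]
lead, roots = build_poly_1d(regions)
print(min_degree_1d(regions), roots)  # 2 [1.5, 5.5]
```

Moving left across a root flips the sign of exactly one factor, which is why the signs alternate exactly where prescribed.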

In two or more dimensions…??

Polynomial threshold functions

A special case: let $f : \{0,1\}^n \to \{+1,-1\}$ be a boolean function, and let $p : \mathbb{R}^n \to \mathbb{R}$ be a polynomial. We say that p is a polynomial threshold function (PTF) for f, or that p sign-represents f, if:

$f(x) = \mathrm{sgn}(p(x))$ for all $x \in \{0,1\}^n$.

We are concerned with finding the lowest degree PTF for f.


For example:

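• $x_1 + x_2 + \dots + x_n - \tfrac12$: degree-1 PTF for OR

• $x_1 + x_2 + \dots + x_n - (n - \tfrac12)$: degree-1 PTF for AND

• $x_1 + x_2 + \dots + x_n - n/2$: degree-1 PTF for MAJ (n odd)

• $(1 - 2x_1)(1 - 2x_2)\cdots(1 - 2x_n)$: degree-n PTF for PARITY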
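The four examples can be checked by brute force over $\{0,1\}^n$. In this sketch `check_ptf` and the `(polynomial, function)` pairs are hypothetical names; we assume +1 encodes "true" for OR, AND and MAJ, while the parity product is +1 exactly on inputs with an even number of ones. MAJ is only valid for odd n, since for even n its polynomial vanishes on inputs of weight n/2.

```python
from itertools import product
from math import prod


def check_ptf(p, f, n: int) -> bool:
    """True if p(x) != 0 and sgn(p(x)) == f(x) for every x in {0,1}^n."""
    for x in product((0, 1), repeat=n):
        value = p(x)
        if value == 0 or (1 if value > 0 else -1) != f(x):
            return False
    return True


# Convention assumed here: +1 means "true" for OR, AND and MAJ.
OR = (lambda x: sum(x) - 0.5, lambda x: 1 if any(x) else -1)
AND = (lambda x: sum(x) - (len(x) - 0.5), lambda x: 1 if all(x) else -1)
MAJ = (lambda x: sum(x) - len(x) / 2, lambda x: 1 if 2 * sum(x) > len(x) else -1)
# The parity product equals (-1)^(number of ones): +1 on even-weight inputs.
PARITY = (lambda x: prod(1 - 2 * xi for xi in x), lambda x: 1 if sum(x) % 2 == 0 else -1)

print(check_ptf(*MAJ, 5), check_ptf(*MAJ, 4))  # True False
```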

Every n-bit boolean function has a PTF (indeed, an exact representation) of degree ≤ n.

(Consider: $\dots + f(1101)\, x_1 x_2 (1 - x_3) x_4 + \dots$)
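The interpolation behind the parenthetical can be sketched as follows (our illustration; `interpolate` is a hypothetical helper). Each term multiplies $f(a)$ by a product that equals 1 at the point a and 0 elsewhere on the cube, so the sum agrees with f everywhere on $\{0,1\}^n$ and has degree at most n.

```python
from itertools import product
from typing import Callable, Tuple

Point = Tuple[int, ...]


def interpolate(f: Callable[[Point], int], n: int) -> Callable[[Point], float]:
    """Exact multilinear representation of f : {0,1}^n -> {+1,-1}, of degree <= n.

    p(x) = sum over a in {0,1}^n of f(a) * prod_i (x_i if a_i == 1 else 1 - x_i);
    on the cube, each product is 1 when x == a and 0 otherwise.
    """
    table = {a: f(a) for a in product((0, 1), repeat=n)}

    def p(x: Point) -> float:
        total = 0.0
        for a, value in table.items():
            term = float(value)
            for ai, xi in zip(a, x):
                term *= xi if ai == 1 else 1 - xi
            total += term
        return total

    return p


# An arbitrary 4-bit function; its term for a = (1,1,0,1) is f(1101) x1 x2 (1 - x3) x4.
f = lambda a: 1 if (a[0] ^ a[1]) or a[3] else -1
p = interpolate(f, 4)
print(p((1, 1, 0, 1)), f((1, 1, 0, 1)))  # 1.0 1
```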


What are PTFs good for?

• natural algebraic model of complexity

• upper bounds, machine learning: given a class of functions C, if every function in C has a PTF of degree d, then C can be learned in time $n^{O(d)}$

• used to prove that PP is closed under intersection

• lower bounds: oracle separations

• a slightly stricter model is related to quantum decision tree complexity
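A rough sketch of the learning application: expand each input into all multilinear monomials of degree at most d (there are $n^{O(d)}$ of them), then learn a halfspace over those features. The $n^{O(d)}$ time bound relies on a polynomial-time halfspace learner such as linear programming; for brevity we swap in the classic perceptron, whose running time also depends on the margin, so this is an illustration rather than the algorithm behind the bound. All function names are hypothetical.

```python
from itertools import combinations
from math import prod
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]  # (x in {0,1}^n, label in {+1,-1})


def monomial_features(x: Sequence[int], d: int) -> List[int]:
    """All multilinear monomials of degree <= d evaluated at x.

    There are sum_{j <= d} C(n, j) = n^{O(d)} of them; the empty monomial is the constant 1.
    """
    return [prod(x[i] for i in S) for j in range(d + 1) for S in combinations(range(len(x)), j)]


def learn_ptf(samples: List[Example], d: int, max_epochs: int = 10_000) -> List[float]:
    """Perceptron over degree-d monomial features; converges if a degree-d PTF fits the data."""
    w = [0.0] * len(monomial_features(samples[0][0], d))
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in samples:
            phi = monomial_features(x, d)
            if label * sum(wi * fi for wi, fi in zip(w, phi)) <= 0:
                w = [wi + label * fi for wi, fi in zip(w, phi)]  # perceptron update
                mistakes += 1
        if mistakes == 0:
            return w
    raise RuntimeError("no consistent degree-d PTF found within max_epochs")


def predict(w: List[float], x: Sequence[int], d: int) -> int:
    return 1 if sum(wi * fi for wi, fi in zip(w, monomial_features(x, d))) > 0 else -1


xor = [((a, b), 1 if a != b else -1) for a in (0, 1) for b in (0, 1)]
w = learn_ptf(xor, d=2)
print(predict(w, (1, 0), 2))  # 1
```

XOR on two bits needs degree 2, so the same call with `d=1` never stops making mistakes and raises.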

Prior work — lower bounds

Minsky & Papert, Perceptrons, 1968:

• artificial intelligence perspective

• proved three major lower bounds:

- PARITY requires PTF degree n
- a certain DNF formula, “one in a box”, the $n^{1/3}$-way OR of $n^{2/3}$-way ANDs, requires PTF degree $\Omega(n^{1/3})$
- MAJ(x1,…,xn) AND MAJ(y1,…,yn) requires superconstant PTF degree

No essentially different lower bounds have been known since.

Prior work — upper bounds

• [BRS95] considered AND-MAJn as well; they showed it has PTF degree O(log n); they used this to show PP is closed under intersection

• [KS01] showed that every DNF formula on n variables with s terms has a PTF of degree $O(n^{1/3}\log s)$; they use this to get a subexponential-time learning algorithm for DNF formulas, which is the fastest known

Our results

Upper bound: every boolean function computed by an AND/OR/NOT formula of size s and depth d has a PTF of degree $\sqrt{s}\,\log^{O(d)} s$ (note that degree s is trivial). This gives a subexponential-time learning algorithm for, say, linear-size formulas of superconstant depth, the first such algorithm known.

Lower bound: using a new technique, we show that AND-MAJn requires PTF degree $\Omega(\log n / \log\log n)$.

Talk outline

Plan for the talk:

1. Prove the $\sqrt{s}\,\log^{O(d)} s$ PTF upper bound for formulas.

2. Prove the $\Omega(\log n / \log\log n)$ PTF lower bound for AND-MAJn.

Boolean formulas

[Figure: an example formula, a tree of AND and OR gates whose leaves are literals such as x1, x2, …, x13.]

• a formula is a tree whose gates are ANDs or ORs, with unbounded fan-in

• leaves are labeled with literals

• size is the number of leaves

• depth is the length of the longest root-to-leaf path
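A possible encoding of these definitions, assuming hypothetical `Leaf` and `Gate` classes and counting depth as the number of gates on the longest root-to-leaf path:

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Union


@dataclass
class Leaf:
    var: int               # index of the variable x_var
    negated: bool = False  # True for the literal NOT x_var


@dataclass
class Gate:
    op: str                                              # "AND" or "OR" (unbounded fan-in)
    children: List["Node"] = field(default_factory=list)


Node = Union[Leaf, Gate]


def size(node: Node) -> int:
    """Number of leaves."""
    return 1 if isinstance(node, Leaf) else sum(size(c) for c in node.children)


def depth(node: Node) -> int:
    """Number of gates on the longest root-to-leaf path."""
    return 0 if isinstance(node, Leaf) else 1 + max(depth(c) for c in node.children)


def evaluate(node: Node, x: Sequence[int]) -> int:
    if isinstance(node, Leaf):
        return 1 - x[node.var] if node.negated else x[node.var]
    values = [evaluate(c, x) for c in node.children]
    return int(all(values)) if node.op == "AND" else int(any(values))


# AND(OR(x0, NOT x1), x2)
F = Gate("AND", [Gate("OR", [Leaf(0), Leaf(1, negated=True)]), Leaf(2)])
print(size(F), depth(F))  # 3 2
```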

PTFs for boolean formulas

(In this section we always use {0,1}.)

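Idea: replace every gate with a low-degree polynomial which simulates the gate:

$\mathrm{AND}(v_1,\dots,v_k)$?

• $v_1 + \dots + v_k - (k-1)$

• $\left[(v_1 + \dots + v_k)/k\right]^{k\log(1/\varepsilon)}$

[Figure: an AND gate with inputs 1, 1, 0, 0, 1 and output 0.]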
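The two candidate AND polynomials can be compared numerically. This sketch assumes 0/1 inputs and uses the natural logarithm; `power_and` and `linear_and` are hypothetical names. The first candidate has degree 1 but can be as negative as $-(k-1)$, which makes it awkward to feed into further gates. The second stays in [0,1] (exactly 1 when all inputs are 1, at most ε otherwise), but its degree $\lceil k\ln(1/\varepsilon)\rceil$ grows linearly in k.

```python
import math
from typing import Sequence, Tuple


def power_and(v: Sequence[float], eps: float) -> Tuple[float, int]:
    """Evaluate ((v1 + ... + vk) / k) ** m with m = ceil(k * ln(1/eps)).

    Returns (value, degree). On 0/1 inputs the value is exactly 1 when all v_i = 1;
    otherwise the average is at most 1 - 1/k and (1 - 1/k)^(k ln(1/eps)) <= eps.
    """
    k = len(v)
    m = math.ceil(k * math.log(1 / eps))
    return (sum(v) / k) ** m, m


def linear_and(v: Sequence[float]) -> float:
    """v1 + ... + vk - (k - 1): degree 1, but values range down to -(k - 1)."""
    return sum(v) - (len(v) - 1)


print(power_and([1] * 10, 0.01))  # (1.0, 47)
```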

A better amplifying polynomial

We want to amplify the disparity between $1 - 1/k$ and 1. Raising to a power of about k works, but costs a lot of degree.

We want a low-degree polynomial which keeps values in $[0, 1 - 1/k]$ between 0 and 1 but amplifies the point 1 to, say, 2.

Equivalently, we want a polynomial bounded on [0,1] with the largest possible derivative at 1.

Chebyshev polynomials

This is an old problem of analysis, solved by the Chebyshev polynomials of the first kind.

These are a family of orthogonal polynomials $(C_r)_{r \in \mathbb{N}}$ with the properties: $\deg(C_r) = r$, $C_r([-1,1]) \subseteq [-1,1]$, $C_r'(1) = r^2$, and $C_r(1 + 1/r^2) \ge 2$.

$C_r(x) = \cos(r \arccos x)$ for $x \in [-1,1]$.
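The listed properties can be checked with a short sketch that builds $C_r$ from the standard three-term recurrence $C_{r+1}(x) = 2x\,C_r(x) - C_{r-1}(x)$ with integer coefficients (the function names are our own). The derivative at 1 is computed exactly as $\sum_i i\,c_i$.

```python
import math
from typing import List


def chebyshev_coeffs(r: int) -> List[int]:
    """Integer coefficients of C_r, lowest degree first, via C_{r+1} = 2x C_r - C_{r-1}."""
    prev, cur = [1], [0, 1]  # C_0 = 1, C_1 = x
    if r == 0:
        return prev
    for _ in range(r - 1):
        nxt = [0] + [2 * c for c in cur]  # 2x * C_r
        for i, c in enumerate(prev):
            nxt[i] -= c  # ... minus C_{r-1}
        prev, cur = cur, nxt
    return cur


def evaluate(coeffs: List[int], x: float) -> float:
    result = 0.0
    for c in reversed(coeffs):  # Horner's rule
        result = result * x + c
    return result


def derivative_at_one(coeffs: List[int]) -> int:
    return sum(i * c for i, c in enumerate(coeffs))


print(chebyshev_coeffs(3))  # [0, -3, 0, 4], i.e. C_3(x) = 4x^3 - 3x
```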

Chebyshev polynomials at gates

Chebyshev polynomials give us a square-root savings in degree:

Imagine replacing $\mathrm{AND}(v_1,\dots,v_k)$ with:

$C_{\sqrt{k}}\big((v_1 + \dots + v_k)/(k-1)\big)$. (*)

$(v_1 + \dots + v_k)/(k-1) \approx 1 + 1/k$ if all the $v_i$ are roughly 1, and it is in [0,1] otherwise.

Hence (*) is at least about 2 when the AND is true, and between −1 and 1 otherwise.

(This idea is originally from [KS01].)
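A sketch of (*) in Python. Since $\sqrt{k}$ need not be an integer, we assume the degree is rounded up to $r = \lceil\sqrt{k}\rceil$; then $r^2 \ge k$, so when all k inputs are 1 the argument $k/(k-1) = 1 + 1/(k-1)$ exceeds $1 + 1/r^2$ and $C_r$ is at least 2 there, while any 0/1 input containing a 0 maps into [0,1], where $|C_r| \le 1$. The helper names are hypothetical.

```python
import math
from typing import Sequence


def cheb(r: int, t: float) -> float:
    """C_r(t) via the three-term recurrence C_{m+1} = 2t C_m - C_{m-1}."""
    prev, cur = 1.0, t  # C_0(t), C_1(t)
    if r == 0:
        return prev
    for _ in range(r - 1):
        prev, cur = cur, 2 * t * cur - prev
    return cur


def chebyshev_and(v: Sequence[float]) -> float:
    """(*) with the degree rounded up: C_r((v_1 + ... + v_k) / (k - 1)), r = ceil(sqrt(k)), k >= 2."""
    k = len(v)
    r = math.ceil(math.sqrt(k))  # r^2 >= k, so 1 + 1/(k - 1) > 1 + 1/r^2
    return cheb(r, sum(v) / (k - 1))


print(chebyshev_and([1] * 9) >= 2, abs(chebyshev_and([1] * 8 + [0])) <= 1)  # True True
```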

Chebyshev polynomials at gates

In fact, we will replace each AND gate by:

$\varepsilon \cdot C_{\sqrt{k}}\big((v_1 + \dots + v_k)/(k-1)\big)^{\log(1/\varepsilon)}$,

and something similar for OR gates. Note that if the inputs have 0/1 values up to ±ε, so do the outputs.

Further, if the $v_i$ all have degree at most d, the resulting polynomial has degree at most $d\sqrt{k}\log(1/\varepsilon)$.

Almost done

By applying these polynomials at every gate, we can easily conclude:

Suppose F is a formula in which, along every root-to-leaf path, the product of the fan-ins is at most t. Then we can sign-represent F with a polynomial of degree $\sqrt{t}\,\log^{O(d)} s$. (We need to take $\varepsilon \approx 1/s$.)
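A sketch of the degree accounting, under the simplifying assumptions that leaves have degree 1, that each gate multiplies the degree by $\lceil\sqrt{k}\rceil \cdot \log(1/\varepsilon)$ as on the previous slide, and that formulas are given as nested lists (a hypothetical encoding). Since $\lceil\sqrt{k}\rceil \le 2\sqrt{k}$, the worst path gives at most $\sqrt{t}\,(2\log(1/\varepsilon))^{\text{depth}}$, which with $\varepsilon \approx 1/s$ is $\sqrt{t}\,\log^{O(d)} s$.

```python
import math
from typing import List, Union

# A formula as nested lists: a leaf is a variable name, a gate is the list of its children.
# (Gate types do not matter for the degree count.)
Tree = Union[str, List["Tree"]]


def construction_degree(node: Tree, log_inv_eps: int) -> int:
    """Degree of the gate-by-gate Chebyshev construction.

    A gate with fan-in k whose inputs have degree at most d_in gets degree
    ceil(sqrt(k)) * log_inv_eps * d_in; a leaf has degree 1.
    """
    if isinstance(node, str):
        return 1
    d_in = max(construction_degree(c, log_inv_eps) for c in node)
    return math.ceil(math.sqrt(len(node))) * log_inv_eps * d_in


def max_fanin_product(node: Tree) -> int:
    """Largest product of fan-ins along a root-to-leaf path."""
    if isinstance(node, str):
        return 1
    return len(node) * max(max_fanin_product(c) for c in node)


def depth(node: Tree) -> int:
    return 0 if isinstance(node, str) else 1 + max(depth(c) for c in node)


F = [["a", "b", "c", "d"], ["e", ["f", "g"]], "h"]
print(construction_degree(F, 3), max_fanin_product(F), depth(F))  # 216 12 3
```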

We are not quite done, because these fan-in products can be huge!

Bounding fan-in products

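[Figure: a chain of about 100 nested AND/OR gates, each with its own block of about n/100 leaves (x1 … x_{n/100}, x_{n/100} … x_{2n/100}, …) plus one child gate.]

Only n variables (leaves) are used, but one path has fan-in product $(n/100)^{100}$.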
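The bad example can be generated and measured directly. This sketch assumes k fresh leaves per gate plus one child gate (our reading of the figure; `caterpillar` and its parameters are hypothetical): with 100 levels and 10 leaves per level the size is only 1000, yet the fan-in product along the deepest path is $11^{99} \cdot 10$.

```python
from typing import List, Union

Tree = Union[str, List["Tree"]]  # leaf = variable name, gate = list of children


def caterpillar(levels: int, k: int) -> Tree:
    """A chain of `levels` nested gates; each has k fresh leaves, and all but the
    last also have one child gate."""
    names = iter(range(levels * k))

    def build(level: int) -> Tree:
        children: List[Tree] = [f"x{next(names)}" for _ in range(k)]
        if level < levels - 1:
            children.append(build(level + 1))
        return children

    return build(0)


def size(node: Tree) -> int:
    return 1 if isinstance(node, str) else sum(size(c) for c in node)


def max_fanin_product(node: Tree) -> int:
    return 1 if isinstance(node, str) else len(node) * max(max_fanin_product(c) for c in node)


F = caterpillar(levels=100, k=10)
print(size(F), max_fanin_product(F) > 10**100)  # 1000 True
```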

Solution: bucket

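The trick is now to partition the children of each gate into buckets of similar subformula size, replacing the gate by a gate of the same type over one new gate per bucket:

[Figure: an AND gate with subformulas of sizes s1, s2, s3, s4 becomes an AND of about log s bucket AND gates, grouping children with 1 ≤ s_i < 2, …, 2^j ≤ s_i < 2^{j+1}, …, s/2 ≤ s_i < s.]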

Conclusion of upper bound

Now it is easy to see that a gate whose subformula has depth d and size s has maximum root-to-leaf fan-in product $O(s \log^d s)$:

Pf: By induction: the AND bucket with subformula sizes in $[2^j, 2^{j+1})$ has fan-in at most $s/2^j$.

Hence if we first modify our formulas in this way and then apply the Chebyshev construction, we get PTFs of degree $\sqrt{s}\,\log^{O(d)} s$, as desired.
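A sketch of the bucketing step on a single gate, given only the sizes of its children's subformulas (a hypothetical interface). A child of size $s_i$ goes to bucket $j = \lfloor\log_2 s_i\rfloor$, computed with `int.bit_length`; since every member of bucket j has size at least $2^j$, that bucket has at most $s/2^j$ members, and there are at most $\lfloor\log_2 s\rfloor + 1$ buckets.

```python
from collections import defaultdict
from typing import Dict, List


def bucket_children(child_sizes: List[int]) -> Dict[int, List[int]]:
    """Group child subformula sizes s_i >= 1 by j = floor(log2 s_i), i.e. 2^j <= s_i < 2^(j+1).

    Each bucket becomes a new gate of the original gate's type, and the original
    gate is rewritten as a gate of that type over the buckets.
    """
    buckets: Dict[int, List[int]] = defaultdict(list)
    for s_i in child_sizes:
        buckets[s_i.bit_length() - 1].append(s_i)
    return dict(buckets)


sizes = [1, 1, 3, 2, 7, 8, 20, 1]
s = sum(sizes)  # 43
buckets = bucket_children(sizes)
print(buckets)  # {0: [1, 1, 1], 1: [3, 2], 2: [7], 3: [8], 4: [20]}
```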

Lower bound for AND-MAJn

Recall the AND-MAJn function:

(x1,…,xn, y1,…,yn) ↦ MAJ(x1,…,xn) AND MAJ(y1,…,yn).

Minsky and Papert (1968) showed that any PTF requires superconstant ($\omega(1)$) degree.

Beigel, Reingold, and Spielman (1995) exhibited a PTF of degree O(log n).

We give a new lower bound of:

Ω(log n / log log n).

The two-dimensional problem

Minsky and Papert observed that the problem of PTFs for AND-MAJn is equivalent to a much simpler polynomial sign prescription problem, the M-intersector problem:

- domain $\mathbb{R}^2$: a bivariate polynomial
- regions: all odd lattice points bounded by M
- upper-right points positive, all others negative

[Figure: odd lattice points in the square $[-M, M]^2$ with x and y axes; the upper-right points are positive.]

Proof of equivalence

Switch to {+1,−1} in input and output.

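(Intersector ⇒ PTF) Suppose p is an n-intersector. Then $p(\sum_i x_i, \sum_i y_i)$ is a PTF for AND-MAJn of the same degree.

(PTF ⇒ intersector) Suppose p is a PTF for AND-MAJn. Consider:

$q(x_1,\dots,x_n, y_1,\dots,y_n) = \sum_{\pi,\pi' \in S_n} p(x_{\pi(1)},\dots,x_{\pi(n)}, y_{\pi'(1)},\dots,y_{\pi'(n)})$.

Since AND-MAJn is unchanged by permuting the x's or the y's, every term has the correct sign, so q is also a PTF for AND-MAJn. But q is symmetric in the x's and in the y's, hence depends only on their sums: $q = \tilde q(\sum_i x_i, \sum_i y_i)$ for a bivariate polynomial $\tilde q$ of degree at most deg p, which is an n-intersector.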
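The key step, that a polynomial symmetric in the x's and in the y's depends on $\{+1,-1\}$ inputs only through $(\sum_i x_i, \sum_i y_i)$, can be checked by brute force for small n. The sketch below symmetrizes an arbitrary, hypothetical non-symmetric polynomial with n = 3 and confirms that the result is constant on each level set of the two sums, while the original polynomial is not.

```python
from itertools import permutations, product
from typing import Callable, Sequence

Poly = Callable[[Sequence[int], Sequence[int]], float]


def symmetrize(p: Poly, n: int) -> Poly:
    """q(x, y) = sum over pi, pi' in S_n of p(x permuted by pi, y permuted by pi')."""
    perms = list(permutations(range(n)))

    def q(x: Sequence[int], y: Sequence[int]) -> float:
        return sum(p([x[i] for i in pi], [y[i] for i in pj]) for pi in perms for pj in perms)

    return q


def depends_only_on_sums(q: Poly, n: int) -> bool:
    """Check on {+1,-1}^n x {+1,-1}^n that q(x, y) is determined by (sum x, sum y)."""
    seen = {}
    for x in product((1, -1), repeat=n):
        for y in product((1, -1), repeat=n):
            key, value = (sum(x), sum(y)), q(x, y)
            if seen.setdefault(key, value) != value:
                return False
    return True


# An arbitrary (hypothetical) polynomial that is not itself symmetric.
p = lambda x, y: x[0] * y[1] + 2 * x[0] * x[1] - y[2] + 0.5 * x[2] * y[0] * y[1]
print(depends_only_on_sums(symmetrize(p, 3), 3))  # True
```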

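The M-intersector problem

Consider the more general sign prescription problem:

[Figure: the plane divided by the x and y axes; the upper-right quadrant is +, the other three quadrants are −.]

No polynomial can have these signs!

Proof: Assume we have such a p of minimal degree. By continuity, p must be 0 on the half-axis $\{x = 0,\ y > 0\}$, which separates the + quadrant from a − quadrant. By Bezout, $x \mid p$. Divide through; the quotient has smaller degree and solves (essentially) the same problem.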

Reproving Minsky-Papert

This can be used to show Minsky and Papert’s superconstant lower bound. Suppose there were a fixed d such that there was an M-intersector of degree d for every M. Take M → ∞, rescaling to the unit square. By compactness and continuity, there is a limiting degree-d polynomial whose signs are as on the previous slide, a contradiction.

The relaxed case

[BRS95] constructed a bivariate polynomial of degree O(log M) for the sign pattern shown.

We now describe how to obtain a lower bound of $\Omega(\log M / \log\log M)$ for the M-intersector problem. We show that for any d, there is a subset of the lattice points, with coordinates at most $d^{O(d)}$, on which the sign pattern cannot be achieved in degree d.

[Figure: the M-intersector sign pattern, with axes marked from 1 to M; upper-right quadrant +, the other three quadrants −.]

A constructive solution

It is possible to show PTF lower bounds constructively.

Let Z denote the set of odd lattice points, and let f denote the function which is +1 in the upper-right quadrant and −1 elsewhere.

Suppose we could find a probability distribution w on Z under which every monomial $x^i y^j$, $0 \le i+j \le d$, had zero correlation with f.

A constructive solution

I.e., suppose we have $w : Z \to \mathbb{R}_{\ge 0}$ with $\sum_{z \in Z} w(z) = 1$, such that:

$\sum_{(x,y) \in Z} f(x,y)\, x^i y^j\, w(x,y) = 0$

for all monomials $x^i y^j$ of degree at most d.

Suppose also that w = 0 on points with coordinates exceeding M.

We claim this implies that no M-intersector of degree d exists.
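A small checker for these conditions, using exact rational arithmetic; `is_certificate` is our own name, and the weights below are a hand-computed example for d = 1 and M = 3, not taken from the talk. They make f uncorrelated with 1, x and y, so $\mathbf{E}_w[f\,p] = 0$ for every degree-1 polynomial p.

```python
from fractions import Fraction
from typing import Dict, Tuple

Point = Tuple[int, int]


def f(x: int, y: int) -> int:
    """+1 on the upper-right quadrant, -1 elsewhere."""
    return 1 if x > 0 and y > 0 else -1


def is_certificate(w: Dict[Point, Fraction], d: int, M: int) -> bool:
    """w is a probability distribution on odd lattice points with coordinates at most M,
    and f is uncorrelated under w with every monomial x^i y^j with i + j <= d."""
    if any(v < 0 for v in w.values()) or sum(w.values()) != 1:
        return False
    if any(x % 2 == 0 or y % 2 == 0 or max(abs(x), abs(y)) > M for x, y in w):
        return False
    return all(
        sum(f(x, y) * x**i * y**j * v for (x, y), v in w.items()) == 0
        for i in range(d + 1)
        for j in range(d + 1 - i)
    )


# A hand-computed example for d = 1, M = 3.
w = {(1, 1): Fraction(1, 2), (3, -1): Fraction(1, 4), (-1, 3): Fraction(1, 4)}
print(is_certificate(w, d=1, M=3))  # True
```

For instance, $p(x,y) = 2x - 5y + 7$ gives $\mathbf{E}_w[f\,p] = 0$, so it cannot have the correct sign at all three support points.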

Proof of constructive method

Proof: Suppose p were an M-intersector of degree d. On one hand, by linearity of expectation, $\mathbf{E}_w[f(x,y)\,p(x,y)] = 0$, since f is uncorrelated with every monomial of degree ≤ d.

On the other hand, on all lattice points bounded by M, $f(x,y)\,p(x,y) > 0$. But w puts all of its probability mass on these points, so $\mathbf{E}_w[f(x,y)\,p(x,y)] > 0$, a contradiction.

Intriguingly, the much stronger converse (no such distribution ⇒ a degree-d M-intersector exists) is also true, by LP duality.

Constructing the distribution

There are $D = (d+1)(d+2)/2$ constraints: the monomials we want f to be uncorrelated with.

Suppose we pick just D+1 points for our distribution to be supported on, $(x_1,y_1), \dots, (x_{D+1},y_{D+1})$.

Then the condition that w is a probability distribution over these points under which all constraint monomials have 0 correlation with f is a $(D+1)\times(D+1)$ linear system.

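Constructing the distribution

Columns are indexed by the points $(x_k, y_k)$ and rows by the constraint monomials $x^i y^j$, after a first row of all 1s; the entry in row $x^i y^j$, column k is $f(x_k,y_k)\,x_k^i y_k^j$:

$$
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
\cdots & \cdots & f(x_k,y_k)\,x_k^i y_k^j & \cdots \\
\vdots & \vdots & & \vdots
\end{pmatrix}
\begin{pmatrix}
w(x_1,y_1) \\ w(x_2,y_2) \\ w(x_3,y_3) \\ \vdots \\ w(x_{D+1},y_{D+1})
\end{pmatrix}
=
\begin{pmatrix}
1 \\ 0 \\ 0 \\ \vdots \\ 0
\end{pmatrix}
$$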

Our desire is that the solution be nonnegative.


Our solution

We now pull a rabbit out of our hat and name the exact set of points on which the distribution will be supported. Essentially, we want just the grid of points, but on a logarithmic scale.

Let h be a large number to be named later. Our points will be a subset of $\{(\pm h^i, \pm h^j) : 0 \le i+j \le d\}$.

Our solution

The exact (D+1) points to consider are:

$\{((-1)^l h^k,\ (-1)^k h^l) : 0 \le k+l \le d\} \cup \{(-1,-1)\}$,

where $h = d^{O(1)}$ is odd.

Finishing the proof

We consider the linear system given by this choice of points. We need to show the solution consists of nonnegative values. The solution weights are ratios of two certain determinants, by Cramer’s rule. Each determinant is a polynomial in h. We calculate the highest order terms, show that they dominate the polynomial (using the fact that h is large), and show they have the same sign. (Details omitted!)
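The system for this point set can be solved exactly with Cramer's rule on small cases. This sketch assumes Python's `fractions.Fraction` for exact arithmetic and computes each determinant by Gaussian elimination (the helper names are ours); it does not reproduce the asymptotic argument about dominant terms, it only checks tiny instances. For d = 1 and odd h ≥ 3 the weights come out nonnegative, while h = 1 forces a negative weight, consistent with the fact that $x + y - 1$ is a degree-1 intersector for M = 1.

```python
from fractions import Fraction
from typing import List, Tuple


def point_set(d: int, h: int) -> List[Tuple[int, int]]:
    """{((-1)^l h^k, (-1)^k h^l) : 0 <= k + l <= d} together with (-1, -1)."""
    pts = [((-1) ** l * h**k, (-1) ** k * h**l) for k in range(d + 1) for l in range(d + 1 - k)]
    return pts + [(-1, -1)]


def f(x: int, y: int) -> int:
    return 1 if x > 0 and y > 0 else -1


def det(m: List[List[Fraction]]) -> Fraction:
    """Determinant by exact Gaussian elimination with row swaps."""
    a = [row[:] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def solve_weights(d: int, h: int) -> List[Fraction]:
    pts = point_set(d, h)
    monomials = [(i, j) for i in range(d + 1) for j in range(d + 1 - i)]
    # First row: total weight 1; other rows: correlation of f with x^i y^j is 0.
    A = [[Fraction(1)] * len(pts)] + [
        [Fraction(f(x, y) * x**i * y**j) for (x, y) in pts] for (i, j) in monomials
    ]
    b = [Fraction(1)] + [Fraction(0)] * len(monomials)
    base = det(A)
    if base == 0:
        raise ValueError("singular system for this (d, h)")
    # Cramer's rule: replace column k by b.
    return [det([row[:k] + [b_r] + row[k + 1:] for row, b_r in zip(A, b)]) / base
            for k in range(len(pts))]


print([str(w) for w in solve_weights(1, 3)])  # ['1/2', '1/4', '1/4', '0']
```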

Finishing the proof

Hence, we’ve constructed a true probability distribution over the odd lattice points, under which f has zero correlation with all monomials of degree at most d.

The largest coordinate used is $h^d = d^{O(d)}$.

This shows that $d^{O(d)}$-intersectors require degree greater than d; i.e., M-intersectors require degree $\Omega(\log M / \log\log M)$.
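To see how the last step inverts $M = d^{O(d)}$, the following sketch takes the constant in the exponent to be 1 (an illustrative assumption) and finds the largest d with $d^d \le M$ using exact integer arithmetic. The ratio of that d to $\log M / \log\log M$ stays bounded, as the Ω bound requires.

```python
import math


def largest_d(M: int) -> int:
    """Largest d >= 1 with d**d <= M (exact integer arithmetic); assumes M >= 1."""
    d = 1
    while (d + 1) ** (d + 1) <= M:
        d += 1
    return d


def ratio(M: int) -> float:
    """largest_d(M) divided by log M / log log M."""
    return largest_d(M) / (math.log(M) / math.log(math.log(M)))


print(largest_d(10**6), largest_d(10**100))  # 7 56
```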


Open questions

• Does every boolean formula of size s have a PTF of degree O(√s) independent of depth?

• Minsky and Papert showed an $\Omega(n^{1/3})$ PTF lower bound for a certain depth-2 circuit. Can one show a significantly stronger lower bound for any constant-depth circuit?

• Better lower or upper bounds for the intersection of two weighted thresholds?

• Explore the polynomial sign prescription problem further.
