New degree bounds for polynomials with prescribed signs
Ryan O’Donnell (MIT)
Rocco Servedio (Harvard/Columbia)
Polynomials with prescribed signs
Suppose m disjoint regions R1, …, Rm are given in R^n, along with associated signs σ1, …, σm. What is the lowest degree polynomial p : R^n → R which has the prescribed signs on the regions?
In one dimension the problem is trivial: if the regions are intervals, a degree equal to the number of sign alternations is both necessary and sufficient.
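The one-dimensional case can be sketched in a few lines. This is only an illustration, assuming closed intervals that are sorted left to right with a gap between consecutive ones; the names `sign_polynomial` and `evaluate` are hypothetical. One root is placed in each gap where the sign changes, so the degree equals the number of alternations.

```python
from math import prod

def sign_polynomial(intervals):
    """intervals: list of (lo, hi, sign), sorted, disjoint, with gaps; sign is +1 or -1.
    Returns (roots, leading) describing p(x) = leading * prod(x - r for r in roots)."""
    roots = []
    for (_, hi, s), (lo, _, t) in zip(intervals, intervals[1:]):
        if s != t:
            # One root in the gap between two intervals of opposite sign.
            roots.append((hi + lo) / 2)
    # To the right of every root the product is positive,
    # so the leading sign must match the last interval.
    leading = intervals[-1][2]
    return roots, leading

def evaluate(roots, leading, x):
    return leading * prod(x - r for r in roots)

intervals = [(0, 1, +1), (2, 3, -1), (4, 5, -1), (6, 7, +1)]
roots, leading = sign_polynomial(intervals)
```

Here two alternations give a degree-2 polynomial with roots at 1.5 and 5.5.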
In two or more dimensions…??
Polynomial threshold functions
A special case: Let f : {0,1}^n → {+1,−1} be a boolean function. Let p : R^n → R be a polynomial. We say that p is a polynomial threshold function (PTF) for f, or p sign-represents f, if:
f(x) = sgn(p(x)) for all x ∈ {0,1}^n.
We are concerned with finding the lowest degree PTF for f.
Polynomial threshold functions
For example:
• x1+x2+…+xn − ½: degree-1 PTF for OR
• x1+x2+…+xn − (n−½): degree-1 PTF for AND
• x1+x2+…+xn − (n/2): degree-1 PTF for MAJ
• (1−2x1)(1−2x2)···(1−2xn): degree-n PTF for PARITY
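The four examples can be checked by brute force for small n. The sketch below is illustrative only. It assumes the convention that "true" maps to +1, that n is odd so MAJ has no ties, and that PARITY is +1 on inputs with an even number of ones (the sign the product above produces); `check_examples` is a hypothetical helper name.

```python
from itertools import product
from math import prod

def sgn(v):
    return 1 if v > 0 else -1

def check_examples(n):
    """Return {name: True/False}: does each polynomial sign-represent its function?"""
    examples = {
        "OR": (lambda x: sum(x) - 0.5, lambda x: any(x)),
        "AND": (lambda x: sum(x) - (n - 0.5), lambda x: all(x)),
        "MAJ": (lambda x: sum(x) - n / 2, lambda x: sum(x) > n / 2),
        # Assumed convention: +1 exactly when the number of ones is even.
        "PARITY": (lambda x: prod(1 - 2 * xi for xi in x), lambda x: sum(x) % 2 == 0),
    }
    cube = list(product((0, 1), repeat=n))
    return {
        name: all(sgn(p(x)) == (1 if f(x) else -1) for x in cube)
        for name, (p, f) in examples.items()
    }

results = check_examples(5)
```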
Every n-bit boolean function has a PTF (indeed, an exact rep.) of degree ≤ n.
(Consider: … + f(1101) x1x2(1−x3)x4 + …)
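The exact representation hinted at above sums, over every input a, the value f(a) times a product that equals 1 at a and 0 at every other point of the cube. A minimal sketch, with `exact_representation` and the sample function `f` chosen purely for illustration:

```python
from itertools import product
from math import prod

def exact_representation(f, n):
    """Return p with p(x) = sum over a in {0,1}^n of f(a) * indicator_a(x)."""
    points = list(product((0, 1), repeat=n))

    def indicator(a, x):
        # Factor xi where a has a 1, and (1 - xi) where a has a 0.
        return prod(xi if ai == 1 else 1 - xi for ai, xi in zip(a, x))

    def p(x):
        return sum(f(a) * indicator(a, x) for a in points)

    return p

# Hypothetical 4-bit function with values in {+1, -1}.
f = lambda a: 1 if (a[0] and not a[2]) or a[3] else -1
p = exact_representation(f, 4)
```

Each indicator has degree n, so the resulting polynomial has degree at most n.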
Polynomial threshold functions
What are PTFs good for?
• natural algebraic model of complexity
• upper bounds ⇒ machine learning: given a class of functions C, if every function in C has a PTF of degree d, one can learn C in time n^{O(d)}
• used to prove PP closed under intersection
• lower bounds ⇒ oracle separations
• slightly stricter model related to quantum decision tree complexity
Prior work — lower bounds
Minsky & Papert, Perceptrons, 1968:
• artificial intelligence perspective
• proved three major lower bounds:
- PARITY requires PTF degree n
- a certain DNF formula, “one in a box”, the n^{1/3}-way OR of n^{2/3}-way ANDs, requires PTF degree n^{1/3}
- MAJ(x1,…,xn) AND MAJ(y1,…,yn) requires superconstant PTF degree
No new, essentially different lower bounds are known.
Prior work — upper bounds
• [BRS95] considered AND-MAJn as well; they showed it has PTF degree O(log n); they used this to show PP is closed under intersection
• [KS01] showed that every DNF formula on n variables with s terms has a PTF of degree O(n^{1/3} log s); they use this to get a subexponential time learning algorithm for DNF formulas, which is the fastest known
Our results
Upper bound: every boolean function given by an AND/OR/NOT formula of size s and depth d has a PTF of degree √s · log^{O(d)} s (note that degree s is trivial). This gives a subexponential time learning algorithm for, say, linear size formulas of superconstant depth, the first such known.
Lower bound: a new technique shows that AND-MAJn requires PTF degree Ω(log n / log log n).
Talk outline
Plan for the talk:
1. Prove √s · log^{O(d)} s PTF upper bound for formulas.
2. Prove Ω(log n / log log n) PTF lower bound for AND-MAJn.
Boolean formulas
[Figure: a formula drawn as a tree of AND and OR gates whose leaves are labeled with the variables x1, …, x13.]
• a formula is a tree whose gates are ANDs or ORs, unbounded fan in
• leaves are labeled with literals
• size is number of leaves
• depth is longest root to leaf path
PTFs for boolean formulas
(In this section we use {0,1} always.)
Idea: replace all gates with low degree polynomials which simulate the gate:
AND(v1,…,vk)?
• v1 + … + vk − (k−1)
• [(v1 + … + vk) / k]^{k log(1/ε)}
[Figure: an AND gate with inputs 1, 1, 0, 0, 1 and output 0.]
A better amplifying polynomial
We want to amplify the disparity between 1−1/k and 1. Raising to the power of k works, but costs a lot of degree.
We desire a polynomial of low degree which keeps values in [0, 1−1/k] between 0 and 1 but amplifies the point 1 to, say, 2.
Equivalently, we want a polynomial that is bounded on [0,1] and has maximum derivative at 1.
Chebyshev polynomials
This is an old problem of analysis, solved by the Chebyshev polynomials of the first kind.
These are a family of orthogonal polynomials, (Cr), r ∈ N, with the properties: deg(Cr) = r, Cr([−1,1]) ⊆ [−1,1], Cr'(1) = r^2, Cr(1+1/r^2) ≥ 2.
Cr(x) = cos(r acos(x)).
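These properties are easy to check numerically. The sketch below evaluates Cr with the standard three-term recurrence C0 = 1, C1 = x, C(j+1) = 2x·Cj − C(j−1), which agrees with cos(r acos(x)) on [−1,1] and is also valid outside that interval. The function names, the grid size, and the finite-difference step are illustrative choices, not part of the talk.

```python
def chebyshev(r, x):
    """Evaluate the degree-r Chebyshev polynomial of the first kind at x."""
    prev, cur = 1.0, x
    if r == 0:
        return prev
    for _ in range(r - 1):
        prev, cur = cur, 2 * x * cur - prev
    return cur

def max_abs_on_unit_interval(r, steps=200):
    # Largest |C_r| seen on an evenly spaced grid over [-1, 1].
    return max(abs(chebyshev(r, -1 + 2 * i / steps)) for i in range(steps + 1))

def derivative_at_one(r, h=1e-6):
    # Central finite difference approximating C_r'(1).
    return (chebyshev(r, 1 + h) - chebyshev(r, 1 - h)) / (2 * h)
```

For example, C2(x) = 2x^2 − 1, so C2(1 + 1/4) = 2.125.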
Chebyshev polynomials at gates
Chebyshev polynomials give us a square-root degree savings:
Imagine replacing AND(v1,…vk) with:
C_{√k}((v1 + … + vk) / (k−1)). (*)
(v1 + … + vk) / (k−1) ≈ 1+1/k if all vi’s are roughly 1, and is in [0,1] otherwise.
Hence (*) is something like 2 when the AND is true, and is between −1 and 1 otherwise.
(This idea is originally from [KS01].)
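A quick brute-force check of the gate replacement on exact 0/1 inputs. This is a sketch under stated assumptions: the Chebyshev degree is taken to be r = ⌈√k⌉ (the slide writes √k), k ≥ 2, and `chebyshev` and `cheb_and` are illustrative names.

```python
from itertools import product
from math import isqrt

def chebyshev(r, x):
    """Degree-r Chebyshev polynomial of the first kind via the three-term recurrence."""
    prev, cur = 1.0, x
    if r == 0:
        return prev
    for _ in range(r - 1):
        prev, cur = cur, 2 * x * cur - prev
    return cur

def cheb_and(values):
    """Chebyshev stand-in for AND(v1..vk) on 0/1 inputs; requires k >= 2."""
    k = len(values)
    r = isqrt(k - 1) + 1  # smallest integer with r*r >= k, i.e. ceil(sqrt(k))
    return chebyshev(r, sum(values) / (k - 1))

def gate_behaves(k):
    # True inputs should give a value of at least 2, all others a value in [-1, 1].
    for v in product((0, 1), repeat=k):
        out = cheb_and(v)
        if all(v):
            if out < 2:
                return False
        elif abs(out) > 1 + 1e-9:
            return False
    return True
```

When all inputs are 1 the argument is 1 + 1/(k−1) ≥ 1 + 1/r^2, which is where the amplification to at least 2 comes from.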
Chebyshev polynomials at gates
In fact, we will replace each AND gate by:
ε · C_{√k}((v1 + … + vk) / (k−1))^{log(1/ε)},
and something similar for OR gates. Note that if the inputs have 0/1 values ± ε, so do the outputs.
Further, if the vi’s all have degree bounded by d, the resulting polynomial has degree bounded by d √k log(1/ε).
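The effect of the scaling and the power can be written out in one line. This is a sketch assuming the logarithm is base 2 and that log(1/ε) is an integer, so that the power is still a polynomial; C abbreviates the Chebyshev value at the gate.

```latex
\[
C \ge 2 \;\Longrightarrow\; \varepsilon\, C^{\log_2(1/\varepsilon)} \;\ge\; \varepsilon \cdot 2^{\log_2(1/\varepsilon)} = 1,
\qquad
|C| \le 1 \;\Longrightarrow\; \bigl|\varepsilon\, C^{\log_2(1/\varepsilon)}\bigr| \;\le\; \varepsilon .
\]
```

So a true gate is pushed up to at least 1 and a false gate is squeezed into [−ε, ε], at the cost of multiplying the degree by log(1/ε).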
Almost done
By applying these polynomials at every gate, we can easily conclude:
Suppose F is a formula in which along every path from root to leaf, the product of the fan-ins is t. Then we can sign-represent F with a polynomial of degree √t · log^{O(d)} s. (Need to take ε ≈ 1/s.)
We are not quite done, because these fan-in products can be huge!
Bounding fan-in products
[Figure: a chain of nested gates (OR, AND, AND, OR, …), each with its own block of n/100 variable leaves: x1 … xn/100, xn/100 … x2n/100, …]
Only n variables (leaves) are used, but one path has fan-in product (n/100)^{100}.
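A small model of this bad case, as an illustrative sketch: a formula is represented as either a leaf name (a string) or a pair (gate, children), and `chain` builds a nested chain in which every gate has `width` leaves plus the next gate. The representation and the names are assumptions made for this example.

```python
def size(node):
    """Number of leaves."""
    if isinstance(node, str):
        return 1
    return sum(size(c) for c in node[1])

def max_fanin_product(node):
    """Largest product of fan-ins along any root-to-leaf path."""
    if isinstance(node, str):
        return 1
    children = node[1]
    return len(children) * max(max_fanin_product(c) for c in children)

def chain(levels, width):
    """Nested chain of alternating gates, built from the innermost gate outwards."""
    node = None
    for level in range(levels):
        gate = "AND" if level % 2 else "OR"
        leaves = [f"x{level}_{i}" for i in range(width)]
        children = leaves + ([node] if node is not None else [])
        node = (gate, children)
    return node

formula = chain(levels=4, width=3)
```

With 4 levels of width 3 the formula has only 12 leaves, yet the deepest path has fan-in product 3 · 4 · 4 · 4 = 192; the gap grows exponentially with the number of levels.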
Solution: bucket
The trick is now to split each gate into several gates, each of which has subformulas of similar size:
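[Figure: an AND gate over subformulas of sizes s1, s2, s3, s4, … is replaced by an AND of log s bucket AND gates; the buckets collect the subformulas with 1 ≤ si < 2, …, 2^j ≤ si < 2^{j+1}, …, s/2 ≤ si < s.]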
Conclusion of upper bound
Now it is easy to see that gates with a subformula of depth d and size s have maximum root-to-leaf fan-in product of O(s log^d s):
Proof: By induction: the AND bucket with subsizes in [2^j, 2^{j+1}) has fan-in at most s/2^j.
Hence if we first modify our formulas in this way, and then apply the Chebyshev construction, we get PTFs of degree √s · log^{O(d)} s, as desired.
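The bucketing step itself is short. A minimal sketch, assuming each child is summarized by its integer size (number of leaves); `bucket_children` is a hypothetical name.

```python
def bucket_children(sizes):
    """Group subformula sizes so that bucket j holds sizes with 2^j <= size < 2^(j+1)."""
    buckets = {}
    for s in sizes:
        j = s.bit_length() - 1  # floor(log2(s)) for a positive integer s
        buckets.setdefault(j, []).append(s)
    return buckets

sizes = [1, 1, 2, 3, 5, 8, 9, 20]
total = sum(sizes)
buckets = bucket_children(sizes)
```

Every member of bucket j has size at least 2^j and the sizes add up to at most the total, so bucket j can hold at most total / 2^j children, which is the fan-in bound used in the induction.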
Lower bound for AND-MAJn
Recall the AND-MAJn function:
(x1,…,xn, y1,…,yn) ↦ MAJ(x1,…,xn) AND MAJ(y1,…,yn).
Minsky and Papert (1968) showed that any PTF requires superconstant, i.e. ω(1), degree.
Beigel, Reingold, and Spielman (1995) exhibited a PTF of degree O(log n).
We give a new lower bound of:
Ω(log n / log log n).
The two-dimensional problem
Minsky and Papert observed that the problem of PTFs for AND-MAJn is equivalent to a much simpler polynomial sign prescription problem – the M-intersector problem:
- R^2, bivariate polynomial
- regions: all odd lattice points bounded by M
- upper right points positive, others negative
[Figure: the x-y plane with the odd lattice points out to M.]
Proof of equivalence
Switch to {+1,−1} in input and output.
(⇐) Suppose p is an n-intersector. Then p(∑xi, ∑yi) is a PTF for AND-MAJn of the same degree.
(⇒) Suppose p is the PTF. Consider:
q(x1…xn, y1…yn) = ∑_{π,π'∈Sn} p(xπ(1)…xπ(n), yπ'(1)…yπ'(n)).
By symmetry, q is also a PTF for AND-MAJn. But q is symmetric in the x’s and in the y’s, hence depends only on their sums: q = q(∑xi, ∑yi).
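The symmetrization step can be tried out for n = 3. In the sketch below, the polynomial `p` is a made-up example: 4(ab + 2(a+b) − 3) with a = ∑xi and b = ∑yi is a degree-2 sign pattern for the 3-intersector, and the extra terms 2·x[0] − x[2]·y[1] deliberately break the symmetry without changing any sign. All names are illustrative.

```python
from itertools import permutations, product

def and_maj(x, y):
    """AND of two majorities on +1/-1 inputs."""
    return 1 if sum(x) > 0 and sum(y) > 0 else -1

def p(x, y):
    # Hypothetical asymmetric PTF for AND-MAJ on 3 + 3 inputs.
    a, b = sum(x), sum(y)
    return 4 * (a * b + 2 * (a + b) - 3) + 2 * x[0] - x[2] * y[1]

def symmetrize(poly, n):
    """q(x, y) = sum over all pairs of permutations of poly(permuted x, permuted y)."""
    perms = list(permutations(range(n)))

    def q(x, y):
        return sum(
            poly([x[i] for i in pi], [y[i] for i in rho])
            for pi in perms
            for rho in perms
        )

    return q

def is_ptf(g, n):
    cube = list(product((-1, 1), repeat=n))
    return all((g(x, y) > 0) == (and_maj(x, y) == 1) for x in cube for y in cube)

def depends_only_on_sums(g, n):
    seen = {}
    for x in product((-1, 1), repeat=n):
        for y in product((-1, 1), repeat=n):
            if seen.setdefault((sum(x), sum(y)), g(x, y)) != g(x, y):
                return False
    return True

q = symmetrize(p, 3)
```

Permuting the inputs never changes the value of AND-MAJ, so every term of the sum has the right sign, and q inherits it.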
The M-intersector problem
Consider the more general sign prescription problem:
No polynomial can have these signs!
Proof: Assume we have p of minimal degree. By continuity, p must be 0 on x half-axis. By Bezout, x | p. Divide through; the result has smaller degree, solves (essentially) same problem.
[Figure: the x-y plane with + in the upper-right quadrant and − in the other three quadrants.]
Reproving Minsky-Papert
This can be used to show Minsky and Papert’s superconstant lower bound. Suppose there were a fixed d such that there was an M-intersector of degree d for every M. Take M → ∞, rescaling to the unit square. By compactness and continuity, there is a limiting degree-d polynomial whose signs are as on the previous slide, a contradiction.
The relaxed case
[BRS95] constructed a bivariate polynomial of degree O(log M) for the sign pattern shown.
We now describe how to obtain a lower bound of Ω(log M / log log M) for the M-intersector problem. We show that for any d, there is a subset of lattice points with coordinates at most d^{O(d)} which can’t be done in degree d.
[Figure: the quadrant sign pattern (+ in the upper right, − elsewhere), with axis marks at 1 and M.]
A constructive solution
It is possible to show PTF lower bounds constructively.
Let Z denote the set of odd lattice points, and let f denote the function which is +1 in the upper-right quadrant, −1 elsewhere.
Suppose we could find a probability distribution w on Z under which every monomial x^i y^j, 0 ≤ i+j ≤ d, had zero correlation with f.
A constructive solution
I.e., suppose we have w : Z → R≥0, ∑_{z∈Z} w(z) = 1, such that:
∑_{(x,y)∈Z} f(x,y) x^i y^j w(x,y) = 0
for all monomials x^i y^j of degree at most d.
Suppose also that w = 0 on points with coordinates exceeding M.
We claim this implies no M-intersector of degree d exists.
Proof of constructive method
Proof: Suppose p were an M-intersector of degree d. On one hand, by linearity of expectation, E_w[f(x,y)p(x,y)] = 0, since f is uncorrelated with monomials of degree ≤ d.
On the other hand, on all lattice points bounded by M, f(x,y)p(x,y) > 0. But w gives all of its probability mass to these points.
Intriguingly, the much stronger converse (no distribution ⇒ PTF) is true, by LP duality.
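Written out, the two halves of the proof collide as follows. The coefficients c_{ij} of p are notation introduced here for illustration.

```latex
\[
p(x,y) = \sum_{i+j \le d} c_{ij}\, x^i y^j
\;\Longrightarrow\;
\mathbb{E}_w[f\,p] = \sum_{i+j \le d} c_{ij} \sum_{(x,y) \in Z} f(x,y)\, x^i y^j\, w(x,y) = 0,
\]
\[
f(x,y)\,p(x,y) > 0 \ \text{on the support of } w
\;\Longrightarrow\;
\mathbb{E}_w[f\,p] = \sum_{(x,y) \in Z} w(x,y)\, f(x,y)\, p(x,y) > 0 .
\]
```

The second line uses that w is a probability distribution, so at least one point of its support carries positive weight.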
Constructing the distribution
There are D = (d+1)(d+2)/2 constraints – monomials we want to be uncorrelated with.
Suppose we pick just D+1 points for our distribution to be supported on, (x_1,y_1), …, (x_{D+1},y_{D+1}).
Then the condition that w is a probability distribution over these points under which all constraint monomials have 0 correlation with f is a (D+1)×(D+1) linear system.
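Constructing the distribution
The columns of the matrix are indexed by the points (x_k,y_k). One row is all ones (normalization); every other row is indexed by a monomial x^i y^j and has entries f(x_k,y_k) x_k^i y_k^j:
\[
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
\vdots & & & \vdots \\
\cdots & f(x_k,y_k)\,x_k^{i}\,y_k^{j} & \cdots & \cdots \\
\vdots & & & \vdots
\end{pmatrix}
\begin{pmatrix}
w(x_1,y_1) \\ w(x_2,y_2) \\ w(x_3,y_3) \\ \vdots \\ w(x_{D+1},y_{D+1})
\end{pmatrix}
=
\begin{pmatrix}
1 \\ 0 \\ 0 \\ \vdots \\ 0
\end{pmatrix}
\]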
Our desire is that the solution be nonnegative.
[Figures: “Me thinking”; “Rocco thinking”.]
Our solution
We now pull a rabbit out of our hat and name the exact set of points on which the distribution will be supported. Essentially, we want just the grid of points, but in the log scale.
Let h be a large number to be named later. Our points will be a subset of { (h^i, h^j) : 0 ≤ i+j ≤ d }.
Our solution
The exact D+1 points to consider are:
{ ((−1)^l h^k, (−1)^k h^l) : 0 ≤ k+l ≤ d } ∪ { (−1,−1) },
where h = d^{O(1)}, and odd.
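For the smallest case the linear system can be solved exactly. The sketch below builds the point set above and solves the (D+1)×(D+1) system with rational arithmetic. It is only checked here for d = 1 and h = 5; for larger d the talk needs h to be large, and this sketch makes no claim about which h work. All function names are illustrative.

```python
from fractions import Fraction

def support_points(d, h):
    pts = [((-1) ** l * h ** k, (-1) ** k * h ** l)
           for k in range(d + 1) for l in range(d + 1 - k)]
    pts.append((-1, -1))
    return pts

def f(x, y):
    """+1 in the upper-right quadrant, -1 elsewhere."""
    return 1 if x > 0 and y > 0 else -1

def solve(A, b):
    """Gauss-Jordan elimination over the rationals; raises StopIteration if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[-1] for row in M]

def distribution(d, h):
    pts = support_points(d, h)
    monomials = [(i, j) for i in range(d + 1) for j in range(d + 1 - i)]
    # First row: weights sum to 1. Remaining rows: zero correlation with each monomial.
    A = [[1] * len(pts)]
    A += [[f(x, y) * x ** i * y ** j for (x, y) in pts] for (i, j) in monomials]
    b = [1] + [0] * len(monomials)
    return dict(zip(pts, solve(A, b)))

w = distribution(d=1, h=5)
```

For d = 1 and h = 5 the weights are 1/2 on (1,1) and 1/6 on each of (−1,5), (5,−1) and (−1,−1), all nonnegative as required.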
Finishing the proof
We consider the linear system given by this choice of points. We need to show the solution consists of nonnegative values. The solution weights are ratios of two certain determinants, by Cramer’s rule. Each determinant is a polynomial in h. We calculate the highest order terms, show that they dominate the polynomial (using the fact that h is large), and show they have the same sign. (Details omitted!)
Finishing the proof
Hence, we’ve constructed a true probability distribution over the odd lattice points, under which f has zero correlation with all monomials of degree at most d.
The largest coordinate used is d^{O(d)}.
This shows that d^{O(d)}-intersectors require PTF degree d; i.e., M-intersectors require PTF degree Ω(log M / log log M).
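The conversion from M = d^{O(d)} to a bound in terms of M is a short calculation. Here c stands for the constant hidden in the O(d) exponent (notation introduced for this sketch), and d is assumed large enough that log d ≤ log log M.

```latex
\[
M = d^{\,c d}
\;\Longrightarrow\;
\log M = c\, d \log d
\;\Longrightarrow\;
d = \frac{\log M}{c \log d} \;\ge\; \frac{\log M}{c \log\log M}
= \Omega\!\left(\frac{\log M}{\log\log M}\right).
\]
```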
Open questions
• Does every boolean formula of size s have a PTF of degree O(√s) independent of depth?
• Minsky and Papert showed an Ω(n^{1/3}) PTF lower bound for a certain depth 2 circuit. Can one show a significantly stronger lower bound for any constant depth circuit?
• Better lower or upper bounds for the intersection of two weighted thresholds?
• Explore the polynomial sign prescription problem further.