[IEEE Twenty-Second Annual IEEE Conference on Computational Complexity (CCC'07) - San Diego, CA, USA, June 13-16, 2007]

On derandomizing probabilistic sublinear-time algorithms

Marius Zimand∗

Towson University, Department of Computer and Information Sciences

Baltimore, MD
http://triton.towson.edu/˜mzimand

Abstract

There exists a positive constant α < 1 such that for any function T(n) ≤ n^α and for any problem L ∈ BPTIME(T(n)), there exists a deterministic algorithm running in poly(T(n)) time which decides L, except for at most a 2^{−Ω(T(n) log T(n))} fraction of inputs of length n.

1 Introduction

Several important derandomization results have been obtained in the last decade showing, under some plausible assumptions, that certain types of probabilistic algorithms can be simulated deterministically without a large time overhead. The most significant such result is due to Impagliazzo and Wigderson [IW97]: P = BPP, unless, for some constant α > 0, every problem in E can be solved by circuits of size 2^{αn}. Under a uniform hardness assumption, a partial derandomization of BPP is known [IW98]: unless EXP = BPP, any problem in BPP can be solved in deterministic subexponential time at infinitely many input lengths, where at those lengths, the algorithm is correct on all inputs with the exception of a negligible fraction. Derandomization results in the same vein for the classes AM and MA have been shown as well (for surveys see the papers of Impagliazzo [Imp03], Kabanets [Kab02] and Miltersen [Mil01]). Even though the assumptions of the above results are very reasonable and likely to be true, it turns out that they are very difficult to prove. Kabanets and Impagliazzo [KI04] have shown that P = BPP implies certain circuit lower bounds which are notoriously hard to demonstrate.

On the other hand, for weaker computational models, unconditional derandomization results do exist. A classical result of Rabin [Rab63] states that probabilistic finite state automata accept regular languages. There are several other types of automata that are equivalent to their deterministic counterparts (see the survey paper of Condon [Con01]). Nisan [Nis92] has shown that probabilistic computation using little space can also be derandomized (for a discussion of this and related results see the survey paper of Saks [Sak96]).

∗The author is supported in part by NSF grant CCF 0634830.

A relatively weak model of computation is the object of our investigation as well. More precisely, in this paper we consider the class of probabilistic sublinear-time algorithms with bounded error and we show an unconditional derandomization result for this class. The model of random access machines that can read only a small part of their input tape is apparently very restricted. However, if the use of randomness is allowed, then different probabilistic branches of the computation tree of such machines can cover the entire input and interesting possibilities emerge. A recently very active line of research has established that such machines can solve significant problems, such as, to list just a few, matrix product approximation [DK01, DKM04], estimating the number of connected components in a graph, and estimating the weight of the minimum spanning tree in a graph [CRT01, CEF+05]. Moreover, probabilistic sublinear-time algorithms are at the core of property testing, a theory which has seen impressive progress in recent years. Some learning algorithms can also be viewed as probabilistic sublinear-time algorithms.

The main result of this paper is that there exists a constant α < 1 such that any decision problem solvable by a bounded-error probabilistic algorithm running in time T(n) ≤ n^α can also be solved by a deterministic algorithm that runs in poly(T(n)) time and is correct for at least a fraction of (1 − 2^{−Ω(T(n) log T(n))}) of all inputs of length n, for all lengths n. For example, if the probabilistic algorithm runs in time polylog(n), then the deterministic algorithm still runs in polylog time and the simulation is correct on all inputs except a negligible fraction. Moreover, the deterministic algorithm can be effectively constructed from the probabilistic algorithm.¹

¹The non-uniform version of this result is trivial: since for any input the probabilistic algorithm is correct for a fraction of, say, 2/3 of randomness choices, there is one randomness choice that is correct for 2/3 of the inputs. The error probability can be reduced from 1/3 to negligible by iteration in the standard way.

Twenty-Second Annual IEEE Conference on Computational Complexity (CCC'07), 0-7695-2780-9/07 $20.00 © 2007


The result applies also to promise problems and thus it is relevant for property testing. Indeed, with the possible exception of a 2^{−Ω(T(n) log T(n))} fraction of inputs x of length n, for all inputs x for which the probabilistic algorithm is correct with probability (1/2) + ε, the deterministic algorithm is correct as well.²

The result is probably of little practical interest, because in practice one generally prefers algorithms that work for all inputs with some small error probability to algorithms that work without error for almost all inputs. Theoretically, the result is interesting because it shows that the properties that can be checked in sublinear time depend, except for a few inputs, on just a few bits of the input, and the locations of these bits can be found very fast. It also adds a new entry to the short list of classes of probabilistic computation that can be derandomized unconditionally.

1.1 Our technique: derandomization via exposure-resilient extractors

The computation of a probabilistic algorithm A on input x depends on the random coins used during the execution. We denote the entire sequence of random coins by ρ, we say that ρ is the randomness of A, and we denote by A(x, ρ) the output of the algorithm A on input x when using randomness ρ. Without loss of generality we assume that the length of ρ is a function of the length of x. A probabilistic algorithm A calculates a language L (viewed here as a boolean function) with bounded error if, for any x, Prob_ρ(A(x, ρ) = L(x)) > 2/3. It is useful to consider, for each input x, the set of witnesses W_x consisting of the "good" random coins; i.e., W_x is the set of those ρ for which A(x, ρ) = L(x). Note that for each x, W_x has density greater than 2/3.

The basic strategy in derandomization is to use some pseudo-randomness tool that produces a small set Z of strings that can be used as the randomness of A, such that, for each x, ‖Z ∩ W_x‖/‖Z‖ ≈ density of W_x (> 2/3). It is sufficient to guarantee that ‖Z ∩ W_x‖/‖Z‖ > 1/2 for all x, because if this holds true then we can run A(x, z) for all z ∈ Z and take the majority vote among the results of these computations. The typical way to construct Z is to use a pseudo-random generator. Unfortunately, pseudo-random generators are known to exist only if some "hardness" conditions are true, and this is why most derandomization results are conditioned on hardness assumptions.
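The majority-vote strategy just described can be sketched in code (a toy illustration, not the paper's construction: the language L, the algorithm A, and the candidate set Z below are hypothetical stand-ins; the only point is that any Z on which the good coins have fraction greater than 1/2 makes the majority vote correct):

```python
def L(x):
    # Toy language: tuples of bits with even parity.
    return sum(x) % 2 == 0

def A(x, rho):
    # Toy bounded-error algorithm: answers correctly exactly when
    # rho % 3 != 0, so the witness set W_x has density 2/3.
    correct = rho % 3 != 0
    return L(x) if correct else not L(x)

def derandomized(x, Z):
    # Run A(x, z) for every z in Z and take the majority vote.
    votes = sum(A(x, z) for z in Z)
    return 2 * votes > len(Z)

# Z = {0, ..., 8} contains a 2/3 fraction of good coins, so the
# majority vote agrees with L on every input.
Z = range(9)
assert derandomized((1, 0, 1), Z) == L((1, 0, 1))
assert derandomized((1, 0, 0), Z) == L((1, 0, 0))
```

The entire difficulty, addressed in the rest of the paper, lies in producing such a set Z deterministically.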


²Other combinations of parameters are possible. For example, the fraction of exceptions and ε can be reduced to 2^{−T(n)^d}, for an arbitrary constant d, but this would imply a smaller constant α.

Departing from the traditional approach, Goldreich and Wigderson [GW02] have obtained derandomization results using extractors, which can be built unconditionally. We extend their idea in a new direction, as we explain next. The standard view of extractors is as efficient procedures that extract randomness from sources with low entropy. More precisely, a (k, ε) extractor is a function E : {0,1}^n × {0,1}^d → {0,1}^m, for some parameters n, d, m (usually d = O(log n)), such that for all distributions X with min-entropy k, E(X, U_d) is statistically ε-close to U_m (where U_d and U_m are the uniform distributions on {0,1}^d and, respectively, {0,1}^m). Extractors can be viewed as samplers: for all subsets B ⊆ {0,1}^m, there are at most 2^{k+1} strings u such that ‖{E(u, y) | y ∈ {0,1}^d} ∩ B‖/2^d is not within ε of ‖B‖/2^m. In other words, u is considered to be the randomness of the sampler and {E(u, y)}_{y ∈ {0,1}^d} are the samples produced on randomness u. The key property is that for any subset B ⊆ {0,1}^m, for at least a fraction 1 − 2^{−(n−k−1)} of the choices of randomness u, the fraction of samples that are in B is ε-close to the density of B. We will take m to be the length of the randomness of the algorithm A that we want to derandomize. In the role of the set B we will take the witness sets W_x, for all the inputs x of A. Then with high probability over the randomness u, the fraction of points in {E(u, y) | y ∈ {0,1}^d} that are in W_x is close to the density of W_x, thus larger than 1/2 (recall that the density of W_x is > 2/3), and we can use the strategy that we sketched above. So far we have not derandomized the algorithm A because we still need the randomness u. How can we get rid of u? Our solution to this question is to use x in the role of u, i.e., we will use A's input x itself to produce the samples to hit the witness set W_x. This approach seems to be marred by an obvious flaw: since x and W_x are not independent, there is no guarantee that this schema works.
Note that for any subset of {0,1}^m, in particular for W_x, there are some "bad" randomness strings that produce samples that do not hit the subset in the right way. It may very well happen that x is such a "bad" randomness for W_x. However, recall that the algorithm A on input x runs in sublinear time. This implies that determining whether some arbitrary randomness ρ is good or not for x (i.e., determining whether ρ ∈ W_x) depends on just a few bits of x. Therefore, intuitively, the rest of x is in some sense independent of W_x and we could employ it to produce the sample points that hit W_x in the right way. To turn this intuition into a proof, we need to use an exposure-resilient extractor. This is a special type of extractor introduced in [Zim06]. An exposure-resilient extractor E is an efficient procedure that (a) takes as input a pair of strings (u, y), where u ("the weakly-random string") is a random variable guaranteed to have at least a certain min-entropy k, and y ("the seed") is a uniformly distributed random string, and (b) outputs a string that looks random even to computationally unbounded circuits that can query u a bounded number of times. Since membership in W_x can be determined by a circuit that queries x a bounded number of times, it follows that the set {E(x, y) | y ∈ {0,1}^d} hits W_x in the correct way for at least a 1 − 2^{−(n−k−1)} fraction of the strings x. This is exactly what we need for the above derandomization schema. To obtain our result we take m to be T(n), the sublinear running time of the algorithm that we derandomize, and we need an exposure-resilient extractor with length of the weakly-random string equal to n, output length and query resistance equal to m, and seed length equal to O(log m). Such an extractor is obtained by adjusting the construction in [Zim06] to obtain the right combination of parameters.

Goldreich and Wigderson [GW02] introduced the idea of taking the input x as the randomness of a sampler that is employed to derandomize a probabilistic algorithm A running on x itself. In their applications, the set that had to be hit in the correct way was a large set of advice strings valid for all strings of a given length and thus did not depend on x. Consequently, they could use standard extractors as samplers. In our case, the set W_x does depend on x, and we need the technique described above.

1.2 Definitions

Notations: For n ∈ N, [n] denotes the set {1, 2, . . . , n}; an n-set is a set with cardinality n; e is the base of the natural logarithm; xy denotes the concatenation of the strings x and y in {0,1}*; |x| denotes the length of the string x, and ‖A‖ denotes the cardinality of the set A. The min-entropy of a random variable X taking values in {0,1}^n is given by

min { log (1/Prob(X = a)) | a ∈ {0,1}^n, Prob(X = a) ≠ 0 }.

The min-entropy of X is denoted H∞(X). Thus if X has min-entropy ≥ k, then for all a in the range of X, Prob(X = a) ≤ 1/2^k. For each n ∈ N, let U_n denote the uniform distribution over {0,1}^n.
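The min-entropy definition above can be computed directly for small distributions (a quick illustrative sketch; the two distributions below are made up):

```python
import math

def min_entropy(dist):
    # H_inf(X) = min over the support of log2(1 / Prob(X = a)).
    return min(math.log2(1.0 / p) for p in dist.values() if p > 0)

# The uniform distribution U_3 on {0,1}^3 has min-entropy 3.
uniform3 = {format(a, "03b"): 1.0 / 8 for a in range(8)}
assert abs(min_entropy(uniform3) - 3.0) < 1e-9

# A distribution with a point of mass 1/2 has min-entropy only 1,
# so Prob(X = a) <= 1/2^1 for all a, as the text states.
skewed = {"000": 0.5, "001": 0.25, "010": 0.25}
assert abs(min_entropy(skewed) - 1.0) < 1e-9
```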

Definition 1.1 A language L is in BPTIME(T(n)) if there is a probabilistic algorithm A running in T(n) time such that for all n and for all x ∈ {0,1}^n, x ∈ L implies Prob_ρ(A(x) = 1) > 2/3, and x ∉ L implies Prob_ρ(A(x) = 1) < 1/3, where the probabilities are taken over the randomness ρ of A.

In this paper the running time T(n) is sublinear and consequently we assume that the algorithm A has random access to the input tape.

Definition 1.2 (Adaptive Test) Let X be a binary string and let N be its length. An adaptive test is an oracle circuit that uses a string such as X as the oracle set. A query to X is an index i ∈ [N] and the oracle answers with X(i) (the i-th bit of X). We say that the adaptive test has query complexity Q if the oracle circuit is allowed to make at most Q queries.

The above definition is slightly different from the one given in [Zim06], in which X was divided into blocks and an oracle answer was an entire block. If D is an adaptive test, we denote by D^X the set of strings accepted by D when using the oracle X.

Definition 1.3 (Exposure Resilient Extractor) The values N, k, d, m, and Q are integer parameters, and ε > 0 is a real number parameter. A function E : {0,1}^N × {0,1}^d → {0,1}^m is a (k, ε)-exposure resilient extractor resistant to Q queries if for every distribution X on {0,1}^N with min-entropy at least k and for every adaptive test D with query complexity Q,

|Prob_{X, y ∈ {0,1}^d}(E(X, y) ∈ D^X) − Prob_{X, z ∈ {0,1}^m}(z ∈ D^X)| ≤ ε.   (1)

As in the case of standard extractors, we implicitly have in mind a family of extractors indexed by N and with the parameters k, d, m, Q, and ε being functions of N. Given a function E (which will always be clear from the context) and an adaptive test D, we say that X hits D ε-correctly if

| ‖{E(X, y) | y ∈ {0,1}^d} ∩ D^X‖ / ‖{0,1}^d‖ − ‖D^X‖ / ‖{0,1}^m‖ | ≤ ε.

The following is an analog of a basic property of standard seeded extractors.

Lemma 1.4 [Zim06] Let E : {0,1}^N × {0,1}^d → {0,1}^m and ε > 0. Suppose that for every adaptive test D with query complexity Q the number of x ∈ {0,1}^N that do not hit D ε-correctly is at most 2^t, for some t. Then E is a (t + log(1/ε), 2ε)-extractor that is resistant to Q queries.
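A sketch of why Lemma 1.4 should hold, reconstructed along standard lines (the full argument is in [Zim06]): when H∞(X) ≥ t + log(1/ε), the strings that fail to hit D ε-correctly carry little probability mass.

```latex
% Bad = \{ x : x \text{ does not hit } D \ \varepsilon\text{-correctly} \},
% \qquad \|Bad\| \le 2^{t}.
\Pr[X \in Bad] \;\le\; 2^{t} \cdot 2^{-(t + \log(1/\varepsilon))} \;=\; \varepsilon .
% Conditioned on X \notin Bad, the sampling error is at most \varepsilon,
% so the total deviation in (1) is at most \varepsilon + \varepsilon = 2\varepsilon .
```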

2 Main result

We first show that if we have an appropriate exposure-resilient extractor then we can simulate deterministically a BPTIME algorithm without too much time overhead, such that the deterministic simulator is correct on a large fraction of the inputs at any given length.

Lemma 2.1 Let E : {0,1}^N × {0,1}^d → {0,1}^m be a (k, 1/6)-exposure-resilient extractor resistant to Q queries which is computable in time polynomial in m. Let L ∈ BPTIME(T(N)), for some function T satisfying T(N) ≤ min(m, Q). Then there exists a deterministic algorithm running in time poly(m) · 2^d · T(N) that solves L for at least 2^N − 2^{k+1} of the inputs x ∈ {0,1}^N.

Proof. There is a probabilistic machine M running in time T(N) such that for every x ∈ {0,1}^N, if x ∈ L then M accepts x with probability > 2/3 and if x ∉ L then M accepts x with probability < 1/3. We can assume that the machine M uses, on inputs of length N, randomness of length m (because T(N) ≤ m). For every x ∈ {0,1}^N, we define

W_x = {ρ | M accepts input x when using randomness ρ}.

Note that W_x can be decided by an adaptive test with oracle access to x that makes at most Q queries. This is so because to check whether some ρ ∈ W_x or not, we can simulate M on input x using randomness ρ, and since M runs in time T(N) ≤ Q, the simulation needs at most Q bits of x.

Let D = 2^d and let y_1, . . . , y_D be all the strings in {0,1}^d. Using the exposure-resilient extractor E, we calculate ρ_i = E(x, y_i) for all i ∈ [D]. Next we do D simulations of M on input x using randomness ρ_1, . . . , ρ_D and take the majority verdict. We claim that if x hits W_x (1/6)-correctly (such an x is called a correct hitter), then the above procedure is correct. The lemma follows from the claim because the number of correct hitters is greater than 2^N − 2^{k+1}. Indeed, if there were 2 · 2^k bad hitters then at least one of the sets BAD+ or BAD− would have at least 2^k elements, where

BAD+ = {x | Prob_y(E(x, y) ∈ W_x) > ‖W_x‖/2^m + 1/6} and
BAD− = {x | Prob_y(E(x, y) ∈ W_x) < ‖W_x‖/2^m − 1/6}.

We could define the distribution X which is uniform on the larger of the sets BAD+ and BAD− and 0 on the complement of that set. Then H∞(X) ≥ k and E(X, y) would not satisfy Equation (1) for the adaptive test W_x.

It remains to prove the above claim. Since x hits W_x (1/6)-correctly, we have that

‖W_x‖/2^m − 1/6 ≤ Prob_{i ∈ [D]}(E(x, y_i) ∈ W_x) ≤ ‖W_x‖/2^m + 1/6.

If x ∈ L, then

Prob_i(M accepts x when using randomness ρ_i) = Prob_{i ∈ [D]}(E(x, y_i) ∈ W_x) ≥ ‖W_x‖/2^m − 1/6 > 2/3 − 1/6 = 1/2,   (2)

and if x ∉ L, then

Prob_i(M accepts x when using randomness ρ_i) = Prob_{i ∈ [D]}(E(x, y_i) ∈ W_x) ≤ ‖W_x‖/2^m + 1/6 < 1/3 + 1/6 = 1/2.   (3)
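The deterministic procedure in this proof has a simple shape, sketched below. The extractor is replaced by a hash-based stand-in (hypothetical, with none of the required exposure-resilience properties), and the machine M below is a trivial toy that ignores its coins; the sketch only illustrates the control flow ρ_i = E(x, y_i) followed by a majority vote.

```python
import hashlib

def toy_E(x, y, m):
    # Hypothetical stand-in for the exposure-resilient extractor E:
    # hashes (x, y) down to an m-bit coin string. It only models the
    # interface E : {0,1}^N x {0,1}^d -> {0,1}^m.
    digest = hashlib.sha256(f"{x}#{y}".encode()).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:m]

def simulate(M, x, d, m):
    # Enumerate all D = 2^d seeds y_i, derive rho_i = E(x, y_i),
    # run M(x, rho_i) for each i, and output the majority verdict.
    verdicts = [M(x, toy_E(x, y, m)) for y in range(2 ** d)]
    return 2 * sum(verdicts) > len(verdicts)

# Toy machine: accepts iff x starts with '1' (it ignores its coins,
# so every verdict agrees and the majority vote is trivially correct).
M = lambda x, rho: x.startswith("1")
assert simulate(M, "101", d=3, m=8) is True
assert simulate(M, "001", d=3, m=8) is False
```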

In Section 3 we construct the exposure-resilient extractor that we need. Its properties are summarized in the following lemma.

Lemma 2.2 There is a constant a > 1 such that for any m ≤ N^{1/a}, there exists a (k, 1/6)-exposure resilient extractor EXT : {0,1}^N × {0,1}^d → {0,1}^m, resistant to m queries, with the following parameters:

(a) k = N − Ω(m log m),

(b) d = O(log m).

The running time of EXT is m polylog(m).

Combining the above two lemmas we obtain our main the-orem.

Theorem 2.3 There exists a constant natural number a such that for any function T satisfying T(N) < N^{1/a} and for any language L ∈ BPTIME(T(N)) there is a deterministic algorithm running in poly(T(N)) time that calculates L on at least a (1 − 2^{−Ω(T(N) log T(N))}) fraction of all inputs x ∈ {0,1}^N, for all N.
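The parameter arithmetic behind this combination, reconstructed here for concreteness (choose m = T(N) in Lemma 2.2 and plug the resulting extractor into Lemma 2.1):

```latex
% With m = T(N) \le N^{1/a}, Lemma 2.2 gives d = O(\log m) and
% k = N - \Omega(m \log m). Lemma 2.1 then yields
\text{time: } \mathrm{poly}(m)\cdot 2^{d}\cdot T(N)
  = \mathrm{poly}(T(N))\cdot 2^{O(\log T(N))}\cdot T(N)
  = \mathrm{poly}(T(N)),
\qquad
\text{bad inputs: } \frac{2^{k+1}}{2^{N}}
  = 2^{\,1-\Omega(T(N)\log T(N))}
  = 2^{-\Omega(T(N)\log T(N))}.
```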

An analysis whose details are omitted from this paper shows that the main theorem holds for a = 45.

3 The exposure-resilient extractor

The paper [Zim06] shows the construction of an exposure-resilient extractor E_{Zim'06} with the following parameters: (a) the input length of the weakly-random string N is of the form n_1 · 2^{n_1} for some n_1 ∈ N, (b) the query resistance is N^δ, for constant δ < 1 but arbitrarily close to 1, (c) k = N − N^{Ω(1)}, (d) ε = N^{−Ω(1)}, (e) the seed length is d = O(log N), and (f) the output length is m = N^{Ω(1)}.

For our extractor we need to reduce the seed length to O(log m). This can be achieved simply by taking from the weakly-random string X a prefix of length poly(m) and applying the extractor E_{Zim'06} to the prefix (viewed as the weakly-random input of E_{Zim'06}). The remaining parameters of E_{Zim'06} are in the same ballpark as those in Lemma 2.2 but lead to a large constant a. The construction in [Zim06] was driven by the goal of achieving as large a query resistance as possible while keeping the other parameters in a certain range; here the focus is mainly on maximizing the output length and obtaining a certain balance between the other parameters. However, by redoing the construction in [Zim06] with a different choice of parameters at certain steps, we obtain the following lemma, in which the parameters are expressed in terms of the output length m. The resulting a is about half of what would be obtained with the analysis from [Zim06]. For completeness, the proof is given in the appendix.

Lemma 3.1 There exist a constant s and a (k, 1/6)-exposure resilient extractor E : {0,1}^N × {0,1}^d → {0,1}^m, resistant to Q queries, with the following parameters:

(a) N is of the form N_1 log N_1, for N_1 = m^s log^3 m,

(b) Q = m,

(c) k = N − m log m,

(d) d = O(log N) (= O(log m)),


(e) E is computable in m polylog(m) time (with random access to the tape containing the weakly-random string).

Now we require that N ≥ m^{s+1} and easily obtain from the extractor E above the extractor EXT satisfying the properties required in Lemma 2.2. Namely, according to our plan, we define the extractor EXT : {0,1}^N × {0,1}^d → {0,1}^m by EXT(X, y) = E(pref(X), y), where pref(X) is the prefix of X of length N′ = N_1 log N_1 with N_1 = m^s · log^3 m. Now it is easy to see that pref(X) has sufficient min-entropy.

Claim 3.2 If X is a distribution on {0,1}^N with min-entropy N − m log m, then the distribution pref(X) has min-entropy N′ − m log m.

It follows that EXT is the exposure-resilient extractor with the properties stated in Lemma 2.2.

Proof of Claim 3.2. Suppose there is a_0 such that Prob(pref(X) = a_0) > 2^{−(N′ − m log m)}. Let suff(X) denote the string formed by the last N − N′ bits of X. Since

Σ_b Prob(suff(X) = b | pref(X) = a_0) = 1,

where the sum is over all strings b ∈ {0,1}^{N−N′}, there is b_0 such that Prob(suff(X) = b_0 | pref(X) = a_0) ≥ 2^{−(N−N′)}. Then Prob(X = a_0 b_0) = Prob(pref(X) = a_0) · Prob(suff(X) = b_0 | pref(X) = a_0) > 2^{−(N′ − m log m)} · 2^{−(N−N′)} = 2^{−(N − m log m)}, which contradicts the fact that X has min-entropy N − m log m.
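Claim 3.2 can be checked numerically on a toy distribution (made-up numbers; the entropy deficiency t plays the role of m log m):

```python
import math
from collections import defaultdict

def min_entropy(dist):
    # H_inf = min over the support of log2(1 / Prob(X = a)).
    return min(math.log2(1.0 / p) for p in dist.values() if p > 0)

def prefix_marginal(dist, n_prefix):
    # Distribution of the length-n_prefix prefix of X.
    marginal = defaultdict(float)
    for s, p in dist.items():
        marginal[s[:n_prefix]] += p
    return marginal

# X uniform on the 8 strings of {0,1}^4 ending in 0: H_inf(X) = 3 = N - t
# with N = 4 and t = 1. The claim then guarantees that the length-2
# prefix has min-entropy at least N' - t = 1; here it is in fact 2.
X = {f"{a:03b}" + "0": 1.0 / 8 for a in range(8)}
assert abs(min_entropy(X) - 3.0) < 1e-9
pref = prefix_marginal(X, 2)
assert min_entropy(pref) >= 1.0
assert abs(min_entropy(pref) - 2.0) < 1e-9
```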

4 Acknowledgments

I am grateful to one anonymous referee who suggestedseveral significant improvements in the presentation.

References

[BM84] Manuel Blum and Silvio Micali. How to generate cryptographically strong sequences of pseudo-random bits. SIAM Journal on Computing, 13(4):850–864, November 1984.

[CEF+05] A. Czumaj, F. Ergun, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld, and C. Sohler. Sublinear approximation of Euclidean minimum spanning tree. SIAM Journal on Computing, 35(1):91–109, 2005.

[Con01] A. Condon. Bounded error probabilistic finite state automata. In P. Pardalos, J. Reif, and J. Rolim, editors, Handbook on Randomized Computing, Volume II, pages 509–532. Kluwer, 2001.

[CRT01] B. Chazelle, R. Rubinfeld, and L. Trevisan. Approximating the minimum spanning tree weight in sublinear time. In Proceedings of the 28th International Colloquium on Automata, Languages, and Programming, pages 190–200, 2001.

[DK01] P. Drineas and R. Kannan. Fast Monte-Carlo algorithms for approximate matrix multiplication. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, pages 452–459, 2001.

[DKM04] P. Drineas, R. Kannan, and M. W. Mahoney. Fast Monte-Carlo algorithms for matrices I: Approximating matrix multiplication. Technical Report YALE/DCS/TR-1269, Yale University, 2004.

[Gol97] O. Goldreich. A sample of samplers: a computational perspective on sampling, May 1997. ECCC Technical Report TR97-020, available at http://www.eccc.uni-trier.de/eccc.

[GRS98] O. Goldreich, R. Rubinfeld, and M. Sudan. Learning polynomials with queries - the highly noisy case. Technical Report TR98-060, Electronic Colloquium on Computational Complexity, 1998. http://www.eccc.uni-trier.de/eccc.

[GW02] O. Goldreich and A. Wigderson. Derandomization that is rarely wrong from short advice that is typically good. In RANDOM 2002, pages 209–223, 2002.

[HILL99] Johan Hastad, Russell Impagliazzo, Leonid Levin, and Michael Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396, 1999. A preliminary version appeared in 21st STOC, 1989.

[Imp03] R. Impagliazzo. Hardness as randomness: a survey of universal derandomization. Technical Report CoRR cs.CC/0304040, 2003.

[IW97] Russell Impagliazzo and Avi Wigderson. P = BPP if E requires exponential circuits: Derandomizing the XOR lemma. In Proceedings of the 29th Annual ACM Symposium on the Theory of Computing (STOC '97), pages 220–229, New York, May 1997. Association for Computing Machinery.

[IW98] R. Impagliazzo and A. Wigderson. Randomness vs. time: De-randomization under a uniform assumption. In Proceedings of the 38th IEEE Symposium on Foundations of Computer Science, pages 734–743, 1998.


[Kab02] Valentine Kabanets. Derandomization: a brief overview. Bulletin of the EATCS, 76:88–103, 2002. Also available as ECCC 8-2002.

[KI04] V. Kabanets and R. Impagliazzo. Derandomizing polynomial identity tests means proving circuit lower bounds. Computational Complexity, 13(1–2):1–46, 2004.

[Mil01] Peter Bro Miltersen. Derandomizing complexity classes. In P. Pardalos, J. Reif, and J. Rolim, editors, Handbook of Randomized Computing. Kluwer Academic Publishers, 2001.

[Nis92] N. Nisan. Pseudorandom bits for space-bounded computation. Combinatorica, 12(4):449–461, 1992.

[NW94] N. Nisan and A. Wigderson. Hardness vs. randomness. Journal of Computer and System Sciences, 49:149–167, 1994.

[Rab63] Michael O. Rabin. Probabilistic automata. Information and Control, 6(3):230–245, 1963.

[Sak96] Michael E. Saks. Randomization and derandomization in space-bounded computation. In IEEE Conference on Computational Complexity, pages 128–149, 1996.

[Tre01] L. Trevisan. Extractors and pseudorandom generators. Journal of the ACM, 48(4):860–879, 2001.

[Yao82] A. Yao. Theory and application of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, pages 80–91, 1982.

[Zim05] M. Zimand. Simple extractors via constructions of cryptographic pseudo-random generators. In Proceedings of the 32nd International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science 3580, pages 115–127. Springer-Verlag, 2005.

[Zim06] M. Zimand. Exposure-resilient extractors. In Proceedings of the 21st Structure in Complexity Theory Conference, pages 61–72. IEEE Computer Society Press, July 2006.

Appendix

A Proof of Lemma 3.1

The construction is essentially the same as in [Zim06], with several different choices of parameters. At a high level, we use Trevisan's approach, which adapts known constructions of pseudo-random generators to produce extractors. Specifically, Trevisan [Tre01] has shown that the Nisan-Wigderson [NW94] construction of a pseudo-random generator from a hard function can be used almost directly to build an extractor. Zimand [Zim05] has shown that the Blum-Micali-Yao [BM84, Yao82] construction of a pseudo-random generator, using as a building block a one-way permutation, also produces extractors. It has been observed in [Zim06] that the construction from [HILL99] of a pseudo-random generator using as a building block a one-way function produces exposure-resilient extractors. As mentioned above, we use here the same approach. The reader should consult the exposition in [Zim06] for background on the underlying techniques.

We consider a random function $R : \{0,1\}^{n_1} \to \{0,1\}^{n_1}$, for some integer length $n_1$. We use the notation $N_1 = 2^{n_1}$. Let $S$ and $T$ be two integer parameters such that $ST \le N_1$. Let $C$ be an oracle circuit that queries the oracle at most $S$ times. The oracle circuit $C$ is probabilistic, and $S$ is a bound on the number of queries on any probabilistic branch of $C$ and for any oracle. Unlike the situation in Definition 1.2, where the oracle returns one bit, in this section the oracle is a function such as $R$, and on a query $q$, where $q$ is an element in the domain of $R$, it returns the entire block $R(q)$. We write $C^R$ to mean that the circuit $C$ uses the function $R$ as the oracle. For a binary string $\rho$, we use the notation $C^R(x, \rho)$ to denote the computation of $C^R$ on input $x$ and with random bits given by $\rho$. For a natural number $P$, we say that $R$ is $P$-to-1 if each element in the range of $R$ has at most $P$ preimages. The next lemma collects a series of facts regarding random functions $R : \{0,1\}^{n_1} \to \{0,1\}^{n_1}$ that will be used later.

Lemma A.1 We use the parameters introduced in the preceding paragraph. Let $P \in \mathbb{N}$, $P > n_1$.

(a) Let $E$ be the event (over random $R$) "$R$ is not $P$-to-1." The probability of $E$ is bounded by $2^{-P(\log P - \log e)}$.

(b) Let $B$ be the event (over random pairs $(R, \rho)$) "$\|\{x \in \{0,1\}^{n_1} \mid C^R(R(x), \rho) \in R^{-1}(R(x))\}\| \ge 2e \cdot P \cdot S \cdot T$." The probability of $B$ is bounded by $2^{-T} + 2^{-P(\log P - \log e)}$.

(c) We say that $R : \{0,1\}^{n_1} \to \{0,1\}^{n_1}$ is "good" for a probabilistic oracle circuit $C$ if with probability greater than $1/2$ over $\rho$ it holds that

$$\|\{x \in \{0,1\}^{n_1} \mid C^R(R(x), \rho) \in R^{-1}(R(x))\}\| \ge 2e \cdot P \cdot S \cdot T. \qquad (4)$$

Let $G$ be the event (over random $R$) "$R$ is good for $C$." Then the probability of $G$ is at most $2 \cdot (2^{-T} + 2^{-P(\log P - \log e)})$.

Proof The proofs of (a) and (b) are in [Zim06, Lemma 4.1].

(c) Let $p$ be the probability that a random $R$ is "good" for $C$. Then $p \cdot \frac{1}{2}$ is at most the fraction of pairs $(R, \rho)$ with the property that $\|\{x \in \{0,1\}^{n_1} \mid C^R(R(x), \rho) \in R^{-1}(R(x))\}\| \ge 2e \cdot P \cdot S \cdot T$. The latter fraction, by point (b), is at most $2^{-T} + 2^{-P(\log P - \log e)}$. The conclusion follows.
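The bound in part (a) of Lemma A.1 can be sanity-checked empirically for small parameters. The following sketch (toy parameters and function names of our choosing, not from the paper) samples uniformly random functions, stored as truth tables, and estimates how often some image has more than $P$ preimages:

```python
import math
import random
from collections import Counter

def is_p_to_1(table, P):
    """True iff every element in the range has at most P preimages."""
    return max(Counter(table).values()) <= P

def estimate_not_p_to_1(n1, P, trials, seed=0):
    """Empirical probability that a uniformly random function
    {0,1}^{n1} -> {0,1}^{n1} (stored as a truth table) is not P-to-1."""
    rng = random.Random(seed)
    N1 = 2 ** n1
    bad = sum(
        not is_p_to_1([rng.randrange(N1) for _ in range(N1)], P)
        for _ in range(trials)
    )
    return bad / trials

n1, P = 6, 10                      # toy values with P > n1, as the lemma requires
bound = 2 ** (-P * (math.log2(P) - math.log2(math.e)))
rate = estimate_not_p_to_1(n1, P, trials=1000)
print(f"empirical {rate:.4f} vs. bound 2^(-P(log P - log e)) = {bound:.2e}")
```

With these toy values the event is already very rare, so the empirical rate should come out at (or near) zero, consistent with the stated bound.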

We proceed to the construction of the extractor $E$. The weakly-random-string input of the extractor is the truth table of a function $R : \{0,1\}^{n_1} \to \{0,1\}^{n_1}$. In what follows we will consider that the function $R : \{0,1\}^{n_1} \to \{0,1\}^{n_1}$ is such that the min-entropy of $R(x)$ (when $x$ is uniformly distributed in $\{0,1\}^{n_1}$) is at least $n_1 - \log P$, for some parameter $P$. Recall that if we take the function $R$ at random, the above holds with probability at least $1 - 2^{-P(\log P - \log e)}$ (by Lemma A.1, (a)).

The construction has three main steps. We start with $R(x)$, for random $x \in \{0,1\}^{n_1}$.

STEP 1. - Adding hidden bits.

We take $q = \log P + 2\log(1/\varepsilon_1) + 1$, for some parameter $\varepsilon_1$, and consider the field $GF[2^q]$. We identify $x \in \{0,1\}^{n_1}$ with a polynomial $p_x$ over the field $GF[2^q]$ with $b = n_1/q$ variables $y_1, \ldots, y_b$ (to keep notation simple, we ignore truncation issues). This is done by breaking $x$ into $b$ blocks $x_1, \ldots, x_b$ of $q$ bits each, viewing each $x_i$ as an element of $GF[2^q]$, and taking $p_x(y_1, \ldots, y_b) = x_1 y_1 + \ldots + x_b y_b$. For $r \in \{0,1\}^{n_1}$, we define $\alpha_r$ to be the $r$-th element of $(GF[2^q])^b$. We define $a : \{0,1\}^{n_1} \times \{0,1\}^{n_1} \to GF[2^q]$ by $a(x, r) = p_x(\alpha_r)$. Note that $a(x, r)$ is $q$ bits long.
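As an illustration, here is a minimal sketch of the map $a(x, r) = p_x(\alpha_r)$ for the toy case $q = 8$, so the field is $GF(256)$. The choice of the AES modulus $x^8 + x^4 + x^3 + x + 1$ is an assumption we make for concreteness; the construction works with any fixed representation of $GF[2^q]$.

```python
Q = 8
MOD = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Carry-less (GF(2^Q)) multiplication of a and b, reduced mod MOD."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a >> Q:          # reduce as soon as the degree reaches Q
            a ^= MOD
    return res

def hidden_bits(x_blocks, alpha):
    """a(x, r) = p_x(alpha_r) = sum_i x_i * y_i over GF(2^q), where
    x_blocks = (x_1, ..., x_b) and alpha = (y_1, ..., y_b)."""
    acc = 0
    for xi, yi in zip(x_blocks, alpha):
        acc ^= gf_mul(xi, yi)   # addition in GF(2^q) is XOR
    return acc                  # a q-bit field element

# x split into b = 2 blocks of q bits; alpha is a point of (GF(2^q))^b.
# Evaluating at (1, 0) simply reads off the first block x_1.
print(hex(hidden_bits((0x03, 0x05), (0x01, 0x00))))
```

Since $p_x$ is linear, evaluating at a unit vector recovers a single block of $x$; at a random point the output is the $q$-bit "hidden" field element used in the analysis.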

Lemma A.2 Let $C$ be a circuit with query complexity $S$. Let $\varepsilon > 0$. Then there exist $K = 2^{2q}$ circuits $A_1, \ldots, A_K$ such that if for some $R : \{0,1\}^{n_1} \to \{0,1\}^{n_1}$ it holds that

$$\mathrm{Prob}_{x \in \{0,1\}^{n_1}, r}[C^R(R(x), a(x,r), r) = 1] - \mathrm{Prob}_{x, r, U_q}[C^R(R(x), U_q, r) = 1] > \varepsilon, \qquad (5)$$

then there exists $i \in [K]$ such that for more than a fraction $1/2$ of the strings $\rho$,

$$\|\{x \in \{0,1\}^{n_1} \mid A_i^R(R(x), \rho) \in R^{-1}(R(x))\}\| > \varepsilon' \cdot N_1,$$

where $\varepsilon' = \frac{1}{2^{3q+1}} \varepsilon$. The circuits $A_i$ have query complexity $S \cdot \mathrm{poly}(n_1, b, 1/\varepsilon', q)$.

Proof Suppose the relation in Equation (5) holds for some function $R$, which we fix for the rest of the proof. Then, adapting Yao's proof relating distinguishers to predictors, one can show the following (the same claim, with proof, appears as Claim 5.2 in [Zim06]).

Claim A.3 There is a probabilistic circuit $D$ with query complexity $S$ and randomness of length $2q$ such that

$$\mathrm{Prob}[D^R(R(x), r) = a(x, r)] > \frac{1}{2^q} + \varepsilon \cdot \frac{1}{2^q},$$

where the probability is taken over random $x$, $r$, and the randomness of $D$.

The above Claim implies, using an averaging argument, that for at least a fraction $\delta = \frac{1}{2^{q+1}} \cdot \varepsilon$ of the $x \in \{0,1\}^{n_1}$,

$$\mathrm{Prob}[D^R(R(x), r) = a(x, r)] > \frac{1}{2^q} + \delta,$$

where the probability is taken over $r$ and the randomness of $D$. We call an $x$ satisfying the above relation "good." Let $\sigma$ be the randomness of $D$. For a "good" $x$, we can fix the randomness $\sigma$ in a convenient way such that $D^R(R(x), r)$, with the fixed value for $\sigma$, agrees with $a(x, r) = p_x(\alpha_r)$ on at least a $(1/2^q) + \delta$ fraction of the $r$'s. Recall that $\sigma$ is $2q$ bits long. Fixing $\sigma$ in all possible ways, we get from $D$ a number of $K = 2^{2q}$ circuits $D_1, \ldots, D_K$ and, by the above discussion, for any "good" $x$, there is some circuit $D_i$, $i \in [K]$, such that $D_i^R(R(x), r)$ agrees with $p_x(\alpha_r)$ on at least a $(1/2^q) + \delta$ fraction of the $r$'s. By the result from [GRS98] there is a probabilistic procedure that on input $R(x)$ makes at most $\mathrm{poly}(n_1, b, 1/\delta, q)$ invocations of $D_i(R(x), r)$ and produces a list with at most $\mathrm{poly}(n_1, b, 1/\delta, q)$ elements, and if $D_i^R(R(x), r)$ agrees with $p_x(\alpha_r)$ as above, then, with probability greater than $1 - 2^{-n_1}$, the list contains $x$.³ By querying all the members of the list we obtain an element in the preimage set of $R(x)$ (again in case $D_i^R(R(x), r)$ agrees with $p_x(\alpha_r)$ as above). Note that the number of queries to $R$ (needed during the invocations of $D_i(R(x), r)$ and for querying the list members) is $\mathrm{poly}(n_1, b, 1/\delta, q) \cdot S$ (because $D_i$ makes at most $S$ queries). Let us call this procedure $A_i$ and let $\rho$ denote its randomness. We say that $A_i$ is an "inverter" for $x$ if $A_i^R(R(x), \rho) \in R^{-1}(R(x))$ with probability at least $1 - 2^{-n_1}$ over $\rho$. There are at least $\frac{1}{2^{q+1}} \cdot \varepsilon \cdot N_1$ "good" $x$, and each one has an "inverter" in the set of $K = 2^{2q}$ procedures $A_i$, $i \in [K]$, defined above. Thus there must be one procedure $A_i$ that is an "inverter" for at least a fraction $\frac{1}{2^{2q}} \cdot \frac{1}{2^{q+1}} \cdot \varepsilon$ of the $x$ in $\{0,1\}^{n_1}$. Let us fix this probabilistic procedure $A_i$ and let $J$ be the set of $x \in \{0,1\}^{n_1}$ for which $A_i$ is an inverter. Thus $\|J\| \ge \frac{1}{2^{3q+1}} \cdot \varepsilon \cdot N_1$. We take a subset $J'$ of $J$ of size exactly $\frac{1}{2^{3q+1}} \cdot \varepsilon \cdot N_1$. For each $x \in J'$, the fraction of random strings $\rho$ such that $A_i^R(R(x), \rho) \notin R^{-1}(R(x))$ is at most $2^{-n_1} = \frac{1}{N_1}$. Thus, by the union bound, the fraction of strings $\rho$ such that there exists some $x \in J'$ with $A_i^R(R(x), \rho) \notin R^{-1}(R(x))$ is at most $\frac{1}{N_1} \cdot \frac{1}{2^{3q+1}} \cdot \varepsilon \cdot N_1 < \frac{1}{2}$. Hence, for more than a fraction $1/2$ of the strings $\rho$, $A_i^R(R(x), \rho) \in R^{-1}(R(x))$ for every $x \in J'$, so $\|\{x \in \{0,1\}^{n_1} \mid A_i^R(R(x), \rho) \in R^{-1}(R(x))\}\| \ge \frac{1}{2^{3q+1}} \cdot \varepsilon \cdot N_1 = \varepsilon' \cdot N_1$, which proves the lemma. (Note that $\delta$ here is larger than the $\varepsilon'$ of the lemma statement, so the query complexity is indeed $S \cdot \mathrm{poly}(n_1, b, 1/\varepsilon', q)$.)

³An analysis of the algorithm from [GRS98] shows that the number of invocations, referred to above as $\mathrm{poly}(n_1, b, 1/\delta, q)$, is bounded by $O((bq + n_1)(1/\delta)^3)$.

STEP 2. - Hashing.

We hash $R(x)$ to $h(R(x))$, where $h$ is a hash function randomly chosen from a universal family of hash functions mapping $\{0,1\}^{n_1}$ to $\{0,1\}^{n_1 - \log P - 2\log(1/\varepsilon_1)}$, for some parameter $\varepsilon_1$. By the left-over hash lemma, the pair $(h(R(x)), h)$ is $\varepsilon_1$-close to the uniform distribution. Concretely, we use Toeplitz matrices for defining the universal family of hash functions (for example, see [Gol97]). We have $|h| = n_1 + (n_1 - \log P - 2\log(1/\varepsilon_1)) < 2n_1$.
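To make Step 2 concrete, the following sketch (toy sizes and names of our choosing) applies a binary Toeplitz matrix, described by its $n_{in} + n_{out} - 1$ diagonals, to a bit vector over $GF(2)$; picking the diagonal bits uniformly at random yields a member of a universal family.

```python
import random

def toeplitz_hash(diag, n_in, n_out, x):
    """Multiply the n_out x n_in binary Toeplitz matrix whose diagonals
    are the bits of diag (length n_in + n_out - 1) by the bit vector x,
    over GF(2).  Entry (i, j) of the matrix is diag[i - j + n_in - 1]."""
    assert len(diag) == n_in + n_out - 1 and len(x) == n_in
    return [
        # inner product over GF(2) of row i with x
        sum(diag[i - j + n_in - 1] & x[j] for j in range(n_in)) % 2
        for i in range(n_out)
    ]

rng = random.Random(7)
n_in, n_out = 16, 9   # e.g. n_1 = 16, output length n_1 - log P - 2 log(1/eps_1)
h = [rng.randrange(2) for _ in range(n_in + n_out - 1)]  # description of h
u = [rng.randrange(2) for _ in range(n_in)]
v = [rng.randrange(2) for _ in range(n_in)]
# h is GF(2)-linear: h(u XOR v) = h(u) XOR h(v)
lhs = toeplitz_hash(h, n_in, n_out, [a ^ b for a, b in zip(u, v)])
rhs = [a ^ b for a, b in zip(toeplitz_hash(h, n_in, n_out, u),
                             toeplitz_hash(h, n_in, n_out, v))]
print(lhs == rhs)  # prints True
```

Note that the description length of $h$ here is $n_{in} + n_{out} - 1$ bits, matching the paper's bound $|h| < 2n_1$.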

Lemma A.4 Let $C$ be an oracle circuit with query complexity $S$. Then there are $K < 2^{2n_1}$ oracle circuits $A_i$, $i \in [K]$, each with query complexity $S$, such that if for some $R : \{0,1\}^{n_1} \to \{0,1\}^{n_1}$ that has $H_\infty(R(x)) \ge n_1 - \log P$ it holds that

$$\mathrm{Prob}[C^R(h(R(x)) \circ a(x,r) \circ h \circ r) = 1] - \mathrm{Prob}[C^R(U_{n_1-(q-1)} \circ U_q \circ h \circ r) = 1] > \varepsilon, \qquad (6)$$

then for some circuit $A_i$, $i \in [K]$,

$$\mathrm{Prob}[A_i^R(R(x) \circ a(x,r) \circ r) = 1] - \mathrm{Prob}[A_i^R(R(x) \circ U_q \circ r) = 1] > \varepsilon - \varepsilon_1, \qquad (7)$$

where the probabilities are taken over independent, random choices of $x$, $r$, $h$, $U_{n_1-(q-1)}$, $U_q$.

Proof The left-hand side in Equation (6) is equal to

$$\begin{aligned} &\mathrm{Prob}[C^R(h(R(x)) \circ a(x,r) \circ h \circ r) = 1] - \mathrm{Prob}[C^R(h(R(x)) \circ U_q \circ h \circ r) = 1] \\ &\quad + \mathrm{Prob}[C^R(h(R(x)) \circ U_q \circ h \circ r) = 1] - \mathrm{Prob}[C^R(U_{n_1-(q-1)} \circ U_q \circ h \circ r) = 1]. \end{aligned} \qquad (8)$$

The expression on the second line is bounded by $\varepsilon_1$ (by the left-over hash lemma). Therefore, the expression on the first line is $> \varepsilon - \varepsilon_1$. We can build in the obvious way a probabilistic circuit $B$ that chooses $h$ randomly such that

$$\mathrm{Prob}_{h,x,r}[B^R(R(x) \circ a(x,r) \circ r) = 1] - \mathrm{Prob}_{h,x,U_q,r}[B^R(R(x) \circ U_q \circ r) = 1] > \varepsilon - \varepsilon_1. \qquad (9)$$

Fixing $h$ in all possible ways, we derive from $B$ a number of $K < 2^{2n_1}$ deterministic circuits $A_i$, $i \in [K]$, and one of them satisfies Equation (7).

STEP 3. - Stretching.

We define the actual extractor $E$ by the following algorithm.

Parameters: $n_1 \in \mathbb{N}$, $m \in \mathbb{N}$; a universal family of hash functions as specified above.

Input: the weakly random string $R \in \{0,1\}^{n_1 2^{n_1}}$ and $seed = x \circ r \circ h$, where $x \in \{0,1\}^{n_1}$, $r \in \{0,1\}^{n_1}$, $h \in \{0,1\}^{2n_1 - \log P - 2\log(1/\varepsilon_1)}$. $R$ is viewed as a function mapping $\{0,1\}^{n_1}$ to $\{0,1\}^{n_1}$, and $h$ is viewed as a hash function from a universal family of hash functions, as discussed.

$b_0 =$ empty string; $t_0 = x$.
for $i = 1$ to $m$:
    $b_i =$ the first bit of $h(R(t_{i-1})) \circ a(t_{i-1}, r)$.
    $t_i =$ the last $n_1$ bits of $h(R(t_{i-1})) \circ a(t_{i-1}, r)$.

Output: $b_1 b_2 \ldots b_m$ (i.e., $E(R, seed) = b_1 b_2 \ldots b_m$).
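The Step 3 loop can be transcribed directly. In the sketch below (our names, not the paper's), R, h, and a are passed as callables standing in for the weak source, the Step 2 hash, and the Step 1 hidden-bit map, and bit strings are modeled as Python strings of '0'/'1'; the toy instantiations of R, h, and a are placeholders chosen only to exercise the loop, not extractor-quality components.

```python
def extract(R, h, a, x, r, m, n1):
    """E(R, x . r . h): output bits b_1 ... b_m, as in STEP 3."""
    bits = []
    t = x                                  # t_0 = x
    for _ in range(m):
        block = h(R(t)) + a(t, r)          # h(R(t_{i-1})) concatenated with a(t_{i-1}, r)
        bits.append(block[0])              # b_i = first bit of the block
        t = block[-n1:]                    # t_i = last n_1 bits of the block
    return "".join(bits)

# Toy instantiation: |h(y)| = n1 - (q - 1) and |a(t, r)| = q,
# so each block has n1 + 1 bits, as required by the construction.
n1, q, m = 4, 2, 5
R = lambda t: t[::-1]                      # a (bad) stand-in "weak source" function
h = lambda y: y[: n1 - q + 1]              # stand-in hash to n1 - (q - 1) bits
a = lambda t, r: "".join(str(int(u) ^ int(v)) for u, v in zip(t[:q], r[:q]))
out = extract(R, h, a, "0011", "1010", m, n1)
print(out, len(out))
```

The length bookkeeping is the point of the sketch: each iteration consumes $n_1$ bits of state and produces an $(n_1+1)$-bit block, emitting one output bit and keeping $n_1$ bits as the next state.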

Lemma A.5 For any oracle circuit $C$ with query complexity $S$, there are $K = 2^m$ oracle circuits $A_i$, $i \in [K]$, each circuit $A_i$ having query complexity $S + m$, such that if for some $R$ and for some $\varepsilon > 0$,

$$\mathrm{Prob}_{x,r,h}[C^R(E(R, x \circ r \circ h)) = 1] - \mathrm{Prob}_{U_m}[C^R(U_m) = 1] > \varepsilon, \qquad (10)$$

then for some circuit $A_i$, $i \in [K]$,

$$\mathrm{Prob}[A_i^R(h(R(x)) \circ a(x,r) \circ h \circ r) = 1] - \mathrm{Prob}[A_i^R(U_{n_1-(q-1)} \circ U_q \circ h \circ r) = 1] > \frac{\varepsilon}{m}, \qquad (11)$$

where the above probabilities are over independent random choices of $x$, $r$, $h$, $U_{n_1-(q-1)}$, $U_q$.

Proof Lemma 5.4 in [Zim06] is similar, with the only exception that in its conclusion there is just one probabilistic circuit $A$ that satisfies the analogue of Equation (11). The circuit $A$ uses at most $m$ random bits. By fixing those bits in all possible ways, we obtain $K = 2^m$ circuits $A_i$, $i \in [K]$, and at least one of them satisfies Equation (11).

Finally, we can proceed to the main objective of this section.

Proof of Lemma 3.1. We compose in order STEPS 1, 2, and 3; the extractor is the procedure described in STEP 3.

Let $c$ be a constant that will be specified later. The other parameters required by the construction are taken as follows: $T = m \log m$, $Q = m$, $S = m^{c+1} \log m$, $P = m$, $N_1 = m^{c+13} \log^3 m$, $n_1 = \log N_1$, $\varepsilon = 1/6$. It can be checked that the running time of the procedure in STEP 3 is $m \cdot \mathrm{polylog}(m)$. Let $E$ be the function from STEP 3 with these parameters.

Fix an oracle circuit $C$ with query complexity $Q$. We will assume that on a query $i$ the oracle gives not only the $i$-th bit of the truth table of $R$ but the entire block containing that bit, i.e., the oracle $R$ behaves like a function oracle. This can only increase the power of $C$.

Lemma 3.1 follows immediately from the next claim (byLemma 1.4).

Claim A.6 The number of functions $R : \{0,1\}^{n_1} \to \{0,1\}^{n_1}$ that do not hit $C^R$ $\varepsilon$-correctly is bounded by $2^{n_1 N_1 - \Omega(m \log m)}$.

Proof of Claim. Let us consider such a function $R$. The function $R$ either is $P$-to-1 or not. Let us assume that it is (recall from Lemma A.1, (a), that this happens with high probability).

By Lemma A.5, there are $K_1 = 2^m$ oracle circuits, denoted $C_1, \ldots, C_{K_1}$, each one with query complexity $Q + m$, such that for some $i \in \{1, \ldots, K_1\}$ it holds that

$$\mathrm{Prob}_{x,r,h}[C_i^R(h(R(x)) \circ a(x,r) \circ h \circ r) = 1] - \mathrm{Prob}_{U_{n_1+1},r,h}[C_i^R(U_{n_1+1} \circ h \circ r) = 1] > \varepsilon/m. \qquad (12)$$

In Step 2, we use a family of hash functions $h : \{0,1\}^{n_1} \to \{0,1\}^{n_1 - \log P - 2\log(2m/\varepsilon)}$, where $|h| \le 2n_1$ (i.e., we take $\varepsilon_1 = \varepsilon/(2m)$). Let $q = \log P + 2\log(2m/\varepsilon) + 1$. By Lemma A.4, there are $K_2 < 2^{2n_1}$ oracle circuits, denoted $C_{(i,1)}, \ldots, C_{(i,K_2)}$, each one with query complexity $Q + m$, and for some $j \in [K_2]$ it holds that

$$\mathrm{Prob}[C_{(i,j)}^R(R(x) \circ a(x,r) \circ r) = 1] - \mathrm{Prob}[C_{(i,j)}^R(R(x) \circ U_q \circ r) = 1] > \varepsilon/m - \varepsilon/(2m) = \frac{\varepsilon}{2m}.$$

Let $\varepsilon' = \frac{1}{2^{3q+1}} \cdot \frac{\varepsilon}{2m} = \frac{\varepsilon^7}{2048} \cdot \frac{1}{P^3 m^7}$. By Lemma A.2, there are $K_3 = 2^{2q}$ probabilistic oracle circuits, denoted $C_{(i,j,1)}, \ldots, C_{(i,j,K_3)}$, each one with query complexity $(Q + m) \cdot \mathrm{poly}(n_1, n_1/q, 1/\varepsilon', q)$, and one of them, denoted $C_{(i,j,\ell)}$, has the property that for more than a fraction $1/2$ of the strings $\rho$ (where $\rho$ denotes the randomness of $C_{(i,j,\ell)}$),

$$\|\{x \in \{0,1\}^{n_1} \mid C_{(i,j,\ell)}^R(R(x), \rho) \in R^{-1}(R(x))\}\| > \varepsilon' \cdot N_1.$$
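The closed form for $\varepsilon'$ above can be checked numerically; this is a sanity check we add, with $q$ left as a real number (ignoring the paper's truncation issues):

```python
import math

def eps_prime_from_q(eps, m, P):
    """eps' = (1 / 2^{3q+1}) * eps/(2m), with q = log P + 2 log(2m/eps) + 1."""
    q = math.log2(P) + 2 * math.log2(2 * m / eps) + 1
    return (eps / (2 * m)) / 2 ** (3 * q + 1)

def eps_prime_closed(eps, m, P):
    """The claimed closed form: eps^7 / 2048 * 1 / (P^3 m^7)."""
    return eps ** 7 / 2048 / (P ** 3 * m ** 7)

eps, m = 1 / 6, 64
assert math.isclose(eps_prime_from_q(eps, m, m), eps_prime_closed(eps, m, m),
                    rel_tol=1e-9)
print("closed form for eps' checked")
```

The identity holds for any choice of $\varepsilon$, $m$, $P$, since $2^{3q+1} = 1024\, P^3 (2m/\varepsilon)^6 / 64 \cdot \ldots$ collapses to $1024\, P^3 m^6 / \varepsilon^6$ after substituting $q$.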

It can be seen that there is a constant $c$, which we mentioned above, such that $\mathrm{poly}(n_1, n_1/q, 1/\varepsilon', q) \le m^c$.⁴ By the way the parameters were chosen, the query complexity of $C_{(i,j,\ell)}$ is less than $S$ and $\varepsilon' \cdot N_1 \ge 2e \cdot P \cdot S \cdot T$. Recall from Lemma A.1 (c) that we say that $R$ is "good" for a probabilistic oracle circuit $C$ if for more than a fraction $1/2$ of the strings $\rho$, $\|\{x \in \{0,1\}^{n_1} \mid C^R(R(x), \rho) \in R^{-1}(R(x))\}\| \ge 2e \cdot P \cdot S \cdot T$. Therefore, if $R$ does not hit $C$ $\varepsilon$-correctly, then either (a) $R$ is not $P$-to-1, or (b) $R$ is "good" for the circuit $C_{(i,j,\ell)}$, one of the $K_1 \cdot K_2 \cdot K_3$ circuits that result from the composition of the three steps.

Note that $K_1 \cdot K_2 \cdot K_3 = 2^{m + O(\log m)}$. By Lemma A.1 (a), the number of functions $R$ with $H_\infty(R(x)) < n_1 - \log P$ is bounded by $2^{n_1 N_1} \cdot 2^{-P(\log P - \log e)}$. Also, by Lemma A.1 (c), the number of functions $R$ that are "good" for some circuit is bounded by $2^{n_1 N_1} \cdot 2 \cdot (2^{-T} + 2^{-P(\log P - \log e)})$. Thus the number of functions $R$ that do not hit $C^R$ $\varepsilon$-correctly is bounded by $2^{n_1 N_1} \cdot 2^{-P(\log P - \log e)} + K_1 \cdot K_2 \cdot K_3 \cdot 2^{n_1 N_1} \cdot (2^{-T} + 2^{-P(\log P - \log e)}) \le 2^{n_1 N_1 - \Omega(m \log m)} = 2^{N - \Omega(m \log m)}$.

⁴The constant $c$ depends essentially on the number of queries made by the polynomial reconstruction algorithm in [GRS98]. It can be shown that $c \le 31$.
