
    Maximal Sparsity Representation via l1 Minimization

    David L. Donoho∗ and Michael Elad†

    August 15, 2002


    Finding a sparse representation of signals is desired in many applications. For a

    representation dictionary D and a given signal S ∈ span{D}, we are interested in finding the

    sparsest vector γ such that Dγ = S. Previous results have shown that if D is composed

    of a pair of unitary matrices, then under some restrictions dictated by the nature of the

    matrices involved, one can find the sparsest representation using an l1 minimization rather

    than using the l0 norm of the required composition. Obviously, such a result is highly

    desired since it leads to a convex Linear Programming form. In this paper we extend previous

    results and prove a similar relationship for the most general dictionary D. We also show

    that previous results emerge as special cases of the new extended theory. In addition,

    we show that the above results can be markedly improved if an ensemble of such signals is

    given, and higher order moments are used.

    Keywords: Sparse Representation, Atomic Decomposition, Convex Optimization, Linear

    Programming, Basis Pursuit, Matching Pursuit.

    ∗Department of Statistics, Stanford University, Stanford 94305-9025 CA. USA.

    †Department of Computer Science (SCCM), Stanford University, Stanford 94305-9025 CA. USA.


    Draft: to be submitted to the IEEE Transactions on Information Theory in September 2002

    1 Introduction

    A sparse representation of a signal is a desirable, efficient description of it that can be used

    for its analysis or compression [1]. However, far deeper reasons lead to the search for sparse

    representations for signals. As it turns out, one of the most natural and effective priors in

    Bayesian theory for signal estimation is the existence of a sparse representation over a suitable

    dictionary. This prior relies on the assumption that the signal's ground-truth representation is

    expected to be simple and thus sparse in some representation space [1]. Indeed, it is sparsity

    that led to the vast theoretical and applied work in Wavelet theory [1].

    More formally, we are given a representation dictionary D defined as a matrix of size [N × L].

    We assume that the columns of D, denoted $\{d_k\}_{k=1}^{L}$, are normalized, i.e. $d_k^H d_k = 1$ for all

    $1 \le k \le L$. These columns are used to represent incoming signals $S \in \mathrm{span}\{D\} \subseteq \mathbb{C}^N$.

    Note that we do not claim any relationship between N and L, and in particular, N may be larger

    than L, implying that the proposed representation space is not complete.

    Given a signal vector S, we are interested in finding the sparsest vector γ such that Dγ = S.

    This process is commonly referred to as atomic decomposition, since we decompose the signal

    S into its building atoms, taken from the dictionary. The emphasis here is on finding such a

    decomposition that uses as few atoms as possible. Thus, we resort to the following optimization

    problem:

    (P0) Minimize ‖γ‖0 subject to S = Dγ. (1)

    Obviously, two easy-to-solve special cases are the case of a unique solution to Dγ = S and the

    case of no feasible solution at all. While both of these cases make (P0) easy to solve, in general

    solving (P0) requires a combinatorial search through all combinations of columns from D, and

    as such, its complexity grows exponentially with L.
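
The combinatorial nature of (P0) can be illustrated with a minimal brute-force sketch. It is not part of the original paper: the function name `solve_p0_brute_force`, the tolerance `tol`, and the toy dictionary are assumptions made for illustration, and the approach is only feasible for very small L.

```python
import itertools
import numpy as np

def solve_p0_brute_force(D, S, tol=1e-9):
    """Exhaustive search for the sparsest gamma with D @ gamma = S.

    Supports of size 1, 2, ... are tried in turn; the first support that
    reproduces S (up to tol) is returned. Returns None if S is not in the
    span of the columns of D. The cost grows exponentially with L.
    """
    N, L = D.shape
    if np.linalg.norm(S) <= tol:
        return np.zeros(L)
    for k in range(1, L + 1):
        for support in itertools.combinations(range(L), k):
            cols = list(support)
            # Least-squares fit restricted to the candidate support.
            coef, *_ = np.linalg.lstsq(D[:, cols], S, rcond=None)
            if np.linalg.norm(D[:, cols] @ coef - S) <= tol:
                gamma = np.zeros(L, dtype=coef.dtype)
                gamma[cols] = coef
                return gamma
    return None

# Hypothetical toy dictionary: identity basis plus one extra normalized atom.
D = np.column_stack([np.eye(3), np.ones(3) / np.sqrt(3)])
S = 2.0 * D[:, 3]  # signal built from a single atom
gamma = solve_p0_brute_force(D, S)
```

In this toy case the search stops at supports of size one, since the signal is a multiple of the last atom.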

    Thus, we are interested either in an approximation of the (P0) solution, or better yet, a numerical

    shortcut leading to its exact solution. Matching Pursuit (MP) [1, 2] and Basis Pursuit (BP)

    [3] are two different methods to achieve the required simplifying goal. In the MP and related

    algorithms, a sequential sub-optimal representation is sought using a greedy algorithm. As such,

    this family of algorithms leads to an approximation of the (P0) solution.
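
The greedy idea behind MP can be sketched as follows. This is a simplified illustration under the assumption of unit-norm atoms, not the exact procedure of [1, 2]; the function name `matching_pursuit` and the parameters `n_iter` and `tol` are hypothetical.

```python
import numpy as np

def matching_pursuit(D, S, n_iter=50, tol=1e-10):
    """Greedy sketch: repeatedly pick the atom most correlated with the residual.

    Assumes the columns of D have unit norm, so the coefficient update for
    the chosen atom is simply its inner product with the residual.
    """
    gamma = np.zeros(D.shape[1], dtype=complex)
    residual = S.astype(complex)
    for _ in range(n_iter):
        if np.linalg.norm(residual) <= tol:
            break
        corr = D.conj().T @ residual      # inner products <d_k, residual>
        k = int(np.argmax(np.abs(corr)))  # best-matching atom
        gamma[k] += corr[k]
        residual = residual - corr[k] * D[:, k]
    return gamma

# Hypothetical toy dictionary: identity basis plus one extra normalized atom.
D = np.column_stack([np.eye(3), np.ones(3) / np.sqrt(3)])
S = 3.0 * D[:, 0] - 1.0 * D[:, 2]
gamma = matching_pursuit(D, S)
```

For this signal the greedy choice happens to recover the two generating atoms, but in general the result is only an approximation of the (P0) solution and need not be the sparsest representation.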

    A numerically more complicated approach, which in some cases leads to the exact solution of

    (P0), is the BP algorithm. BP suggests solving (P0) by replacing it with a related (P1) problem

    defined by

    (P1) Minimize ‖γ‖1 subject to S = Dγ. (2)

    As can be seen, the penalty is replaced by an l1 norm (sum of absolute values). As such, (P1)

    is a convex programming problem implying that we expect no local minima problems in its

    numerical solution. Actually, a well known result from optimization theory shows that an l1

    minimization could be solved using a Linear Programming procedure [3, 4, 5]. Recent results

    in numerical optimization and the introduction of interior point methods turn the above

    described problem into a practically solvable one, even for very large dictionaries.
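
The Linear Programming form of (P1) can be sketched for the real-valued case by splitting γ into nonnegative parts, γ = u − v with u, v ≥ 0, so that the l1 norm becomes the linear objective sum(u) + sum(v). The sketch below assumes real D and S (the complex case needs a different formulation) and uses SciPy's `linprog`; the function name `solve_p1_linprog` and the toy dictionary are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def solve_p1_linprog(D, S):
    """Solve min ||gamma||_1 subject to D @ gamma = S for real D and S.

    Uses the standard split gamma = u - v with u, v >= 0, which turns the
    l1 objective into a linear one over the stacked variable [u; v].
    """
    N, L = D.shape
    c = np.ones(2 * L)                   # objective: sum(u) + sum(v)
    A_eq = np.hstack([D, -D])            # D @ u - D @ v = S
    res = linprog(c, A_eq=A_eq, b_eq=S, bounds=(0, None))
    if not res.success:
        raise ValueError(res.message)
    u, v = res.x[:L], res.x[L:]
    return u - v

# Hypothetical toy dictionary: identity basis plus one extra normalized atom.
D = np.column_stack([np.eye(3), np.ones(3) / np.sqrt(3)])
S = 2.0 * D[:, 3]  # signal built from a single atom
gamma = solve_p1_linprog(D, S)
```

For this toy signal the single-atom representation has l1 norm 2, while the representation over the identity atoms has l1 norm 2√3, so the (P1) minimizer is the single-atom representation.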

    A most interesting and surprising result due to Donoho and Huo [6] is that the solution of

    (P1) in some cases coincides with the (P0) one. Donoho and Huo assumed a specific structure


    of D, built by concatenating two unitary matrices, Φ and Ψ of size N × N each, thus giving

    L = 2N. For this specific dictionary form, they developed conditions for the equivalence

    between the (P0) and (P1) solutions. These conditions were expressed in terms of the involved

    dictionary D (and actually, more accurately, in terms of Φ and Ψ). Later these conditions were

    improved by Elad and Bruckstein to show that the equivalence is actually true for a wider class

    of signals [7, 8].

    In this paper we further extend the results in [6, 7, 8], and prove a (P0)-(P1) equivalence for

    the most general form of dictionaries. In order to prove this equivalence we address two questions

    for a given dictionary D and signal S:

    1. Uniqueness: Having solved the (P1) problem, under which conditions can we guarantee

    that this is also the (P0) solution? This question is answered by generalizing the

    uniqueness Theorem in [6, 7, 8].

    2. Equivalence: Knowing the solution of the (P0) problem (or actually, knowing its l0 norm),

    what are the conditions under which (P1) is guaranteed to lead to the exact same solution?

    This question is answered by generalizing the equivalence Theorem in [6, 7, 8].

    The proposed analysis adopts a totally new line of reasoning, compared to the work done in

    [6, 7, 8], and yet, we show that all previous results emerge as special cases of this new analysis.

    So far atomic decomposition was targeted towards dealing with a single given vector, finding

    the limitations of using (P1) instead of (P0) in order to decompose it to its building atoms

    taken from the dictionary D. This is the problem solved in [6, 7, 8] and in this paper too. An

    interesting extension of the above results corresponds to a source generating an ensemble of random

    sparse representation signals from the same dictionary using the same stationary random rule.

    The questions raised are whether there is something to gain from the given multiple signals, and

    if so, how. As it turns out, the use of higher moments leads in this case to a similar formulation

    of (P0), and again, a similar form of (P1) comes to replace it as a tractable alternative. We show that

    indeed, similar relations between (P0) and (P1) hold, with far weaker conditions due to

    the increased dimensionality, implying that fewer restrictions are needed to guarantee the desired

    (P0)-(P1) equivalence.

    This paper is organized as follows: In the next section we briefly repeat the main results found

    in [6, 7, 8] on the uniqueness and equivalence Theorems for the two-unitary matrices dictionary.

    Section 3 then extends the uniqueness Theorem to an arbitrary dictionary. Section 4 similarly

    extends the equivalence results to dictionaries of general form. The idea of using an ensemble of signals

    and higher moments for accurate sparse decomposition is covered in Section 5. We summarize

    and draw future research directions in Section 6.

    2 Previous Results

    As was said before, previous results refer to the special case where the dictionary is built by

    concatenating two unitary matrices, Φ and Ψ of size N ×N each, giving D = [Φ,Ψ]. We define

    φi and ψj (1 ≤ i, j ≤ N) as the columns of these two unitary matrices. Following [6] we define a

    real-positive scalar M representing the cross-correlation between these two bases by

    $M = \sup\{ |\langle \phi_i, \psi_j \rangle| : 1 \le i, j \le N \}$. (3)


    Thus, given the two matrices Φ and Ψ, M can be computed, and it is easy to show [6, 8] that

    $1/\sqrt{N} \le M \le 1$. The lower bound is obtained for a pair such as spikes and sines [6] or Identity

    and Hadamard matrices [8]. The upper bound is obtained if at least one of the vectors in Φ is

    also found in Ψ. Using this definition for M , the following Theorem states the requirement on

    a given representation such that it is guaranteed to be the solution of the (P0) problem:

    Theorem 1 - Uniqueness: Given a dictionary D = [Φ,Ψ], given its corresponding

    cross-correlation scalar value M as defined in Equation (3), and given a signal S, a representation of

    the signal by S = Dγ is necessarily the sparsest one possible if ‖γ‖0 < 1/M .
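
As a numerical illustration of M and of the threshold in Theorem 1, the sketch below computes the cross-correlation for the Identity and Hadamard pair mentioned above. The helper names `cross_correlation` and `hadamard` and the choice N = 8 are assumptions for illustration; the Hadamard matrix is built with the Sylvester construction, which requires N to be a power of two.

```python
import numpy as np

def cross_correlation(Phi, Psi):
    """M = max over i, j of |<phi_i, psi_j>| for the columns of Phi and Psi."""
    return float(np.max(np.abs(Phi.conj().T @ Psi)))

def hadamard(n):
    """Normalized Hadamard matrix (Sylvester construction, n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

N = 8
Phi = np.eye(N)      # "spikes" basis
Psi = hadamard(N)    # unitary Hadamard basis
M = cross_correlation(Phi, Psi)

# Theorem 1: a representation with fewer than 1/M nonzeros is the sparsest one.
bound = 1.0 / M
```

Here every entry of the normalized Hadamard matrix has magnitude 1/√N, so M attains the lower bound 1/√8 and the threshold 1/M equals √8; any representation with at most two nonzero entries is then guaranteed to be the sparsest.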

    This Theorem’s proof is given in [7, 8]. A somewhat weaker version of it requiring ‖γ‖0 <

    $0.5(1 + M^{-1})$ is p