Jul 5, 2005 - Ax. In general, this problem is. NP-hard. However, for many matrices A there is a threshold phenomenon: if the sparsest solution is suff...

0 downloads 17 Views 339KB Size

Sparse nonnegative solution of underdetermined linear equations by linear programming David L. Donoho* and Jared Tanner Department of Statistics, Stanford University, Stanford, CA 94305-4065 Contributed by David L. Donoho, March 29, 2005

Consider an underdetermined system of linear equations y ⴝ Ax with known y and d ⴛ n matrix A. We seek the nonnegative x with the fewest nonzeros satisfying y ⴝ Ax. In general, this problem is NP-hard. However, for many matrices A there is a threshold phenomenon: if the sparsest solution is sufficiently sparse, it can be found by linear programming. We explain this by the theory of convex polytopes. Let aj denote the jth column of A, 1 < j ⱕ n, let a0 ⴝ 0 and P denote the convex hull of the aj. We say the polytope P is outwardly k-neighborly if every subset of k vertices not including 0 spans a face of P. We show that outward k-neighborliness is equivalent to the statement that, whenever y ⴝ Ax has a nonnegative solution with at most k nonzeros, it is the nonnegative solution to y ⴝ Ax having minimal sum. We also consider weak neighborliness, where the overwhelming majority of k-sets of ajs not containing 0 span a face of P. This implies that most nonnegative vectors x with k nonzeros are uniquely recoverable from y ⴝ Ax by linear programming. Numerous corollaries follow by invoking neighborliness results. For example, for most large n by 2n underdetermined systems having a solution with fewer nonzeros than roughly half the number of equations, the sparsest solution can be found by linear programming. neighborly polytopes 兩 cyclic polytopes 兩 combinatorial optimization 兩 convex hull of Gaussian samples 兩 positivity constraints in ill-posed problems

1. Introduction onsider an underdetermined system of linear equations y ⫽ Ax, where y 僆 Rd, x 僆 Rn, A is a d ⫻ n matrix, d ⬍ n, and y is considered known but x is unknown. In this article only nonnegative solutions x ⱖ 0 are of interest. Enthusiasts of parsimony seek the sparsest solution, the one with fewest nonzeros. Formally, they consider the optimization problem

C

共NP兲

min 储x储 0 subject to y ⫽ Ax,

x ⱖ 0.

Here the 0-norm 储x储0 counts the number of nonzeros. Because of the extreme nonconvexity of the zero-norm, (NP) is NP-hard in general. In this article, we consider the linear program 共LP兲

min 1⬘x subject to y ⫽ Ax,

x ⱖ 0.

We will show that for many matrices A, whenever the solution to (NP) is sufficiently sparse, it is also the unique solution of (LP). As a general label, we call this phenomenon NP兾LP equivalence. We develop an understanding of this equivalence phenomenon by using ideas from the theory of convex polytopes; the books of Gru ¨nbaum (1) and Ziegler (2) are useful starting points. Throughout the article, we study a specific polytope P, definable in several equivalent ways. Let T n⫺1 denote the standard simplex in Rn, i.e., the convex hull of the unit basis vectors ei. Let T n0 denote the solid simplex, i.e., the convex hull of T n⫺1 and the origin. We think of T n⫺1 as the outward part of T n0 , i.e., the part one would see looking from ‘‘outside.’’ We focus attention in this article on the convex polytope P ⫽ AT n0 傺 Rd. P also has a representation as the convex hull of a certain point set A 傺 Rd we refer to frequently. Specifically, let 9446 –9451 兩 PNAS 兩 July 5, 2005 兩 vol. 102 兩 no. 27

A consist of the columns aj, j ⫽ 1, . . . , n of A, possibly together with origin a0 ⫽ 0; include the origin if it does not already belong n to the convex hull of the {aj}j⫽1 . For later use, set N ⫽ #A. Thus, n N ⫽ n if 0 belongs to the convex hull of the {aj}j⫽1 , otherwise N ⫽ n⫺1 n ⫹ 1. Below, we use the notation T ⫽ T if N ⫽ n, and T ⫽ T n0 , if N ⫽ n ⫹ 1. Then also P ⫽ AT. A general polytope Q is called k-neighborly if every set of k vertices spans a face of Q. Thus, all combinations of vertices generate faces. The standard simplex T n⫺1 is the prototypical neighborly object. The terminology and basic notions in neighborliness were developed by Gale (3, 4); see also refs. 1, 2, and 5. We modify this notion here, calling a polytope Q that contains 0 outwardly k-neighborly if all sets of k vertices not including the origin 0 span a face. Roughly speaking, such a polytope behaves as a neighborly one except perhaps at any faces reaching the origin. Thus if Q is k-neighborly then it is also outwardly k-neighborly, but the notions are distinct. In addition outward k-neighborliness of AT n0 implies neighborliness of AT n⫺1, the outward part of AT n0 . Of course, when 0 僆 AT n⫺1 neighborliness and outwardly neighborliness of AT n0 coincide. [Modification of neighborliness to exclude consideration of certain subsets of vertices has been useful previously; compare the notion of central neighborliness of centrosymmetric polytopes, where every k vertices not including an antipodal pair span a face; see ref. 6 for discussion and references.] In Section 2 we connect outward neighborliness to the question of NP兾LP equivalence. Theorem 1. Let A be a d ⫻ n matrix, d ⬍ n. These two properties of A are equivalent: Y Y

The polytope P has N vertices and is outwardly k-neighborly. Whenever y ⫽ Ax has a nonnegative solution, x0 having at most k nonzeros, x0 is the unique solution to (LP).

Formalizing the notion of sparsity threshold of a matrix A, we see that LP兾NP equivalence holds up to a certain breakdown point; namely, the largest value m such that every sparse vector with fewer than m nonzeros is uniquely recovered by (LP). The highest value of k for which a polytope besides the simplex can be k-neighborly is d兾2 (1, 3, 4). Hence if n ⬎ d, equivalence breakdown must occur as soon as the number of nonzeros k ⫽ d兾2 ⫹ 1. 1.1. Neighborly Polytopes. A polytope is called neighborly if it is

k-neighborly for every k ⫽ 1, . . . , d兾2. Many families of neighborly polytopes are known. In Section 3, we use Theorem 1 and the existence of neighborly polytopes to give the following. Corollary 1.1. Let d ⬎ 2. For every n ⬎ d there is a d ⫻ n matrix A such that NP兾LP equivalence holds with breakdown point d兾2 ⫹ 1. When we have a matrix A with this property, and a particular system of equations that must be solved, we can run (LP); if we find that the output has fewer nonzeros than half the number of

Abbreviation: LP, linear program. *To whom correspondence should be addressed. E-mail: [email protected] © 2005 by The National Academy of Sciences of the USA

www.pnas.org兾cgi兾doi兾10.1073兾pnas.0502269102

Corollary 1.2. Let (0) be a nonnegative measure supported on some

subset of the n known points 0 ⬍ t1 ⬍ 䡠 䡠 䡠 ⬍ tn ⬍ 2. Let ˆ k denote the Fourier coefficient

ˆk ⬅

冘

兵tj其 exp兵 冑⫺1䡠kt j其.

j

Suppose that yk ⫽ ˆ k(0) is observed (without error) for k ⫽ 1, . . . , (0) m, 2m ⬍ n. If is supported on at most m points, the problem min

冘

兵t j其

subject to y k ⫽ ˆ k, k ⫽ 1, . . . , m;

j

兵t j其 ⱖ 0, j ⫽ 1, . . . , n, has (0) as its unique solution. Superficially, this problem seems improperly posed, since we have n unknowns, the mass of at each of the n points tj, with only 2m ⬍ n data ˆ k to constrain them. Yet if the underlying object (0) is sparsely supported, it is uniquely recoverable, in fact by convex optimization. This corollary was previously known to us; it follows from a result in ref. 8. It also follows from recent work by Fuchs (9). A parallel result can be given for partial Laplace transformation. Corollary 1.3. Let (0) be a nonnegative measure supported on some

subset of the n known points, ⫺⬁ ⬍ 1 ⬍ 䡠 䡠 䡠 ⬍ n ⬍ ⬁. Let ˜k denote the Laplace transform value

˜k ⬅

冘

兵j其 exp兵k j其.

j

Suppose that yk ⫽ ˜ k(0) is observed (without error) for k ⫽ 1, . . . , (0) m, m ⬍ n. If is supported on at most m兾2 points, the problem min

冘

兵t j其

subject to y k ⫽ ˜ k, k ⫽ 1, . . . , m;

j

兵 j其 ⱖ 0, j ⫽ 1, . . . , n has (0) as its unique solution. This problem again seems improperly posed, since we have n unknowns but only m ⬍ n (real) data. Yet if (0) is sparsely supported, it is uniquely recoverable, again by linear programming. These corollaries are proved here by using the neighborliness of polytopes based on certain partial Fourier and partial Vandermonde matrices, respectively. They also follow from recent work by Fuchs (9), who gave a direct proof of uniqueness. We find the neighborliness connection is instructive; it makes available a whole range of similar examples, provides knowledge about the atypicality of such examples, and builds a bridge to a body of distinguished literature, going back as far as Carathe´odory (10, 11). Donoho and Tanner

Table 1. Phase transitions N and VS in strong and weak neighborliness Phase transition

␦ ⫽ 0.1

␦ ⫽ 0.25

␦ ⫽ 0.5

␦ ⫽ 0.75

␦ ⫽ 0.9

N VS

0.060131 0.240841

0.087206 0.364970

0.133457 0.558121

0.198965 0.765796

0.266558 0.902596

1.2. Random Polytopes. When introducing the neighborliness

concept, Gale (3) suggested that ‘‘most’’ polytopes are neighborly. Recently, we (12) studied neighborliness of random polytopes, considering high-dimensional cases dn ⫽ ␦n, n large. We derived a function N such that polytopes P with n Gaussiandistributed vertices in Rd were roughly N(d兾n)䡠d-neighborly for large n. Thus, if n ⫽ 2d, we found N(d兾n) ⬇ 0.133; compare Table 1. Applying our results gives the following. Corollary 1.4. Fix ⬎ 0. Let Ad,n denote a random d ⫻ n matrix with

columns drawn independently from a multivariate normal distribution on Rd with nonsingular covariance matrix. Suppose d and n are proportionally related by dn ⫽ ␦n. Then, with overwhelming probability for large n, Ad,n offers the property of LP兾NP equivalence up to breakdown point ⱖ(N(␦) ⫺ )d. Line 1 of Table 1 gives results for different aspect ratios ␦ ⫽ d兾n of the nonsquare matrix A. Thus if n ⫽ 10d, so the corresponding system is underdetermined by a factor of 10, the typical matrix A with Gaussian columns offers LP兾NP equivalence up to a breakdown point exceeding 0.06d. For the typical A and for every problem instance y generated by a sparse vector x with nonzeros ⱕ0.06 times the number of equations, (LP) delivers the sparsest solution. 1.3. Weak Neighborliness and Weak Equivalence. The notion of NP兾LP equivalence developed in Theorem 1 demands, for a given A, equivalence at all problem instances (y, A) generated by any nonnegative sparse vector x0 with at most k nonzeros. A weaker notion considers equivalence merely for most such problem instances. This idea is developed in Section 4, where it is shown that for matrices A where the corresponding point set A is in general position NP兾LP equivalence at a certain instance y ⫽ Ax0 depends only on the support of x0 and not on the values of x0 for its support. Hence, we define a measure on problem instances by simply counting the fraction of support sets of size k with a given property. We then meaningfully speak of a given A offering NP兾LP equivalence for most problem instances having nonnegative sparse solutions with the most k nonzeros. We can also define two weaker notions of classical [respectively (resp.) outward] neighborliness, saying that the polytope P is (k, )-weakly neighborly (resp. weakly outwardly neighborly) if, among k-membered subsets of vertices (resp. those not including 0), all except a fraction span k ⫺ 1-faces of P. As it turns out, if the points A are in general position, weak neighborliness of P is the same thing as saying that P ⫽ AT has at least (1 ⫺ ) times as many (k ⫺ 1)-dimensional faces as T. Hence, the notion of weak neighborliness is really about numbers of faces. We say that a face is zerofree if 0 does not occur as a vertex. Theorem 2. Let A be a d ⫻ n matrix, d ⬍ n with point set A in general position. For 1 ⱕ k ⱕ d ⫺ 1, the following two properties of A are equivalent. Y

Y

The polytope P ⫽ AT has at least (1 ⫺ ) times as many zerofree (k ⫺ 1)-faces as T. Among all problem instances (y, A) generated by some nonnegative vector x0 with at most k nonzeros, the solutions to (NP) and (LP) are identical, except in a fraction ⱕ of instances. PNAS 兩 July 5, 2005 兩 vol. 102 兩 no. 27 兩 9447

APPLIED MATHEMATICS

equations, we infer that we have the unique sparsest nonnegative solution. For such matrices, if it would be very valuable to solve (NP), because the answer would be very sparse, we can solve it by convex optimization. Conversely, it is exactly in the cases where the answer to (NP) would not be very sparse that it might also be very expensive to compute! Examples of neighborly polytopes go back to Gale (3, 4) and Motzkin (7); some of these are reviewed in Section 3. They have interesting interpretations in terms of Fourier analysis and geometry of polynomials, and correspond to interesting matrices A. Section 3 shows how to apply them to get the above corollary and to get two results about inference in the presence of badly incomplete data. The first concerns incomplete Fourier information:

In recent work on high-dimensional random polytopes (12), we counted the faces of randomly projected simplices. Building on work of Affentranger and Schneider (13) and especially Vershik and Sporyshev (14) we considered the case where d and n are large and proportional and were able to get precise information about the phase transition between prevalence and scarcity of weak neighborliness as k increases from 1 to d ⫺ 1. We studied a function VS (in honor of Vershik and Sporyshev, who first implicitly characterized it) that maps out the phase transition in weak neighborliness. Fix ⬎ 0 and consider n large. Weak neighborliness typically holds for k ⬍ VS(d兾n)䡠d䡠(1 ⫺ ), whereas for k ⬎ VS(d兾n)䡠d䡠(1 ⫹ ), weak neighborliness typically fails. We also showed that the same conclusions hold for weak outward neighborliness as for weak neighborliness. Numerical results are given in Table 1, in particular, the second line, where VS(0.1) ⬇ 0.24. Informally, for most 10-fold underdetermined matrices A and most vectors with fewer nonzeros than 24% of the number of rows in A, the sparsest nonnegative solution can be found by (LP). In contrast, N(0.1) ⬇ 0.06. Informally, if for a typical matrix A we insist that every instance of (NP) with a sufficiently sparse solution be solvable by (LP), then sufficiently sparse must mean at most 6% d. As a corollary, we obtain the following. Let S⫹(d, n, k) denote the collection of all systems of equations (y, A) having a nonnegative solution x0 with at most k nonzeros. When A is a matrix with columns in general position, equivalence between (NP) and (LP) depends only on the support of x0, as discussed in Lemma 4.2. Place a probability measure on S⫹(d, n, k), which makes the nullspace of A uniformly distributed among n ⫺ d subspaces of Rn and makes the support of the sparsest solution uniform on k-subsets of n objects. Using Table 1’s entry showing VS(1兾2) ⬎ 0.558, we have the following. Corollary 1.5. Consider the systems of equations (y, A) in S⫹(n, 2n, 0.558n). For n large, the overwhelming majority of such (y, A) pairs exhibit NP兾LP equivalence. 1.4. Contents. Section 2 proves Theorem 1, and Section 3 explains how Corollaries 1.1, 1.2, 1.3, and 1.4 follow from Theorem 1 and existing results in polytope theory. Section 4 studies weak neighborliness and justifies Corollary 1.5. Section 5, which is published as Supporting Text in the supporting information on the PNAS web site, discusses (LP) in settings not neighborly in the usual sense, extensions to noisy data, and extensions to situations when nonnegativity is not enforced. Positivity is seen to be a powerful constraint.

2. Equivalence 2.1. Preliminaries. To begin, we relate (LP) to the polytope P. Note that the value of (LP) is a function of y 僆 Rd:

V共 y兲 ⬅ val共LP兲 ⫽ inf 1⬘x subject to y ⫽ Ax,

x ⱖ 0.

Note also that V is homogeneous: V(ay) ⫽ aV(y), a ⬎ 0.We have defined the polytope P ⫽ AT so that it is simply the ‘‘unit ball’’ for V: n P ⫽ 兵y : y 僆 AR⫹ and V共y兲 ⱕ 1其.

To see this, write conv for the convex hull operation. The convexity and homogeneity of V guarantees that the right side is conv({0} 艛 n n ). We have defined P by cases; if 0 僆 conv({aj}j⫽1 ), P ⫽ {aj}j⫽1 n ). AT n⫺1; otherwise, P ⫽ AT n0. In each case P ⫽ conv({0} 艛 {aj}j⫽1 We call subconvex combination a linear combination with nonnegative combinations summing to at most one. The previous paragraph can be reformulated as follows. 9448 兩 www.pnas.org兾cgi兾doi兾10.1073兾pnas.0502269102

Lemma 2.1. Consider the problem of representing y 僆 Rd as a subconvex combination of the columns (a1, . . . , an). This problem has a solution if and only if val(LP) ⱕ 1. If this problem has a unique solution then (LP) has a unique solution for this y. We adopt standard notation concerning convex polytopes; see ref. 1 for more details. In discussing the (closed, convex) polytope P, we commonly refer to its vertices v 僆 vert(P) and k-dimensional faces F 僆 Fk(P). v 僆 P will be called a vertex of P if there is a linear functional v separating v from P{v}, i.e., a value c so that v(v) ⫽ c and v(x) ⬍ c for x 僆 P, x ⫽ c. Thus P ⫽ conv(vert(P)).Vertices are just 0-dimensional faces, and a k-dimensional face of P is a k-dimensional set F 傺 P for which there exists a separating linear functional F, so that F(x) ⫽ c, x 僆 F, and F(x) ⬍ c, x 僆 F. Faces are convex polytopes, each one representable as the convex hullof a subset vert(F) 傺 vert(P); thus if F is a face, F ⫽ conv(vert(F)). A k-dimensional face will be called a k-simplex if it has k ⫹ 1 vertices. Important for us will be the fact that for k-neighborly polytopes all of the low-dimensional faces are simplices. It is standard to define the face numbers fk(P) ⫽ #Fk(P).We also need the simple observation that

vert共AT兲 傺 A vert共T兲,

[2.1]

which implies Fᐉ共AT兲 傺 AFᐉ共T兲,

0 ⱕ ᐉ ⬍ d;

[2.2]

and so the numbers of vertices obey f0共AT兲 ⱕ f0共T兲.

[2.3]

2.2. Basic Insights. Theorem 1 involves two insights recorded here

without proof. Similar lemmas were recently proven in ref. 6. The first explains the importance and convenience of having simplicial faces of P. Lemma 2.2 (Unique Representation). Consider a k-face F 僆 Fk(P) and suppose that F is a k-simplex. Let x 僆 F. Then

(i) x has a unique representation as a convex combination of vertices of P. (ii) This representation places nonzero weight only on vertices of F. Conversely, suppose that F is a k-dimensional closed convex subset of P with properties i and ii for every x 僆 F. Then F is a k-simplex and a k-face of P. The second insight is that outward k-neighborliness can be thought of as saying that the low-dimensional zerofree faces of P are simply images under A of the faces of T n⫺1, and hence simplices. Lemma 2.3 (Alternate Form of Neighborliness). Suppose the polytope P ⫽ AT has N vertices and is outwardly k-neighborly. Then

᭙ ᐉ ⫽ 0, . . . , k ⫺ 1,

᭙ F 僆 Fᐉ共T n⫺1兲, AF 僆 Fᐉ共AT兲. [2.4]

Conversely, suppose that Eq. 2.4 holds; then P ⫽ AT has N vertices and is outwardly k-neighborly. 2.3. Theorem 1, Forward Direction. We suppose that P is outwardly k-neighborly, that the nonnegative vector x0 has at most k nonzeros, and show that the unique solution of (LP) is precisely x0. We assume without loss of generality that the problem is scaled so that 1⬘x0 ⫽ 1; thus x0 僆 T n⫺1. Now, since x0 has at most k nonzeros, it belongs to a k ⫺ 1-dimensional face F of the simplex: F 僆 Fk⫺1(T n⫺1). Hence y belongs to AF, which, by outward neighborliness and Lemma 2.2, Donoho and Tanner

1⬘x ⱕ 1. Hence it is the unique solution of (LP). 2.4. Theorem 1, Converse Direction. By hypothesis, A has the property that, for every y ⫽ Ax0, where x0 has no more than k nonzeros, x0 is the unique solution to the instance of (LP) generated by y. We will show that P has N vertices and is outwardly k-neighborly. By considering the case k ⫽ 1 with every xi ⫽ ei, we learn that in each case the corresponding yi ⫽ Axi belongs to P and is n uniquely representable among subconvex combinations of (aj)j⫽1 simply by ai. This implies by Lemma 2.2 that each yi is a vertex n , 0 is also of P, so P has at least n vertices. Now if兾 conv{aj}j⫽1 a vertex of P. Since by Eq. 2.3 the number of vertices of P ⫽ AT is at most the number of vertices of T, we see that P has exactly N vertices. Consider now k ⬎ 1, and a collection of k disjoint indices i1, . . . , ik, 1 ⱕ iᐉ ⱕ n. By hypothesis, for every x0 of the form

冘 k

x0 ⫽

␣ ᐉ e i ᐉ,

ᐉ⫽1

with ␣ᐉ ⱖ 0 and ¥ᐉ␣ᐉ ⫽ 1, the corresponding problem (LP) based on y ⫽ Ax0 has a unique solution, equal to x0. Since this latter problem has a unique solution, there is (by Lemma 2.1) a unique solution to the problem of representing each such y as a subconvex combination of columns of A, and that solution is provided by the corresponding x0. All of the x0 under consideration populate a face F of T n⫺1, determined by i1, . . . , ik. By the converse part of Lemma 2.2, AF is a face in Fk⫺1(AT). Combining the last two paragraphs with the converse part of Lemma 2.3, we conclude that P has N vertices and is outwardly k-neighborly. 3. Corollaries We first mention a standard fact about convex polytopes (ref. 3 and see chapter 7 in ref. 1). Theorem 3.1. For every n ⬎ d ⬎ 1 there are d兾2-neighborly polytopes in Rd with n vertices Examples are provided by the cyclic polytopes, which come in two standard families: Y

Moment curve cyclic polytopes: Let 0 ⱕ t1 ⬍ 䡠 䡠 䡠 ⬍ tn ⬍ ⬁, and let the jth column of the d ⫻ n matrix A be given by aj ⫽ M共tj兲,

j ⫽ 1, . . . , n,

where M: R⫹ 哫 Rd is the so-called moment curve M共t兲 ⫽ 共t, t2, . . . , td兲T. n is The polytope obtained from the convex hull of the (aj)j⫽1 d兾2 neighborly; see Gale (4). Note that A is a kind of nonsquare Vandermonde matrix. Y

Trigonometric cyclic polytopes: Let 0 ⬍ t1⬍ 䡠 䡠 䡠 ⬍ tn ⬍ 2, and, for d ⫽ 2m, let the jth column of the d ⫻ n matrix A be given by aj ⫽ F(tj), where F : [0, 2) 哫 Rd is the trigonometric moment curve

Donoho and Tanner

F共t兲 ⫽ 共cos共t兲, sin共t兲, cos共2t兲, sin共2t兲, . . . , cos共共d兾2兲t兲, sin共共d兾2兲t兲兲 T. n is The polytope obtained from the convex hull of the (aj)j⫽1 d兾2-neighborly, again see ref. 4. Note that A is a kind of nonsquare Fourier matrix.

Existing proofs of neighborliness of moment curve polytopes (1, 4), after a simple adaptation, give Corollary 1.1. Given a sequence (tj) with t1 ⫽ 0 the polytope conv{M(tj)} is d兾2neighborly; since M(0) ⫽ 0, it follows that, for any sequence of n (tj), P ⫽ conv({0} 艛 {M(tj)}j⫽1 ) is d兾2 neighborly. Hence P ⫽ n conv({0} 艛 {M(tj)}j⫽1 ) is outwardly neighborly. Hence, defining the matrix A ⫽ [M(t1), . . . , M(tn)], we get (LP)-(NP)-equivalence up to breakdown point d兾2 ⫹ 1. Corollary 1.1 follows. Corollary 1.3 also follows from the outward-neighborliness of P ⫽ conv({0} 艛 {M(tj)}j). Let yk ⫽ ˜ k(0). Represent (0) by a vector x0 with n entries, the jth one representing (0){j}. Define tj ⫽ exp(j), j ⫽ 1, . . . , n, and note that y ⫽ Ax0, where A is the partial Vandermonde matrix associated with the moment curves above. Since the polytope associated to A is d兾2-outwardly neighborly, if the measure (0) is supported in no more than d兾2 points, it is uniquely recovered from data y by solving (LP). To obtain Corollary 1.2, we first adapt the proof of the neighborliness of trigonometriccyclic polytopes to find that every polytope conv({0} 艛 {F(tj)}) is outwardly d兾2-neighborly. Details are given in the appendix of ref. 15. Applying this, we can obtain Corollary 1.2. Break the m observed complex data into real parts and imaginary parts, giving a vector y of length d ⫽ 2m. Since (0) is a nonnegative measure supported at 0 ⬍ t1 ⬍ 䡠 䡠 䡠 ⬍ tn ⬍ 2, represent it as a vector x0 with j-entry (0){tj}. The data y are related to the vector x0 through y ⫽ Ax0, where A is the above partial Fourier matrix. The corresponding polytope is outwardly neighborly. Hence, if the nonnegative vector x0 has no more than m ⫽ d兾2 nonzeros, it will be uniquely reconstructed (despite n ⬎ d) from the data y by (LP). (As stated earlier, Corollary 1.2 also follows from theorem 3 in ref. 8; in fact, the underlying calculation in the proof of theorem 3 in ref. 8 can be seen to be the same as the ‘‘usual’’ one in proving neighborliness of trigonometric cyclic polytopes, although at the time of ref. 8 this connection was not known.) After this article was originally submitted, we learned about work by Jean-Jacques Fuchs (9) also implying Corollaries 1.2–1.3. A wide range of neighborly polytopes is known. A standard technique (already used in the two examples above) is to take n points on a curve C⬊R 哫 Rd (7, 16). The curve must be a so-called curve of order d, meaning that each hyperplane of Rd intersects the curve in at most d points. This construction is, of course, intimately connected with the theory of moment spaces and with unicity of measures having specified moments (17). Constructions based on oriented matroids and totally positive matrices have also been made by Sturmfels (18, 19). In the context of this article, we note that if such a curve passes through the origin, then, of course, conv({0}艛{C(tj)}) is neighborly, and so outwardly neighborly as well. However, as the trigonometric moment curve shows, outward neighborliness is possible even when such a curve does not pass through the origin. Sturmfels (18, 19) has shown that (for even d) in some sense curves of order d offer the only example of neighborly polytopes (up to isomorphism). In short, it is known that polytopes offering full d兾2-neighborliness are special. What is the generic situation? Gale (3) proposed that in some sense most polytopes are neighborly. Goodman and Pollack proposed a natural model of random polytope in dimension d with n vertices (see ref. 13). They suggested taking the standard simplex T n⫺1 and apply a uniformly distributed random projection, getting the random polytope AT n⫺1. Vershik and Sporyshev PNAS 兩 July 5, 2005 兩 vol. 102 兩 no. 27 兩 9449

APPLIED MATHEMATICS

is a k ⫺ 1-dimensional face of P. Now, by Lemma 2.2, y has a unique representation by the vertices of P, which is a representation by the vertices of AF only, and which is unique. But x0 already provides such a representation. It follows that x0 is the unique representation for y obeying

(14) considered this question in the case where d and n increase to ⬁ together in a proportional way. In ref. 12 we revisit the Vershik-Sporyshev model, asking about neighborliness of the resulting high-dimensional random polytopes. It proves the following. Theorem 3.2. Let 0 ⬍ ␦ ⬍ 1, let n tend to infinity along with d ⫽ dn ⫽ ␦n, and let A ⫽ Ad,n be a random d ⫻ n orthogonal projection. There is N(␦) ⬎ 0 so that, for ⬍ N(␦), with overwhelming probability for large n, AT n⫺1 is d-neighborly. Thus, typical Goodman-Pollack polytopes have neighborliness proportional to dimension. (This result permits, but does not imply, that polytopes are not fully neighborly; i.e., the fact that N ⬍ 0.5 allows the possibility that k-neighborliness may not hold up to the upper limit k ⫽ d兾2. The lack of full neighborliness for ␦ ⬍ 0.42 can be inferred from the lack of d兾2-weak neighborliness described below.) The Goodman-Pollack model is broader than it first appears. By a result of Baryshnikov and Vitale (20), P is affinely equivalent to the convex hull of a Gaussian random sample. We can conclude the following. Corollary 3.1. Let A ⫽ Ad,n denote a random d ⫻ n matrix with

columns aj, j ⫽ 1, . . . , n drawn independently from a multivariate normal distribution on Rd with nonsingular covariance. Suppose d and n are proportionally related by dn ⫽ ␦n. Let ⬍ N(␦). Then, n is dwith overwhelming probability for large n, conv{aj}j⫽1 neighborly. Ref. 12 implies that the preceding two results hold just as written also for P ⫽ AT n0 , and conv({0}艛{aj}), respectively. Corollary 1.4 follows. 4. Weak Neighborliness and Probabilistic Equivalence 4.1. Individual Equivalence and General Position. We say there is individual equivalence (between NP and LP) at a specific x0 when, for that x0, the result y ⫽ Ax0 generates instances of (NP) and (LP) that both have x0 as the unique solution. In such a case we say that x0 is a point of individual equivalence.

For general A the task of describing such points may be very complicated; we adopt a simplifying assumption. Recall the n definition of A: Let a0 ⫽ 0 and, if 0 ⰻ conv{aj}j⫽1 , let A ⫽ {aj}n0 . Otherwise let A ⫽ {aj}n1 . We say that A is in general position in Rd if no k-plane of Rd contains more than k ⫹ 1 ajs (i.e., viewing the aj as points of Rd). Under this assumption, the face structure of P is very easy to describe. A remark in ref. 20 (compare ref. 6) proves the following. Lemma 4.1. Suppose that A is in general position. Then for k ⱕ d ⫺ 1, the k-dimensional faces of P ⫽ conv(A) are all simplicial. Recalling Lemma 2.2, it follows that, when A is in general position, whenever y belongs to a k-dimensional face of P with k ⱕ d ⫺ 1, there is a corresponding unique solution of (LP). This remains true for every y in that same face of P, and the unique solution involves a convex combination of the vertices of that same face. The vertices are identified with members of A. Those members are identified either with the origin or certain canonical unit basis vectors of Rn. Hence, the collection of such convex combinations of vertices is in one-to-one correspondence with points in a specific k-face of T. Moreover, by the uniqueness in Lemma 2.2, a k-face of T can arise in this way in association with only one k-face of P. Hence for k ⱕ d ⫺ 1, we have a bijection between k-faces of P, and a subset Sk of the k-faces of T. We think of Sk as the subset of k-faces of T destined to survive as faces under the projection T 哫 AT onto Rd. The k-faces of T are in bijection with the supports of the vectors belonging to those faces. Since two vectors x0 and x1 with unit sum and common support belong to the same face of T, and since each face as a whole survives or does not survive projection, we conclude the following. Lemma 4.2. Suppose that A is in general position and that x0 has at

most d ⫺ 1 nonzeros. The property of individual equivalence depends only on the support of x0; if x0 and x1 have nonzeros in the same positions, then they are either both points of individual equivalence or neither points of individual equivalence. There are, of course, (kn) supports of size k. This gives us a natural way to measure ‘‘typicality’’ of individual equivalence.

Fig. 1. Empirical verification of the NP兾LP equivalence phase transition as a function of ␦ with dn ⫽ ␦n and sparsity k ⫽ dn in the case of n ⫽ 200; weak neighborliness transition VS(red). The fraction of successes in (LP) recovering (NP) is in grayscale, and the calculated weak neighborliness transition curve VS(␦) is overlaid in red. Note that weak neighborliness exceeds d兾2 for ␦ ⬎ 0.425; see subsection 4.3. 9450 兩 www.pnas.org兾cgi兾doi兾10.1073兾pnas.0502269102

Donoho and Tanner

the sparsest nonnegative representation of y and also the unique nonnegative representation of y. This is a quite simple and, it seems, surprising outcome from mere face counting. 4.3. Interpreting Table 1. Our work in ref. 12 derives numerical

information about the Vershik-Sporyshev phase transition VS(␦) ⬎ 0, i.e., the transition so that for ⬍ VS(␦), the d-dimensional face numbers of AT n⫺1 are the same as those of T to within a factor (1 ⫹ oP (1)), whereas for ⬎ VS(␦) they differ by more than a factor (1 ⫹ oP(1)). We show that the same conclusion holds for the zerofree face numbers of AT n0 . Obviously N(␦) ⱕ VS(␦). Fixing some small ⬎ 0, we have with overwhelming probability for large d that P ⫽ AT is 共 ˜ N䡠d兲-outwardly neighborly, and

4.2. Individual Equivalence and Face Numbers. We are now in a

position to prove Theorem 2 by using the above lemmas. For a polytope Q possibly containing 0 as a vertex, ˜fk(Q) denote the number of zerofree k-faces, i.e., the number of faces of Q not having 0 as a vertex. Restating Theorem 2 in the terminology of this section we have the following. Theorem 4. Let A be in general position. The following statements are equivalent for k ⬍ d. Y

The zerofree face numbers of AT and T agree within a factor 1 ⫺ : 共1 ⫺ 兲f˜k⫺1共T兲 ⱕ ˜fk⫺1共AT兲 ⱕ ˜fk⫺1共T兲.

Y

共 ˜ VS䡠d , 兲-weakly outwardly neighborly; here ˜ N ⬅ N( ␦ ) ⫺ , and ˜ VS ⬅ VS( ␦ ) ⫺ obey 0 ⬍ ˜N ⬇ N共␦兲 ⬍ ˜VS ⬇ VS共␦兲. Some numerical information is provided in Table 1. Two key points emerge: Y

Y

A fraction ⱖ(1 ⫺ ) of all vectors with k nonzeros are points of individual equivalence.

Proof: A given support of size k corresponds uniquely to a k ⫺ 1 face F of T n⫺1. Individual equivalence at the given support occurs if and only if AF is a face of P. By Eq. 2.2, the zerofree faces of P are a subset of the images AF where F is a face of Tn⫺1. Hence the identity

#共supports giving equivalence兲 ˜f k⫺1共AT兲 ⫽ . ˜f k⫺1共T兲 #共supports of size k兲 Of course, counting faces of polytopes is an old story. This result points to a perhaps surprising probabilistic interpretation. Suppose the points in A are in general position. We randomly choose a nonnegative vector x with k ⬍ d nonzeros in such a way that all arrangements of the nonzeros are equally likely; the distribution of the amplitudes of the nonzeros can be arbitrary. We then generate y ⫽ Ax. If the quotient polytope P has 99% as many (k ⫺ 1)-faces as T, then there is a 99% chance that x is both 1. Gru ¨nbaum, B. (2003) Convex Polytopes, Graduate Texts in Mathematics (Springer, New York), 2nd Ed., Vol. 221. 2. Ziegler, G. M. (1995) Lectures on Polytopes, Graduate Texts in Mathematics (Springer, New York), Vol. 152. 3. Gale, D. (1956) in Linear Inequalities and Related Systems, Annals of Mathematics Studies (Princeton Univ. Press, Princeton), No. 38, pp. 255–263. 4. Gale, D. (1963) Proc. Sympos. Pure Math. VII, 225–232. 5. McMullen, P. & Shephard, G. C. (1971) Convex Polytopes and the Upper Bound Conjecture (Cambridge Univ. Press, Cambridge, U.K.). 6. Donoho, D. L. (2005) Neighborly Polytopes and the Sparse Solution of Underdetermined Systems of Linear Equations (Department of Statistics, Stanford Univ., Stanford, CA), Technical Report 2005-04. 7. Motzkin, T. S. (1957) Bull. Am. Math. Soc. 63, 35. 8. Donoho, D. L., Johnstone, I. M., Hoch, J. C. & Stern, A. S. (1992) J. R. Stat. Soc. Ser. B 54, 41–81. 9. Fuchs, J.-J. (2005) in Proceedings of the ICASSP, March 2005, Philadelphia, PA, ed. Bystrom, M. (IEEE Signal Processing Society, Piscataway, NJ), pp. 729–732.

Donoho and Tanner

N, the smaller, is still fairly large, perhaps surprisingly so. While it tends to zero as ␦ 3 0, it does so only at a rate O(1兾log(1兾␦)); and for moderate ␦ it is on the other of 0.1. VS is substantially larger than N. The fact that it ‘‘crosses the line’’ ⫽ 1兾2 for ␦ near 0.425 is noteworthy; this means that whereas a polytope can only be d兾2 neighborly, it can be ⬎d兾2 weakly neighborly! In fact, we know VS(␦) 3 1 as ␦ 3 1 (12, 14). For ⬎ 0 and ␦ sufficiently close to 1, for sufficiently large d, typical weak neighborliness can exceed d(1 ⫺ )! This is an important difference between neighborliness and weak neighborliness and is the source of Corollary 1.5.

For a discussion of further implications of these results and relationships to other work, see Supporting Text and also ref. 15. D.L.D. thanks the Mathematical Sciences Research Institute (Berkeley, CA) for its neighborly hospitality in the winter of 2005, while this article was prepared, and the Clay Mathematics Institute for a Senior Scholar appointment. J.T. thanks the Oxford University Computing Laboratory (Oxford) for generous accommodations while portions of this article were prepared. D.L.D. had partial support from National Science Foundation Grants DMS 00-77261 and 01-40698 (Focused Research Group), the National Institutes of Health, and the Office of Naval Research–Multidisciplinary University Research Initiative. J.T. was supported by National Science Foundation Fellowship DMS 04-03041. 10. 11. 12. 13. 14. 15.

16. 17. 18. 19. 20.

Carathe´odory, C. (1907) Math. Ann. 64, 95–115. Carathe´odory, C. (1911) Rend. Circ. Mat. Palermo 32, 193–217. Donoho, D. L. & Tanner, J. (2005) Proc. Natl. Acad. Sci. USA 102, 9452–9457. Affentranger, F. & Schneider, R. (1992) Discrete Comput. Geom. 7, 219 –226. Vershik, A. M. & Sporyshev, P. V. (1992) Selecta Math. Soviet. 11, 181–201. Donoho, D. L. & Tanner, J. (2005) Sparse Nonnegative Solution of Underdetermined Linear Equations by Linear Programming (Department of Statistics, Stanford Univ., Stanford, CA), Technical Report 2005-06. Derry, D. (1956) Canad. J. Math. 8, 383–388. Karlin, S. & Shapley, L. S. (1953) Geometry of Moment Spaces, Memoirs of AMS (Am. Math. Soc., Providence, RI), Vol. 21. Sturmfels, B. (1988) Linear Algebra Appl. 107, 275–281. Sturmfels, B. (1988) Eur. J. Combin. 9, 537–546. Baryshnikov, Y. M. & Vitale, R. A. (1994) Discrete Comput. Geom. 11, 141–147.

PNAS 兩 July 5, 2005 兩 vol. 102 兩 no. 27 兩 9451

APPLIED MATHEMATICS

Definition: Given a d ⫻ n matrix A, we say that a fraction ⱖ(1 ⫺ ) of all vectors x with k nonzeros are points of individual equivalence if individual equivalence holds for a fraction ⱖ(1 ⫺ ) of all supports of size k. A practical computer experiment can be conducted to approximate for a given A and k. One randomly generates a sparse vector x0 with randomly chosen support and arbitrary positive values on the support. One forms y ⫽ Ax0 and solves (LP). Then one checks whether the solution of (LP) is again x0. (A, k) can be estimated by the fraction of computer experiments where failure occurs. Experiments of this kind reveal that for A a typical random d ⫻ 2d orthoprojector, individual equivalence is typical for k ⬍ 0.558d. See Fig. 1, which shows that the experimental outcomes track well the prediction VS.