josephzizys + proofs   13

The Meaning of Omega
And the meaning of the recently-broken metric TSP approximation ratio

Andrew Stothers wrote in 2007 a first-year qualifying report for the University of Edinburgh PhD programme. In only eleven pages he presented the parts of the famous paper by Don Coppersmith and Shmuel Winograd (CW) that do enough to obtain for the exponent of matrix multiplication. In a few more pages he surveyed recent work on sufficient conditions for . Page 15, the last except for references, laid out “Possible Directions for Future Work” in five sentences, the fifth being:

Finally, it will also be possible to investigate bilinear algorithms further and use more complex means to determine whether or not “better” algorithms exist for matrix multiplication.

Today we wish to explain what means, why lower is always “better,” why we feel our field is better for the great efforts expended by Stothers and Virginia Vassilevska Williams, and why it is better when intellectual values guide our judgment. We also wish to defend ourselves and others for calling the results “breakthroughs,” comparing and contrasting to recent advances on algorithms for the Traveling Salesman Problem.

I (Ken) wrote a similar first-year qualifier in summer 1982 after my first year at Oxford—they are a staple of leading British universities. Like Andrew’s it was not obligated to have original research, just to demonstrate understanding and capability. One item found its way into a conference paper, but my 1986 dissertation ranged into various other topics.

What distinguishes Andrew’s Nov. 2010 dissertation is its singular focus on accomplishing the objective of his proposal. And doing it. The resulting paper with his advisor Sandy Davie has recently been submitted. Meanwhile in North America, Virginia had been thinking about the same problem since 2005. It may not be true as asserted here that “hundreds of people tried to improve” it after 1987, but we’ll guess about as many people thought about it. It’s enough to note that many researchers felt it worthy of pursuit, including Andrew’s advisor, and that progress took large concerted effort. The nature of this effort—including the use of computers—may be an important harbinger for science by itself.

What Does ω Mean?

The definition of is the infimum of all such that matrix multiplication has an algorithm that runs in time . This is in a classical model of computation where addition and multiplication of scalars have unit cost. Note that there might not be an algorithm that runs in time itself— may properly be a limit even if it turns out to have the least possible value 2. Currently all we know is that from Virginia’s work.

As many have noted, does not have immediate practical relevance because there is evidently a tradeoff with the constant hiding under the in . The tradeoff is so far steep enough that apparently only Volker Strassen’s relatively simple algorithm achieving an exponent of has wide use. So why do we care about values of less than that?

One of the central questions in physics is, why do the fundamental constants of nature have the values they do? The desire for why, in the face of evidence that the values could be arbitrary, has aroused such passion that string theorist David Gross channeled Winston Churchill in exhorting young physicists to “never, never, never give up.”

Now in mathematics it may seem nonsensical to ask, why do numbers like and have the values they do? But with , we are in a sense closer to physics: understanding it is akin to discovering a natural law, at least a law of information. And quantum mechanics has if anything magnified William Hamilton’s vision approaching 200 years ago that Nature computes with matrices, yes large ones. It is possible that detracts from a better statement of important laws, but knowing better brings us closer to them all the same.

Barriers and Breakthroughs

Hence also the interest in whether takes a value with known meaning, such as or (both falsified) or , or as some have mused, or , or as some have argued more likely for exponents, a logarithm of some higher and simpler number. Strassen himself has often stated his belief that is strictly greater than . If it has a value not previously seen in mathematics, then we could hope to discover new mathematical regularities as well; if it has a known value, then this may yield some more explanation about algorithms.

The value seemed a natural possible barrier, but the time interval from to in 1981–82 was very short. By contrast the “CW” bound of was not a natural-seeming number, but it withstood attempts to scale it for 23 years, including a year’s work from each of Virginia and Andrew. Reasons of intellectual judgment, expanded below, we feel are enough to justify calling their final results a “breakthrough.” But we offer a recent concrete case for comparison first.

The Salesman Always Rings 1.5

The Traveling Salesman Problem (TSP) seeks the shortest route to visit each of a given set of sites, coming back in a ring to the starting site. When the sites are in real space and “shortest” refers to Euclidean distance, finding the length of the absolute shortest route remains -hard even in the plane, as discussed here. However, Nicos Christofides in 1976 found a polynomial-time algorithm that finds a tour guaranteed to have total length at most . The method and proof apply to various other cases where the distances between sites obey inequalities that define some kind of metric, with the same bound.

This stood as a barrier for most of these cases, until this year. Shayan Oveis Gharan, Amit Saberi, and Mohit Singh (OSS) broke it for a wide subclass of these problems. Well they “broke” it by achieving a guaranteed ratio to the optimum of

Actually their original paper did not give a value—it merely proved the existence of an giving . The estimate for the magnitude of their breakthrough was supplied later.

Improvements and Predictions

It must be said that the OSS paper and its predecessor introduced techniques that were more novel to the problem than is evident for the new matrix-multiplication papers, and the predecessor won the SODA 2010 “Best Paper” award. However, a referee could have wondered, why bother moving heaven and earth in a restricted case to demonstrate an increment a thousand times smaller than “nano”? Indeed the version of OSS linked above still ends with “” on page 65.

What happened is that the change attracted interest on several continents, and the people in Europe who made a bankable improvement are not even in the euro zone. First Tobias Mömke and Ola Svensson of KTH in Stockholm obtained by using matchings in place of the more-complicated ingredients of OSS, also in time for FOCS 2011. Then Marcin Mucha of the University of Warsaw improved their analysis to obtain

It is noted by all that an even older paper by Michael Held and Richard Karp from 1970 had set a believable target for provable improvements of Christofides’ bound, namely a conjectured ratio of from linear programming relaxations. However, also has dueling conjectures, some supporting a believable target of . The real point of comparison with TSP is that suddenly there was progress on a barrier that had stood for much of the age of the field, and this attracted others to try for more.

We note that Markus Bläser has contended in a comment that the extension of CW used by both new papers has limitations, and we infer that some other experts concur. However, the paper by Vassilevska Williams in particular has two new ingredients: a framework for managing any tensor power as a base algorithm, and a computer implementation of the framework. She also employed an insight of Stothers to simplify the analysis. Though we can imagine a geometrical insight that higher powers bring returns that “shrink geometrically,” can we really constrain the likelihood that pursuing them might dislodge a different insight that changes the whole game? Is there a limitation theorem here? All we know is that the game has changed, and perhaps the game is afoot.

Worth and Values

The last issue we wish to raise is the reliability of judging the worth of past achievement by present assessment of what it may or may not lead to in the future. We have posted several times on surprises and the difficulty of predicting or guessing how things will go. Of course impetus into the future is necessarily part of claiming something now a “breakthrough.”

However, if we seek a reliable and consistent standard of judging worth, we should use a value that the community has understood for many years: intellectual substance and effort. This value, plus the simple salience of the goal, led our principals to invest the years of work in the first place. Depth of thinking is our gold standard, while applicability is our paper currency. The improvements of and on paper for amid other issues may be a poor return now, but the new computer-assisted vein to mine may parlay true value later.

Dick and I hope this explains why fundamental effort and easily-stated achievement, on a fundamental problem after nearly a quarter-century of stasis, elicited the reaction it has. We agree with, and have tried to extend, Timothy Gowers’ comments here. As for how these values can be invested, only time will be the teller.

Open Problems

Can we be more open about the value of pursuing problems?

What is ?

Suppose we know something special about two matrices and to be multiplied, something not so obvious like their being sparse. Suppose in particular that we have a tiny circuit that for any outputs the value , and similarly for , where those values come from a fixed finite set. Or suppose we know that and/or preserve a similar succinctness property from argument vectors to values—which[…]
All_Posts  Ideas  News  Open_Problems  People  Proofs  Results  Algorithms  approximation  matrix  multiplication  omega  TSP  from google
december 2011 by josephzizys
A Breakthrough On Matrix Product
Beating Coppersmith-Winograd

Virginia Vassilevska Williams is a theoretical computer scientist who has worked on a number of problems, including a very neat question about cheating in the setup of tournaments, and a bunch of novel papers exemplified by this one on graph and matrix problems with runtime exponents of that have long been begging to be improved.

Today Ken and I want to discuss her latest breakthrough in improving our understanding of the matrix multiplication problem.

Of course Volker Strassen first showed in his famous paper in 1969 that the obvious cubic algorithm was suboptimal. Ever since then progress has been measured with one parameter : if your algorithm runs in time , then you are known by this one number. Strassen got , and the race was off. A long series of improvements started to happen, which for a while seemed to be stuck above . Then, Don Coppersmith and Shmuel Winograd (CW) got and everything changed. After a contribution by Strassen himself, CW finally obtained in 1987, with full details here. This has been the best known for decades.

This has all changed now. Virginia has proved that she can beat the “barrier” of CW and get a new lower value for . Currently her paper gives , an improvement of “only” , but there is promise of more. This is also another case of proofs coming in twos, as a theorem in PhD thesis work by Andrew Stothers was circulated to some in June 2010 but not very widely. All this is extremely exciting, and is one of the best results proved in years in all of theory. While these algorithms are unlikely to be usable in practice, they help shed light on one of the basic questions of complexity theory: how fast can we multiply matrices? What could be more fundamental than that?

The Basic Idea

Matrix multiplication is bi-linear: the formula for the entry of is . The first step in simplifying the problem is to make it more complicated: Let us have indicator variables and compute instead the tri-linear form

This is a special case of a general tri-linear form

where and we have re-mapped the indices. It looks like we have made order-of work for ourselves. The key, however, is to try to fit a representation of the form:

where . The point is, suppose we can compute these products in total time . Then we can compute (the coefficients for) all the desired entries

in steps. Thus what we have are separate handles on the time for the products and the time for the . The way to manage and balance these times involves recursion.

The Basis Idea

The recursion idea is nice to picture for matrices, though its implementation for the way we have unrolled matrices into vectors is not so nice. Picture and as each being matrices. We can regard instead as a matrix of four matrices , and do the same for . Then the product can be written via products , and we can picture ourselves recursing on these products.

The reason why the vector case does not look so nice is that the tri-linear form is so general—indeed we cannot expect to fit a general tri-linear form into a small number of products . What CW did, building on work by Arnold Schönhage, is relax the tri-linear form by introducing more than -many variables, supplying appropriate coefficients to set up the recursion, and most of all framing a strategy for setting variables to zero so that three goals are met: the recursion is furthered, the values of “” at each level stay relatively small, and the matrix product can be extracted from the variables left over. This involved a hashing scheme which used subsets of integers that are free of arithmetical progressions.

The final step by CW was to choose a starting algorithm for the basis case of the recursion. They devised one and got . Then they noticed that if they bumped up the base case by manually expanding their algorithm to an handling the next-higher case, they got a better analysis and their famous result . By their way of thinking, bumping the basis up once more to was the way to do better, but they left analyzing this as a problem. Others attempted the analysis and…found it gave worse not better results. So , actually , stood.

The insight for breaking through was to make a bigger jump in the basis. Vassilevska Williams was actually anticipated in this without her knowledge by Andrew Stothers, in his 2010 PhD thesis at the University of Edinburgh. Stothers used and showed this method capable of achieving , though there has been some doubt in whether all details were worked out. Vassilevska Williams, however, used and brought some powerful computing software to bear on a more-extensive framework for the plan. It is not clear whether there is anything necessary about jumping by a power of —in any event her program and framework work for any exponent.

The Proof

We cannot yet really give a good summary of the proof—further details are in her paper. One quick observation about her work is in order. She used the CW method, but extended it into a general schema can be used to find good matrix product algorithms, perhaps even better than the one in the paper. The algorithms themselves can be generated and examined, but as usual the task of analyzing them is very hard. The brilliant insight that she has is this task can be laid out automatically by a certain computer program. This allows her to do the analysis where previous others failed.

For example here is a sample of the overview of her main program, in pseudo-code:

The details are not as important as the fact that this program allows one to work on much larger schemas than anyone could previously.

What Does the Bound Mean?

Note that she has improved the bound of Stothers by . For what threshold value of does an additive improvement of in the exponent halve the running time? The answer is , which in this case is . This value is far above the Bekenstein Bound for the number of particles that could be fit into a volume the size of the observable universe without collapsing it into a black hole. In this sense the algorithm itself is even beyond galactic.

The meaning instead comes from this question: Is there a fundamental reason why could settle at a value strictly greater than ? Note that is not taken to mean the existence of a quadratic-time algorithm, but rather that for all there are algorithms that achieve time . There was some reason to think could be a natural barrier, but it was breached. Perhaps , since this is connected to the golden ratio? Her paper notes a recent draft by Noga Alon, Amir Shpilka, and Christopher Umans that speaks somewhat against the optimism shared by many that .

Open Problems

Can the current bounds be improved by more computer computations? Are we about to see the solution to this classic question? Or will it be struggling over increments of ?

In any event congratulate Virginia—and Andrew—for their brilliant work.

Update: Markus Bläser, who externally reviewed Stothers’ thesis, has contributed an important comment on the blog of Scott Aaronson here. It evaluates the significance of the work in-context, and also removes the doubt going back to 2010 that was expressed here.
History  News  Open_Problems  Proofs  breakthrough  galactic  matrix_exponent  matrix_product  Strassen  from google
november 2011 by josephzizys
A Brilliant Idea In Publishing
Big Results Can Come In Small Packages

Susan Friedlander is currently the Editor in Chief of the Bulletin of the American Mathematical Society, and also the Director of the Center for Applied Mathematical Sciences. She works on partial differential equations such as the Navier-Stokes equations. These equations are famously hard to understand, partially because of their connection to turbulence. Of course another measure of their difficulty is that they are the core of one of the remaining Clay Millennium Prize Problems.

Today I would like to talk about one of Friedlander’s brilliant ideas that has nothing to do with either the Navier-Stokes equations or turbulence.

In the latest issue of the Bulletin, October 2011, Friedlander devoted the entire issue to an interesting experiment. She had her editors select six articles from past generations that were seminal and helped stimulate research for decades, and then republished them. The were printed exactly as they appeared before—even with one or two simple typos that were there the first time. On page 215 there is “ otherwise”: which seems like a typo to me.

Five dated from the period 1977-1984, one was from earlier. Each editor wrote a brief introduction to their selected paper, but the stars were the original papers:

Poincaré Recurrence And Number Theory, by Harry Furstenberg.
Internal Set Theory: A New Approach To Nonstandard Analaysis, by Edward Nelson.
Invariants of Finite Groups And Their Applications To Combinatorics, by Richard Stanley.
On The Parallelizability Of The Spheres by Raoul Bott and John Milnor.
An Elementary Introduction To The Langlands Program, by Stephen Gelbart.
Lectures On Morse Theory, Old And New, by Raoul Bott.

I think this idea of republishing important papers is terrific, and I personally greatly enjoyed the issue, and urge you to take a look at it. It was wonderful to see these famous papers—we should try this experiment in theory of computing. Congratulations to Friedlander.

Poincaré

The first paper is beautiful discussion of the modern consequences of one of the great results of Henri Poincaré, who made fundamental contributions to almost all aspects of math and physics, and was called a polymath. What I found so interesting about the paper is that Furstenberg shows how Poincaré’s theorem helped launch an entire area of research, yet it has a tiny and simple proof.

This made me think about the size of proofs and the impacts of proofs. There is, at best, a weak connection between the length of a proof and its impact and importance in mathematics. There are long proofs that have had small impact on math; there are long proofs that have had a large impact; there are short proofs that have had a small impact—no surprise with these statements. What may be surprising is that there are proofs of statements that have had huge impact, yet the proofs can be small, even tiny.

Long Proofs With Huge Impact

Finding examples of proofs that are large, even huge, that have had large impact on mathematics is pretty easy. A moment’s thought yields many examples—you may have your own, and your own may be even better than mine.

Fermat’s Last Theorem: Andrew Wiles proved the famous theorem that there is no solution to where and are non-zero integers. The proof is clearly long, and has had major impact on the understanding of number theory.

Simple Group Classification: This is the work of many mathematicians, and classifies all finite simple groups into certain infinite families with twenty-six special exceptions. The classification proof is beyond large; it is estimated to be tens of thousands of pages long. The theorem is used in many places in mathematics to solve problems invoking finite groups and even beyond.

Gödel’s Relative Consistency Of The Axiom Of Choice: Kurt Gödel proved that the Axiom of Choice (AC) can be added to set theory without causing it to become inconsistent. Set theory, as in ZF, is already inconsistent or remains so after the addition of AC. This was immensely important since AC while very useful is also quite powerful and leads to many surprising results. For instance, AC shows:

There are sets that are not Lebesgue measurable.
There is a way to cut up a solid sphere into five pieces and reassembly them into a sphere that is twice the volume. This is called the Banach-Tarski paradox after the great mathematicians Stefan Banach and Alfred Tarski.

Short Proofs With Huge Impact

Euclid’s Theorem: Euclid proved almost 2300 years ago that there are an infinite number of primes. Recall the proof is to assume that there are only a finite number of primes: let be all the primes. Then look at

The number and so is divisible by some prime, let it be denoted by . Then by assumption for some . But this implies that divides , which is a contradiction.

Halting Is Undecidable: Alan Turing of course proved that there is no Turing Machine that can tell if an arbitrary Turing Machine will halt on a given input. The proof of great importance is a clever diagonalization argument. Yet it can almost be stated as an afterthought.

Recurrence Theorem: This is due to Poincaré, and I will explain it and the proof in the next section.

Recurrence Theorem

Let’s state the theorem precisely and then give the almost shockingly short proof. Let be an abstract space, and let be a measure on it. We put and assume that the class of measurable sets, i.e. for which is defined, is sufficiently extensive. Also let map to so that the measure is preserved,

for all measurable sets. This is all we need to state the theorem:

Theorem: For any measurable set with , there exists a point so that for some .

I quote Furstenberg:

The proof is extremely simple. Assume that no point returned to . Then for all , and so whenever . But the sets all have the same measure , and they cannot be disjoint since .

The importance of this theorem is many-fold. It helps shed light on physical systems, which is why Poincare studied it in the first place. It also has helped yield insights into combinatorics. For example, generalizations of it are shown to prove the famous theorem of Bartel Leendert van der Waerden:

Theorem: Divide the integers into subsets, and let be a natural number. Then one of the subsets contains an arithmetic progresssion of length .

Open Problems

What are your favorite examples of short proofs with large impact? The fact that they exist may even suggest some ideas for automated discovery of proofs. It plausible that while getting computers to find proofs that are huge may be impossible, getting them to help find very short ones might be possible. What about an automatic search for short, tiny, proofs with huge impact?

Also do you like Friedlander’s republish idea?

[Fixed broken link]
History  People  Proofs  AMS  diagonalize  Fermat  recurrence  from google
november 2011 by josephzizys
Who’s Afraid Of Arrow’s Paradox?
How to decide, when you really have to decide.

Kenneth Arrow was a co-recipient of the 1972 Nobel Memorial Prize in Economics—note this is a “Nobel” prize, but is a bit different from the original ones. It still is immensely prestigious, but it is a more recent creation. Arrow is famous for his impossibility result, which shows that there is no “good” voting system.

Today I want to talk about what should we do if we have to vote?

I often find myself on committees whose sole purpose is to make a multiple-choice decision, meaning to select choices from a larger set of choices. This happens for award committees, for membership committees, for program committees, and for various other committees. The key is that the committee is charged with selecting more than one choice from a large number of choices.

I also often find myself on committees whose sole purpose is unclear—perhaps there is no purpose. While I was at Princeton an administrator called a meeting that Hector-Garcia Molina and I had to attend. We ran into the administrator, a few days before his meeting, and after saying hello Hector asked, “what is the agenda of the upcoming meeting?” The answer was “two hours.” Unfortunately this was a short clear answer that was honest and correct.

The Voting Dilemma

Back to committees that do make decisions. The famous Arrow Theorem, see below if you are not familiar with it, shows that there is no voting system that really works. All systems can lead to “paradoxical” outcomes. But decisions need to be made: awards have to be given, members selected, papers accepted, and so on. The puzzle is:

What should we do in practice?

My experience on committees is very mixed. Some seem to operate in a rational manner, while others use strange ad-hoc techniques to make their decisions. The usual approach is to rank the choices—let’s call them candidates—and then select the top ones and drop the bottom ones. This initial selection can often be done with fairly good agreement—of course the ones on the border of that can still be hard. Then the committee has the difficult job of selecting from the remaining candidates. They are usually very close, and it can be extremely hard to make the final choices.

This final step of deciding on the “middle” candidates is where the strange techniques appear. My experience is that the chair usually has a pet or in-house tie-breaking scheme. It may be a voting method based on points, for instance your first choice gets 1 point, the second 2 points, and so on, with the lowest ones being selected. Sometimes voting methods are created on the fly—a member of the committee will say, “why don’t we ”

I know that there is no “best” method, and all methods can be “gamed,” but I still wonder what we should do? Are there in some sense methods that are better than others?

Arrow’s Theorem

Here are three “fairness” criteria that would seem to be axiomatic in any voting system:

Unanimity: If every member prefers candidate over candidate , then the committee prefers over .
Non-dictatorship: No single member possesses the power to determine the commitee’s preference in all cases.
Focus: If every member’s preference between and remains unchanged, then the committee’s preference between and will also remain unchanged, regardless of any changes in preference between pairs or or with .

The first two seem especially basic. If there were a dictator, then perhaps we could just skip the meeting. Perhaps no, since the meeting still has an agenda, even if the agenda is just “two hours.” The third is more commonly called the “Independence of Irrelevant Alternatives” (IIA) axiom. It sounds reasonable because surely the group preference for versus should depend only on the merits of what each individual thinks between and , no? No:

Theorem: When there are at least two committee members and at least three candidates, there does not exist a system of voting that meets the above three axioms.

Formally, let be the set of permutations of candidates, interpreted as rankings, and let be the number of committee members. Then a voting system is a map from to . Let be the subset of in which is ranked higher than and its complement. For any , define

where if and if . Then Arrow’s Theorem states that if always maps into and does not depend on any single argument, then there must exist such that the range of restricted to is not wholly contained in either or .

There is a related theorem by Allan Gibbard and Mark Satterthwaite that extends to systems that work by members giving points rather than preference rankings. Even if the goal is to select just a single winner, the theorem says that every system that always selects when every committee member gives highest points, and that isn’t dictated by a single committee member, is susceptible to being “gamed” by a member seeking an outcome different from the ranking given by the member’s points.

The Aggravation of Aggregation

The root of all the difficulty is well known. To avoid tiebreak issues we regard not as the minimal number of committee members, and consider candidates . The problem is what to do if the members’ permutations are , , and . Then a majority prefer , a majority prefer , and a majority prefer . Who should win?

Related issues involve parliamentary systems that allocate seats to parties based on proportions of the total vote. If you follow the quota rule of giving each party either the integer floor or the integer ceiling of its share of the vote times the total number of seats, then non-monotone situations can arise where a few extra votes or an increase in can cause a party to lose a seat. These situations also affect the number of Congressmen given to states after each U.S. Census.

In committee situations, of course, one usually weeds out the lowest choices altogether. Then the problem becomes, how to re-evaluate the rankings or points given by the members who preferred them? This leads to the issue of runoff methods for elections with more than two candidates, which are also fraught with paradoxes.

For one example, in the women’s figure skating final group at the 2002 Winter Olympics in Salt Lake City, Sarah Hughes stormed from 4th place to win ahead of co-favorite Michelle Kwan, who was pushed to 3rd behind Irina Slutskaya. This article explains that had Slutskaya suffered one more fall, Kwan would have won the gold, despite the feeling based on IIA that how badly a third skater performed should not disturb the focus between Kwan and Hughes. (This had nothing to do with the ice-dancing judging scandal from the same Olympics, but did play into the creation of the new ISU Points System in 2004.)

A term more general than voting for the problem is aggregation of multiple preferences into one rank order. Every time one searches on Google, one sees the rank result of an aggregate score based on a “commmittee” of page-rating criteria. Google has evidently chosen an aggregation method that suits its own criteria, at least most of the time. That leads to our query: can we identify situations for committees in which there does exist a procedure that is (nearly) always satisfactory?

Voting Methods

There are a vast array of voting methods that committees can use. All can have paradoxes, but all have some properties that make them interesting. Many are used in real situations, but the fact that the list of methods is so long suggest that this is far from a solved problem. Here are some:

Highest averages
D’Hondt method
Sainte-Laguë method
Largest remainder
Hare quota
Droop quota
Wright system
Parallel voting

This problem has been studied through the centuries, and even Lewis Carroll contributed theory to it. What we ask is whether there has been a good mapping of methods to categories of applications where they are most appropriate. We note for instance this paper by Cynthia Dwork, Ravi Kumar, Moni Naor, and Ken’s former student D. Sivakumar while he was at Google. This proposes using Markov Chains to combine the initially-given , followed by a transformation they call “local Kemenization,” and gives an application for identifying spam.

Open Problems

How should we decide when we are on a committee and are faced with a real decision?
History  People  Proofs  Algorithms  paradox  Problems  voting  from google
november 2011 by josephzizys
Poker and Cantor’s Proof
The reals are still uncountable

Poker Source

Dave Patterson is a world-famous expert on computer architecture. He has won many awards, and is a member of the National Academy of Engineering and the National Academy of Sciences. He coined the term “RISC” and later the term “RAID.” He also wrote definitive textbooks on computer architecture, with John Hennessy, including the famous Computer Architecture: A Quantitative Approach..

Today I and Ken want to continue the last discussion on Georg Cantor’s Diagonal Proof. My goal is to try to answer one of the issues raised about this famous proof. I do thank all the commenters for an interesting discussion.

What does this have to do with Dave? He is not a theorist, and may know Cantor’s proof—but certainly is not excited about it. He is very practical-oriented: his most famous work is on the design of Reduced Instruction Set Computing processors.

But Dave is a poker player. When I arrived at Berkeley in the late 1970′s, I wanted to bring one tradition from Yale to Berkeley: playing a “friendly” game of cards on a regular basis. I assumed that my theory colleagues, Dick Karp and others, would be interested in playing. The game I had in mine was low stakes poker. The rules were “dealer’s choice” and modified “pot limit.” The latter means one could make a bet that was bounded by half the current pot. This method of betting was introduced to me by Albert Meyer, and it can lead to exponential increases in bets. This increase makes the game exciting—I felt—and allows for bluffing. Even through the stakes are low, the game can be quite exciting.

In any event I was sure that my theory friends at Berkeley would jump at a chance to put probability theory to a practical test. I was wrong. Every member of the theory group said: not interested, sorry. The sole possible exception was Karp, but poker requires players for much greater than to be fun. I was about to give up when I mentioned it one day to Patterson. Before I knew it all the computer architecture group—well most—had signed up, and a new course was created:

CS 52

When you got the message, “CS 52 this week at X’s house,” you knew the game was on. We played for the two years that I was at Berkeley, and had a great deal of fun. I believe that they still play poker there, although the game has changed to Texas Hold ‘Em and the betting is different. I believe that it is one of my great and lasting contributions to Berkeley Computer Science Department. Perhaps they will put up a plaque that says…

Show Your Hand

In almost any version of Poker—Texas Hold ‘Em, Stud, Draw, or any other variant—there comes a time when all the bets and raises and re-raises are done. At this point, if you are the person who last bet or raised, you must show your hand, that is, must lay your cards on the table. That’s part of the meaning of saying that you have been called. Then the other players can either throw their hands away or show their cards and prove that they have a better hand. The winner takes the money.

If you are called you must show your cards. You are not allowed, if called, to say I will show my cards later; or go through the deck and change you hand. The game has come to the showdown—you must show what you have. Not anything else. Any attempt to not do this is considered cheating—do not try this in a real game.

The same principle of showing your hand applies to mathematical proofs. For instance, if you are a Count and thus believe that the reals are countable, then you must be prepared to give a listing of the reals—essentially show your cards. This is where most Counts have trouble: they want to be able to change their “hands.” Tim Gowers said it best when he wrote that he usually asks a Count to present the actual listing. Essentially Tim is calling them, and demanding that they show their hand.

Note that this requirement is standard math, and has nothing at all to do with countable sets, uncountable sets, infinities, or the reals. If you are claiming that is rational, you must be prepared to agree that there is some fixed and natural numbers, so that

You are not allowed to change the values of and during the proof. Either there is a fixed or there is not.

Another example is Euclid’s classic proof that there are an infinite number of primes. If you believed that there is a largest prime, then you must be prepared to say that is the largest prime. Then, Euclid can use this number to force you to consider

The argument then shows that is either prime or is divisible by a prime , where must be larger than .

An even simpler version of this is the “largest number game.” You name a number and then I have to name a larger one. Again if you think that there is a largest one, you must be prepared to say that the largest one is , say. Then I point out that is larger. Scott Aaronson retold here an old story based on this game:

Two noblemen vie to name the bigger number. The first, after ruminating for hours, triumphantly announces “Eighty-three!” The second, mightily impressed, replies “You win.”

Back To The Proof

If you believe the reals are countable, then you must show your hand. You must fix a listing of the reals:

This is nothing different from picking a pair to show that is rational, that there is a largest prime , or even that there is a largest number . This is the show your hand step. If you disagree with this step in Cantor’s proof, then you must also do the same in the above simple examples. You also must not play Poker by the standard rules.

Once I have seen your hand, the sequence , I can try and find a real that you missed. There are many ways that I can do this. Since I can always construct a real number based on your sequence, your hand, that is not any , I win. Your sequence does not contain all the reals.

Leading Out Versus Check-Raising

Ken on the other hand recalls he did not play a single hand of poker during 5-1/2 years at Oxford, and has picked up poker recently only because of the interest of his son. But he finds that his own angle on Cantor’s theorem, informed by his exposure to finitists at Oxford, fits in well with the poker analogy.

In poker a player whose turn to bet comes first is “under the gun” and often at a disadvantage. Assertively “leading out” with a sizable bet gives opponents more leverage to raise since more of the player’s own money is in the pot. Instead the player is allowed to check, i.e. to bet zero, and subsequent players may check until a bet is made. If an opponent bets, then the first player is in the situation of being able to raise, not just call. A raise then is called a check-raise.

The math analogy is that asserting the existence of a set that is in some sense more complex than any set previously admitted is a form of leading out. If we’ve only reached a comfort zone with countable sets, then an uncountable set represents a new level of complexity. Indeed, to a finitist, every power-set is a big step up in complexity. If you think a googolplex is a meaningless quantity, then you can only iterate for a few steps before you would pass it.

Leading out with an object on a new level of complexity takes chutzpah. But even a humble finitist may be comfortable with theorems of the form, “If you have an object then I can give you a .” This is like saying “I check”—let someone else bet . If has the same intuitive complexity level as then that is like a call. If is a step up in complexity from , but similar to how was a step up from whatever was “in the pot” already, then is a check-raise.

The “Humble” Cantor Theorem

Here is a form of Cantor’s Theorem that embodies the check-call or check-raise idea, depending on how it is applied:

Theorem 1 Given any mapping from a set into its power set , we can find a subset of that is not in the range of .

Proof: Define . We claim that is not in the range of . If it were, then there would exist a such that . But then we have:



is in the set

by definition of



is not in the set

by definition of .

Since a logical statement can never be equivalent to its negation, this contradicts being in the range of .

Note that the proof itself embodies Cantor’s diagonal argument, but makes no reference to infinite constructions or possibilities, indeed no reference to infinity at all. It is not asserting anything on its own, but reacting to someone else giving an . Since has roughly the same complexity as —both involve the power set—producing is like a call.

What this has done is bring the implied onus of the bettor to show one’s hand into the theorem statement itself. It puts the question to the Count right away. For simplicity, let’s first talk about the power set of being uncountable:

If you think that is countable, then you are giving me a function from to that is onto . You can’t give me such a function, because I can give you and it will show that you were bluffing.

With a little technical work a Count should be convinced that a bluff on is equivalent to a bluff on . Thus the Count must admit that one can never have the cards to win a showdown that the reals are countable, and by “never” itself, should come on board that they are uncountable.

Raising Higher

One nice point about this formulation is that it carries over verbatim to prove the existence of non-r.e. sets. Just take to be the mapping from a Gödel number to the language accepted by the Turing machine , and what tumbles out is the creation of a diagonal language . Since is more complex than any in the sense of being non-r.e., we can call that a check-raise.

How widely can one replace theorems of the form “There exists a ” with “humbler” ones of the form “If you give me an , then I[…]
All_Posts  History  Ideas  Oldies  People  Proofs  from google
october 2011 by josephzizys
Quantum Chocolate Boxes
What kind of knowledge helps us peek inside black boxes?

David Deutsch is one of the fathers of quantum computation alongside Richard Feynman. Deutsch is a Fellow of the Royal Society of London, and is attached to the Centre for Quantum Computation at Oxford University.

Today we want to step back from quantum physics and focus on some quantum algorithms as situations involving ordinary queries, such as in the game “Twenty Questions.”

Deutsch formulated the quantum Turing machine (QTM) model in 1985, seven years after completing his doctorate in mathematical physics at Oxford. Ken was an onlooker as professors there turned aside Deutsch’s initial thoughts that a QTM could violate the Church-Turing Thesis, and Ken queried what all the fuss was about. Of course Peter Shor’s evidence for violating the polynomial-time Church-Turing thesis has decisively answered that query.

Besides developing the QTM, Deutsch devised the first algorithmic situations involving simple Boolean logic that illustrated extra powers of quantum computation. These were extended by Richard Jozsa and Daniel Simon and others, before Shor’s breakthrough on factoring. Ken and I are trying to understand these algorithms in a new way.

Like physicist Brian Greene whom we’ve recently cited, Deutsch is an author of a popular science book with title beginning The Fabric of… Deutsch’s book is The Fabric of Reality. Greene’s is The Fabric of the Cosmos, but Reality appears in its subtitle. The books have little in common—the closest overlap in Ken’s opinion is that both support the Many-Worlds Interpretation, which yields the ensemble of worlds on which Deutsch’s mathematical analysis is predicated. The math involves the statistics of how much information is needed to distinguish between possible worlds.

Queries and Chocolate Boxes

The famous tag line from the movie “Forrest Gump” is

Mama always said life was like a box of chocolates. You never know what you’re going to get.

Mama isn’t letting you read the index to the chocolates on the inside lid, and may not even let you look inside to see their shapes. However, if you can ask queries about what’s in the box and get useful answers, you can at least narrow your range of expectation.

Indeed, suppose Mama has two boxes.

Can you find a query that will enable you to decide between them? If you know in advance that one has pralines and the other does not, the query “Are you a praline?” will separate the boxes—chocolates in one will all answer “yes” and the other box, “no.” If you are fortunate enough to have access to a myriad of boxes, finding the right queries to narrow them down is itself a challenge.

Some underlying principles are familiar from the game “Twenty Questions.” If you have possibilities, then it is axiomatic that for every strategy that asks at most yes/no queries, there is at least one input that the algorithm will fail to distinguish. This follows from information theory, because the answers to the queries provide no more than bits of information.

A similar situation happens with matrices , when queries are made by giving argument vectors and requesting the answer . If you are trying to identify an unknown matrix , then you will generally need at least such queries. Each query gives you at most one dimension. With queries you can always succeed by presenting the standard basis vectors with the in the -th place, as the values give you all the entries of .

The question is: When can we do better? Deutsch’s insight was to isolate situations where quantum computations can obtain information with speed that appears surprising when the situation is approached with classical reasoning. The issue of speed is highlighted by Shor’s theorem, but we also wish to understand it without the “quantum magic” as an issue of how to frame queries. Note that by Holevo’s Theorem, quantum computers cannot actually cheat the above information-theoretic principles. What is the simplest explanation of what is going on?

The Problem

The original Deutsch algorithm operates on a one-bit boolean function:

The goal is to tell whether the function is a constant or not, by asking only one query for a value of . There are one-bit Boolean functions, and two boxes we are trying to separate:

One box, Box C for “Constant,” has the constant functions and defined by:

The other box, Box B for “Balanced,” has the identity function and the NOT function :

Let us think of the value as saying whether is a light or dark chocolate, and the value as saying praline or fruit filling. Thus we have:

Box C has a light praline and a dark fruit-filled chocolate.
Box B has a dark praline and a light fruit-filled chocolate.

If we query the value , we are asking, are you light or dark? Both boxes have a chocolate that can answer “light” and one that can answer “dark,” so this query does not separate them. The same lack faces . It seems that to learn which box is in, we need to ask two questions. How can we do it with one?

Linear Extensions

We can apply an idea that has been important in computational complexity: (multi-)linear extensions of Boolean functions over various fields. Here the extensions are clear:

Now ask for the value ? All four chocolates give different answers, so given one answer by an unknown chocolate, you can tell which it is. In particular, the answers or tell you Box C, while or tells you Box B. If we only cared about distinguishing the boxes, we could query instead.

In a sense what we have done is asked a linear combination of the two possible queries, here . By linearity the result is the same linear combination of the answers of any individual function , so .

For a digression, you might wonder if you can still separate the two boxes when the chocolates in each box are allowed to give a fixed linear combination of their answers. For instance, suppose the chocolates in Box C and Box B elect to do the following linear combinations, which happen to have the same coefficients:

Then and are both the constant function with value for any . That is, mushing together a light praline and a dark fruit-filled gives the same result as a dark praline together with a light fruit-filled. No possible query can separate these two cases.

Does Deutsch’s Algorithm Do More?

Now Deutsch’s quantum algorithm uses a different linear extension—let’s see how different it is. The Boolean values and are represented by qubit values and . These in turn are formally two-dimensional standard-basis vectors

over the complex field . Now the relevant linear extensions of the four Boolean functions are the following matrices:

The analogue of the argument which worked last time is . We have the same distinctions as before:

In quantum, however, we have one more requirement—the computation must be reversible. This is violated above because the matrices and for the constant functions are not invertible. The standard remedy is to replace the original function by the two-dimensional function . Here on Boolean values is XOR (exclusive-or), while on arbitrary complex values we can put .

Thus Deutsch’s algorithm is technically about distinguishing the resulting functions from . We get these matrices:

Strangely, under this representation not has become the identity matrix, and all four are now permutation matrices. What does is interchange the first two co-ordinates, the latter two, and swaps both. Thus we can distinguish the four matrices by their values on a query like by seeing where the minus signs end up.

The quantum algorithm needs the query argument to be a unit vector, so is used. The state is easy to prepare by a few Hadamard transforms. To distinguish whether an unknown answer comes from in Box C or in Box B, we need only tell whether the minus signs in are in positions of the same or different parity. This can be done by a Hadamard transform on (part of) followed by a measurement. The details here are essentially similar.

Is there really more here than the linear extension idea?

The Deutsch-Jozsa Extension

Here the chocolates are multi-variable Boolean functions . Box C still has just the constant functions and , but Box B has every function on that is balanced, meaning that of the values are , and are .

If one is limited to queries for Boolean arguments then queries might still not be enough to tell Box C from Box B. The queries might all be answered , and yet could still be balanced. This was the point that Richard Jozsa added to Deutsch’s original observation, while one quantum query still suffices.

We can, however, simply see this in linear extensions. Give the argument , normalizing by dividing by or or whatever factor is desired. By linearity, the extension of every balanced Boolean function will give the same value strictly between and . This separates the boxes with one query.

The quantum version of this, however, involves vectors of -many qubits. The above encoding transforms them into ordinary vectors of exponentially many bits, , or with the reversibility trick. This puts extra heft into our main query: is exponential-sized notation the best explanation of what Nature is doing with a polynomial amount of effort?

Later we will illustrate with some other quantum algorithms, such as Simon’s algorithm, in which the goal is to separate the boxes of functions

over all , where is bitwise exclusive-or. When we have the box of all 1-to-1 functions, while the other boxes have functions that are 2-to-1 in a particular periodic manner. When is unknown this may require something like the Hadamard transform where even with our classical standpoint. Of course Simon’s algorithm was an inspiration for Shor, so this may isolate a critical difference.

Open Problems[…]
History  People  Proofs  boolean  chocolate  crypto-systems  problem  quantum  from google
october 2011 by josephzizys
What If Cantor’s Proof Is Wrong?
Believing impossible things

The Red Queen—I do not know her full name—is one of the most colorful characters in Lewis Carroll’s Through the Looking-Glass. She is known for her many wild statements and strange conversations—here is one with Alice:

“I can’t believe THAT!” said Alice.

“Can’t you?” said the Queen in a pitying tone. “Try again: draw a long breath, and shut your eyes.”

Alice laughed. “There’s no use trying,” she said, “one can’t believe impossible things.”

“I daresay you haven’t had much practice,” said the Queen. “When I was your age, I always did it for half-an-hour a day. Why sometimes I believed as many as six impossible things before breakfast!”

Today I want to talk again about Georg Cantor’s famous proof that the reals are uncountable.

Cantor’s proof reminds me of the above conversation: sometimes I feel like Alice and sometimes like the Red Queen. For many Cantor’s proof is impossible to believe, while for others it is impossible not to believe. Cantor’s proof is correct, but if you have trouble understanding it, or know it is wrong, please keep reading. Please. I would like to engage you.

Actually, there is a fundamental reason to doubt it. Statements asserting the uncountability of certain sets can be formalized in the first-order logic of set theory. But every consistent first-order logic has a model with a countable universe—thus one within which no uncountable sets actually exist. In such a model, everything we think we are saying about the real numbers is translated into an equally meaningful assertion about a set that is actually countable. This is ultimately because in a logical language we can say only countably many things before breakfast, and in a first-order language we can talk about only one thing at a time. We cannot actually say—or believe—uncountably many things before breakfast.

Approaching Cantor’s Proof

Cantor’s proof is strange. It is short, brilliant, important, and yet not believed by some. I find this intriguing: what makes this proof so disputed? What makes it the most discussed proof on the web? In short why is it so hard to believe? See here, here, here, or here for some examples.

But I will say that it is one great results of all time, in my opinion. So I am one of those who do believe in the proof, yet I think we can still discuss this belief, so please keep reading.

The result is of great importance, has a wonderful proof, and has led to many further insights. For instance, the famous result of Alan Turing on the undecidability of the Halting Problem, would be impossible without Cantor’s diagonal method. The same with Kurt Gödel’s Incompleteness Theorems.

Yet Cantor’s proof is special. For such a short proof it certainly has generated a huge amount of controversy. Many people do not believe it, call it nonsense, or worse. Some call people in this category nasty names—I do not think this is right, and I would never do that. I think rather there is something deep going on here. I wonder if the reason so many do not get the proof, is a symptom that we do not explain the proof well. I have tried several times before to discuss this result—see here and here and here.

In the remaining discussion let’s call those who believe the proof Uncounts and those that do not Counts. The names are selected to reflect what they believe: Uncounts believe that the reals are uncountable and Counts that the reals are countable. I am an Uncount, but being a Count is pretty good.

I am always very sensitive to those who misunderstand something: if this happens in one of my classes, I almost always assume it is my fault, not theirs. I once carried this perhaps too far, while teaching a class on complexity theory. We were about halfway through the semester, when a student asked a question that was essentially: what is a Turing Machine? I immediately said: oh, good question—my usual first words. Then as I started to answer the clearly very confused student, the rest of the class yelled:

Are you kidding?

They were mad that a student could ask such a question after hours of lectures on Turing Machines.

The Proof

I will not repeat the famous diagonal proof. See here for a nice explanation. I assume both Counts and Uncounts know the proof. Of course the former do not believe it and the latter do.

Beyond The Proof

As an experiment let us assume that the reals are countable—that is, that Cantor is wrong. That there is a function

that is onto; recall this means that the range of the function is all of . Thus there is a list

that includes every real number at least once. The list includes all the rationals, the number , the number , and every real number you can think of and more. The list, the range of , contains all real numbers. This seems like a pretty cool situation. In this world where the reals are countable, let’s call it the Non-Cantor-World (NCW), what happens?

What I am curious about is what would it allow us to do? Would it really matter, or would it not change our understanding of real numbers, of analysis, of mathematics? Perhaps this is one of the reasons that people do not think that the proof is correct?

As stated above, we know there is an NCW if first-order set theory is consistent. If. We have recently covered the even simpler doubt about Peano Arithmetic being consistent. If we can overcome that doubt, what is easy to say about NCW’s?

Lebesgue Measure

Henri Lebesgue defined his famous measure on sets of real numbers, which is quite constructive, see here for a quick overview. Technically this measure is a map from certain sets of real numbers to non-negative real numbers . The measure has the following critical property:

for any disjoint measurable sets of real numbers. This is the property of countable additivity. The construction of Lebesgue measure starts by defining the measure of an interval to be , what else. Then it uses a clever extension method to define this for many more sets of real numbers. It is easy to show that this implies the inequality

where the sets need not be disjoint, but must be measurable.

Now this is a problem for Counts, because this leads to a contradiction. The key is that for any single point : this follows by the definition for intervals, . Recall

are all the real numbers. This shows that

But this is impossible since is easily seen to be . If you wish to avoid infinities, then just pick the subsequence of reals that lie in , call it . Then again

but .

This seems to be a spot of trouble for Counts. If you claim that the reals are countable, then you must reject Lebesgue measure. But there is no diagonal proof inside the construction of the measure. This seems to be a problem—what does measure theory look like in an NCW?

Baire Category Theorem

A basic result about the structure of the reals is called the Baire Category Theorem. It is named after René-Louis Baire who proved it in his 1899 Ph.D. thesis.

A set of reals is called first category if it is a countable union of nowhere dense sets; otherwise it is of the second category. Thus if the reals are countable, then they form a set of first category. This follows since the sets are nowhere dense, and

But, Baire’s theorem is:

Theorem: The set of all the reals is of second category.

This follows since the reals form a complete metric space.

Again this seems to be a spot of trouble. If you are a Count, then this theorem must be false. Or it must be false that the reals are a metric space, or that the reals are complete. But the reals have a simple metric, the distance from to is just . And by definition the reals are a complete space, that is what makes the reals useful. Any convergent sequence converges to a real number. Note, the rationals do not form a complete metric space.

Another Consequence

The Counts have one very neat positive result that cannot be proved in basic set theory. They can prove that the real numbers are well ordered. A set is well ordered by an ordering provided every non-empty subset has a least element. The natural numbers are well ordered by . Any non-empty set of natural numbers has a least element.

In the NCW the Counts can prove this immediately. Let and be two real numbers, then there are unique smallest and so that and . Define provided . This is easily seen to be a well ordering for the reals.

Thus, the real numbers have a well ordering: it is just the above . This usually is viewed with some surprise. The real number line seems to have the property that it should not have such an ordering. All proofs of this must use some powerful axioms beyond basic set theory. This is another possible spot of trouble for the Counts: How can it be so easy to prove that the reals are well ordered?

Open Problems

Does this help? Have I shed any new light on the issues? I doubt that any Count is now converted to an Uncount, but I certainly hope that I have not converted any Uncount to a Count.
People  Proofs  believe  Cantor  countable  uncountable  from google
october 2011 by josephzizys
An Annoying Open Problem
How do we tell if two groups are isomorphic?

Vikraman Arvind and Jacobo Torán are both complexity theorists, both with wide interests, but both who have worked extensively on the graph isomorphism problem. See their survey here, for example.

Today I want to talk about about an open problem, and their paper on the problem, which makes some progress. But the problem is still open—I find this an embarrassment—we should be able to solve this problem.

The problem is: Given two groups defined by tables, are they isomorphic? This is the - problem. What is annoyingly open is whether this problem belongs to , or alternatively, whether it is equivalent to Graph Isomorphism.

A Cayley table, after the 19th century British mathematician Arthur Cayley, is just the table of the multiplication function

There are many ways to represent a finite group. The above is natural: just give the multiplication table. There are many others: we could give the group as a “black box,” as a set of generators and relationships, or as a set of matrices. It should be clear that the easiest representation to deal with must be Cayley tables. Many questions that are hard in other representations are easy, even simple, in this representation. Note that this representation is larger than the size of the group itself, basically its square. The other representations, by contrast, are more succinct.

The Problem

The precise definition of the - problem is: given two tables that define the groups and , are they isomorphic? That is, is there a bijective mapping from the elements of to the elements of , say , so that for all in :

This is a fancy way of saying that except for the names of the elements the groups are the same.

The Short History

I find this problem very annoying. I have spent uncountable hours working on this over the last decades, especially jointly with Zeke Zalcstein. One reason the problem is so annoying is that it is easy to prove that it can be reduced to Graph Isomorphism (). But surely the additional structure of groups must help make the problem much easier to solve. Here are some encouraging differences between groups and graphs:

Every subset of elements of a graph is a graph; only certain subsets of elements of a group form a group.
Groups have a tight structure that varies with the prime factorization of , the number of elements. For instance, if is a prime there is exactly one group—the cyclic group .
Groups always have automorphisms, provided . Graphs can have many automorphisms—for instance the complete graph has —or they can have as little as no nontrivial automorphisms.
I could go on and on with the many structural differences.

Yet for all these differences, the best Zeke and I could do was prove, in 1976, that - can be solved in space . Bob Tarjan independently noticed at the same time that it could be done in time. Both results are really the same: they both exploit the fact that a group of elements always has a generating set of size . This is a direct consequence of Lagrange’s Theorem, as follows:

Let be a group with elements. Let the trivial group that contains only the identity element of . If is all of , then stop and we have found generators. If is not add any element of that is not in . By Lagrange’s Theorem this must have at least twice as many elements as : call it . The last point is key: let have elements and have elements. Clearly . But , which implies that .

Complexity Approach

One of the success stories of complexity theory, and potentially a great story, is its application to solve the - question—almost. The well-known Merlin-Arthur protocol for graph isomorphism works also for group isomorphism—see here and here for details.

What Arvind and Torán prove is that at least for a large class of groups, namely the solvable groups, the - problem is almost in . Namely, there is an -machine for the complement of this language that works correctly on all but a quasi-polynomial number of inputs of length . This may not sound very impressive, but it is. By the famous Feit-Thompson theorem, all groups of odd order are solvable. Note that solvable groups play an important role in others ways in complexity theory, which I have discussed a number of times before.

What they actually first show is that the complementary task, --, has an Arthur-Merlin protocol in which Arthur uses only random bits and Merlin uses only nondeterminism. This makes various kinds of de-randomization easier to apply. The existence of a language in that is bi-immune to polynomial space, i.e. the assumption , suffices to de-randomize this completely and thus put - into .

The Group Approach

Recall that groups come in various types: there are simpsle groups, there are solvable groups, there are abelian groups, and more. One type of groups that are very important are called -groups. A group is a -group if the order of the group—the number of elements in the group—is where is a prime and . Every such group is solvable.

One of the outstanding issues in group theory is that there is no general structure theorem for -groups. They are very complex, and even understanding the structure of all such groups of size for modest values of is non-trivial.

The groups of order for were classified early in the history of group theory, and modern work has extended these classifications to groups whose order divides . According to our friends at Wikepedia–see here:

the sheer number of families of such groups grows so quickly that further classifications along these lines are judged difficult for the human mind to comprehend.

The number of different, non-isomorphic, -groups is exponential:

As a complexity theorist I would disagree a bit with the idea that many implies hard to understand. There are many boolean strings of length , but they are not that hard to understand in some sense. What makes -groups hard, is that they can have complex structure, and we do not understand it, at all.

Zeke and I worked hard trying to understand these groups enough so that we could get a better isomorphism algorithm. We never made any progress. I would be excited if we could even prove there is an algorithm that runs in

for -groups.

Open Problems

Please solve the - problem. Or at least break below the time, which is the best known now for decades. Can you prove

for some ? Good hunting.
P=NP  People  Proofs  Algorithms  AM  coNP  groups  isomorphism  NP  Problems  solvable  from google
october 2011 by josephzizys
What If Peano Is Inconsistent?
What can we learn from near-death experiences of theories?

Ed Nelson is a senior professor in mathematics at Princeton. Years ago he claimed, for a while, a proof that . Now he claims, as reported by John Baez, that Peano Arithmetic (PA) is inconsistent—see this.

Today I want to just give a pointer to John’s post on this topic.

As he says:

It’s a long shot, but I can’t resist saying a bit about it.

I will leave the discussion of the proof and whether it is correct to The -Category Café. Please go there for the thoughtful discussion of the main ideas, and great comments; or add your thoughts to the discussion there. It has raised, I believe, two issues that are independent—bad pun—of whether Nelson is right or not.

This is not the first time that a researcher has claimed that PA is inconsistent, nor do I expect it to be the last. But it is important, I believe, that people think about these hard questions. And I am happy to see a serious attempt, whether it succeeds or not.

Downward Incompleteness

I once was at a talk on a “proof” that even more than PA was inconsistent, even seemingly weaker theories. One question that occurred to me, which I decided not to ask the speaker out of politeness, was:

What theory do you use to prove that your theories are inconsistent?

It seemed like he was using PA to prove that these sub-theories were inconsistent. This did not seem right to me.

Suppose that is a mathematical theory. We know from Kurt Gödel’s famous Second Incompleteness Theorem that if is powerful enough, then the consistency of cannot be proved by . Note, even this simple statement must be treated carefully, since Gödel’s theorems depends on defining consistency in a reasonable manner. See this famous paper by Solomon Feferman, and more recently this MathOverflow item for a discussion of this delicate issue.

Now let be another theory. If is contained in , then of course cannot prove the consistency of . However, suppose that is contained in —it may be a weaker theory. Then consider this possibility:

Theorem 1
The theory can prove the inconsistency of , but that proof is uninteresting.

The point is if is inconsistent, then it proves all statements, true or false ones. Thus, this proof says nothing about .

The point is that even if Nelson could show in PA that PA is inconsistent, I do not think that he can extend that to prove a weaker theory is also inconsistent. At least not in meaningful way.

Truth vs Proof

Suppose that Nelson is right. What does that say about the zillions of theorems that have been proved in PA? What will happen? Is it the end of mathematics?

My one point is that we need to make a distinction between a theorem being true and a theorem being proved. They are not the same.

Take Fermat’s Little Theorem: one of the basic results of number theory. Whether PA is correct or not, I believe this theorem is true.

There are several reasons why I believe that. Perhaps the strongest is that everyday millions of uses of this theorem happen, since it is used in cryptography by RSA, for example. If there were problems with the theorem, it seems plausible that the theorem would have failed by now, and that we would know about it. But it seems to be fine.

Apparent Flaw

Terence Tao, as he has often done in the past and with the incisiveness of his wonderful blog, has articulated an evident conceptual flaw, here in comments in John Baez’ post. Nelson’s argument involves iterating through different sub-theories of PA, each purported to have Kolmogorov complexity bounded by . The problem is that there are only binary strings of length up to that could possibly serve as Kolmogorov descriptions of these theories. Hence even though all the theories are conceptually related and collectively have a description much smaller than , one individual theory by itself must have Kolmogorov complexity that is greater than . This is evidently enough to spoil the delicate reasoning in Nelson’s argument.

Open Problems

Is PA consistent? If it is not, how do we go and see which “theorems” are true and which are not?

If Nelson is right—or if his proof idea can be made to work for some other strong theory—this could be a huge undertaking that will keep us all busy for years.

If Nelson’s argument is un-fixably wrong, does it at least help us learn about new and interesting “paradoxes”? What does this tell us about the fine balance of proof and truth?
People  Proofs  arithmetic  inconsistent  Peano  proved  true  from google
october 2011 by josephzizys
Claiming Picard’s Math May Have Gaps
The Picard Jump: if you are not at an extreme, then you are far from an extreme

Charles Picard was a French mathematician who made great contributions to analysis. His best known results, perhaps, are his theorems that restrict the possible ranges of analytic functions. He also made important contributions to applied mathematics, including telegraphy—the “Internet” of the late 1800′s.

Today I want to talk about one of his great theorems, give a counterpart that holds for finite fields, and in general discuss this type of result.

The search engine Google is pretty good, but it is not perfect. When I searched for some information about Picard I tried “Picard math,” thinking this would be enough to get me to the mathematician Picard. It did. Yet right behind the entry for a Wiki page on his life, was an entry on a New York Post article:

That’s the message the Securities Investor Protection Corp. had for some victims of convicted Ponzi schemer Bernie Madoff who think they’re being short-changed by the trustee of Madoff’s estate.

As part of a larger brief filed with the US Bankruptcy Court in connection with the suit against Irving Picard, SIPC noted that under the calculations that plaintiffs Maureen Ebel and Roger and Diane Peskin want to use, they’d get less than they would using Picard’s math.

Indeed, a New York Times story from the same time shows up on the next page of hits: Picard Opponents Claim Math May Be Faulty. I wondered at first if they could really mean Charles Picard and be referencing a French existential math controversy I didn’t know about. Picard was interested in many things, but I did not see how his main results could help decide how to “fairly” divide the money among those cheated by Madoff.

Picard’s Theorems

The Little Picard Theorem is a pretty powerful theorem, and was first proved in 1879. We all should be so lucky to discover a little theorem like this. One of the most important theorems in elementary number theory is “Fermat’s Little Theorem.” Without the latter theorem there would be no modern cryptography—especially the famous RSA public-key cryptosystem would be missing. Sometimes “little” should be translated into “basic and extremely useful.”

The Little Picard Theorem is:

Theorem: If is an entire non-constant function, then its range includes all complex numbers, with at most one exception.

The theorem is as strong as possible: consider the simple entire function . This takes on all possible values, except .

This is a huge improvement over a previous theorem of Joseph Liouville, which states that every entire non-constant function must be unbounded. Even Liouville’s theorem can be used to give a short proof of the Fundamental Theorem of Algebra: that every non-constant polynomial has a root. The idea is suppose that is a non-constant polynomial with no root. Then

is easily shown to be a bounded entire analytic function. Since it is not constant, this is a contradiction.

Of course there is a Great Picard Theorem, and the same Google hits found me a statement of it with a nod to the Star Trek Captain Picard:

Theorem: Suppose a transporter has holographic capabilities and transports objects from the neighborhood of a black hole to complex places. Then can transport an object to any complex place desired, except possibly one. [statement altered by Ken and I]

In formal terms,

Theorem: If is an analytic function except for an essential singularity at a point , then in any neighborhood of , its range includes all complex numbers, with at most one exception.

The Picard Jump

One way to view the Little Picard theorem is that it shows there is a jump in behavior of analytic functions. An analytic function can take on all complex values, or it can miss one like does, but if it missed two values, then it misses all but one—it is constant. A huge “jump” in behavior: an analytic function takes on all but one value or it misses all but one. There is no analytic function that misses three values.

This behavior occurs elsewhere in mathematics. Here is a general type of statement, where is some non-negligible gap value:

Theorem: All objects have . However, if , then .

Note, because Picard’s theorems involve infinities they do not quite fit this structure. However there are whole areas of theory that study exactly this type of behavior—for instance, statements in property testing are of this kind.

A Version For Finite Fields

Mike Zieve is a number theorist at Michigan who drew strong hiring interest at Tech, but that is another story, for another time. I want to thank him for relaying a pretty result and proof of a theorem that is like Picard’s, but holds for finite fields.

The theorem is due to Daqing Wan, and the proof is due to Gerhard Turnwald. The result is:

Theorem: If is a degree- polynomial over the finite field , and , then .

Note this says, that if is a polynomial over a finite field, then either its range is all of the field, or it is quite a bit smaller. For the history of the problem and a proof that the bound is sharp check the following paper.

Let’s compare Wan’s theorem to Picard’s Little Theorem:

Here is Mike’s version of Gerhard Turnwald’s proof of Daqing Wan’s theorem.

Proof: First replace by , so that we may assume has no constant term. (Note this simple use of a lever.) This clearly does not change the number of values that takes. Define by:

For , the coefficient of in is times the elementary symmetric polynomial in the ‘s. Thus it is a symmetric polynomial in the ‘s of degree that has no constant term. Hence we claim it is zero if . To see this, note that , so the elementary symmetric polynomial in the ‘s has value for , whence the claim follows from the symmetric polynomial theorem.

Thus has degree at most . Hence also has degree at most , since we may assume . But is not the zero polynomial (since ), and every element of is a root of , so .

Further developments may be found in these slides by Wan.

Farbod Shokrieh pointed out that there are many deep connections between Picard’s theorem and modern number theory. He said:

To understand the distribution of values of entire (and meromorphic) functions, the Finnish mathematician Rolf Nevanlinna developed a theory in 1920′s which is now called Nevanlinna theory. It can be considered as a generalization of Picard’s theorem.

It has been observed by Charles Osgood and Paul Vojta that Nevanlinna theory has many analogues in number theory (e.g. Roth’s theorem). Based on the analogies, Vojta has formulated a number of striking and far-reaching conjectures which would imply a wide range of results, known and unknown, in number theory. These include Fermat’s Theorem—not the “little” one and the ABC conjecture. See here for more details.

Clearly plenty to talk about another day.

Open Problems

Can we use the finite field theorem to prove some complexity results? I am especially interested in what happens to polynomial maps . Is there a Picard jump here? And if not when is there one?
History  People  Proofs  analytic  Diophantine  Fermat  number_theory  Picard  from google
september 2011 by josephzizys
Ask Bob
A new game that helps us understand the structure of .

Dieter van Melkebeek is a theorist at the University of Wisconsin at Madison. He is a past winner of the prestigious 1999 ACM dissertation award—I believe no one has ever won this award more than once. He has proved many wonderful theorems and is one of the world experts in complexity theory.

Today I want to talk about a beautiful new game that he has created for Alice and Bob. The game is based on , one of my favorite problems, as you may know.

The Bob in this game harks back to a vintage page of the ’90s. Vintage ’90s used to mean 1890′s, but now there is a museum of old webpages from the 1990′s. You can ask the Mystical Smoking Head of Bob a question, any question. He is all-powerful, but Alice wants to minimize her interaction with him.

I have had the pleasure to work on one project jointly with Dieter, and while that result was eclipsed by a result of Ryan Williams, I still consider our result a stepping stone for Ryan. Dieter just visited Georgia Tech and gave a talk in our theory seminar funded by the ARC center. It was a beautiful talk, about a great result, which I will now proceed to explain.

The Game for SAT

The game is played by two players: Alice and Bob, who else? They communicate back and forth, but only Alice sees the input, which is a set of -clauses: for example,

Some notation: will always stand for the number of variables and for the number of clauses.

I sometimes wonder whether theory research would stop if the letter “” were banned from our papers? In number theory a complex number is usually denoted by

and I believe that this notation is due to the eminent Bernhard Riemann himself. In his famous paper that introduced the zeta function, he wrote for a complex number on his famous line . Ken finds this notation macaronic for mixing Latin and Greek, when might seem more logical, but some have joked that Riemann used this notation because he forgot what Greek letter comes after .

The title of Riemann’s paper is: “Uber die Anzahl der Primzahlen unter einer gegebenen Grösse, or in English: On the Number of Primes Less Than a Given Magnitude. See here for a translation of the beautiful paper that changed number theory forever.

Back To The Game

Again the game is based on giving Alice a set of -clauses:

Recall is the number of clauses and these clauses are over the variables .

Her task is to determine whether or not they are satisfiable, which means that there is some assignment to the variables of boolean values , so that each clause evaluates to .

As is often the case one player, in this case Alice, must run in polynomial time. Since there is no known algorithm that always runs in polynomial time for , Alice has a tough task. So far the game seems pretty unfair, and pretty boring—its just the satisfaction problem. Nothing new.

But wait: she has the ability to ask Bob for help. Bob, unlike Alice in this game, is all-powerful: Bob can compute or determine anything that he wishes. Alice is allowed to ask questions by sending messages to Bob, who in turn sends back the correct answer—there is no issue of Bob being untruthful or cheating. Bob, in this game, is always going to answer the questions sent by Alice correctly.

That is the new game introduced by Holger Dell and Dieter in their paper. Pretty neat, no? The game so far is still uninteresting, we need to add one ingredient that makes it exciting:

Alice is charged by the number of bits she sends to Bob, and her task is to send as few bits as possible.

Now the game is interesting, very interesting. Without this restriction Alice could simply send all the clauses

to Bob, who would decide whether or not they were satisfiable, and then return one bit to Alice. With this restriction the game is much harder for Alice, and it is completely unclear what she should do? How many bits does she need in worst case to send to Bob? Note: Alice is only concerned with the length of the messages she sends to Bob, there is no restriction on the length of messages from Bob. Of course, Bob cannot send more than a polynomial number of total bits to Alice, since otherwise she would not run in polynomial time—just reading the messages would take too long.

Strategies For Alice

What does Alice do? In the initial version of the game—the free version—Alice is not allowed randomness. In the advanced version she is allowed to use randomness, but for now let’s restrict our attention to strategies that are deterministic.

Alice could just send Bob all the clauses, but encoded cleverly. There are a total of

possible clauses. So Alice could send Bob a bit vector of this length with a in the position if she has that clause. This is better than the naive encoding, but in worst case can be order . Can Alice do better?

Alice could start to send clauses to Bob, one at a time. Each time she asks him whether or not the clauses he has seen so far have more than a polynomial number of satisfying assignments. If they ever have only a polynomial number, then Bob will send all of them to Alice. She then can check the rest of the clauses. A reasonable strategy: Alice is being conservative and not sending all the clauses at once. But it is not hard to create examples where this will require her to send all the clauses to Bob. Thus it will send over bits to Bob, actually order , so it is worse than the first strategy.

Alice can allow Bob to take the lead. For example, Bob could send Alice a polynomial size string with Kolmogorov complexity near the length of . Alice, being polynomial time limited, has no hope of constructing such a string, but Bob is all powerful and can send her such string. The work of Eric Allender, Harry Buhrman, Michal Koucky, Dieter, and Detlef Ronneburger show that having access to such strings can be very powerful. But it turns out that even with this help, Alice cannot determine the state of her clauses in polynomial time.

Alice could Let’s stop here. You can think of your own strategies that are better than the ones that I suggested.

The Theorem

Dell and Dieter prove in their paper the following beautiful theorem:

Theorem: For any , there is no strategy for Alice and Bob that allows Alice to send, in worst case, fewer than bits of messages, unless the polynomial time hierarchy collapses.

Thus, up to an “epsilon” Alice’s best strategy is the simple one: send all the clauses to Bob, let him decide if they are satisfiable and wait for the answer to come back from Bob.

I find this surprising. It seems that there should be a better strategy for Alice, yet the theorem shows that no matter how she twists and turns, she must send about the same number of bits as the naive strategy. A pretty remarkable theorem. I will not attempt to describe the proof for two reasons. The usual: see their paper for the details—they explain it much better than I can. And the real reason—I do not yet understand the proof well enough to outline the main ideas. Perhaps I can do that in the future.

Open Problems

It the bound tight? Can the lower bound be improved? Or is there a strategy that slightly beats the naive one? Another very pretty open question is what happens when Alice is allowed randomness and the ability to make two-sided errors. That is Alice can be wrong either when she says yes or no: she must of course be right most of the time. Dieter and I discussed this question and it seems quite interesting.

Finally, there are two ways to view the game. One is that it shows that is incompressible, and the other is that the game could be a pathway to a proof that the polynomial hierarchy really does collapse. Of course the conventional wisdom is that the hierarchy is strict, but as I have said many times before: who knows? What Dieter has supplied is an intriguing new approach to show the hierarchy does indeed collapse: find a strategy for Alice that sends only total bits. Good hunting.
P=NP  People  Proofs  Algorithms  Alice  award  Bob  complexity  game  P≠NP  from google
september 2011 by josephzizys
Can Quantum Machines Do It All?
What keeps SAT out of BQP?

Dan Boneh is an expert on cryptography, especially the use of advanced number theory, such as the Weil pairing. He is famous for his seminal work on identity based encryption with Matt Franklin, which uses the Weil pairing.

Today I want to recall an old result Dan and I proved years ago and mix in a new result.

When Peter Shor announced in 1994 his instantly famous result that there is a quantum factoring algorithm that runs in polynomial time, many of us were amazed. We quickly read his paper, tried to understand why it worked, and followed the proofs step-by-step. It was—is still—a beautiful result.

Yet there was something about the result that troubled me. Perhaps no one else was troubled, but I did not really understand why the theorem was true. Yes I could follow the steps, yes I could explain the result to my students, but I did not really understand it.

I have discussed this phenomenon before: proofs are not just a mechanical means to move a statement from the conjecture pile to the proved pile. No, proofs are much more—they need to convey why the result is really true. They need to not seem magical, which Peter’s proof seemed to me. Proofs need to explain what is really happening.

Now I will say that Scott Aaronson’s well-noted explanation does this for the central idea, but back then Dan and I had no Scott to guide us. So we did something else.

In order to understand what was going on, Dan and I decided the best way would be to try and generalize Peter’s theorem. This is a standard method in mathematics: one way to really see what is happening is to generalize a theorem. Doing this can often shed new light on why the original theorem is true, and also can be interesting in its own right.

I could give countless examples of this method, it is used so often in math. Here is an example from number theory that is later explained better—I believe—by a classic result from group theory. Consider Fermat’s Little Theorem:

Theorem:
If is a prime and is an integer not divisible by , then

There are many proofs of this fundamental theorem. See our friends at Wikipedia here for several proofs: They have proofs based on: counting necklaces, using modular arithmetic, using the binomial theorem, and using dynamical systems.

Each proof gives a different insight into why must be prime, why cannot divide , and why the theorem is true.

The proof that I personally feel best explains what is happening is the one based on Lagrange’s Theorem:

Theorem: In any finite group, the order of an element divides the order of the group.

As an aside Joseph Lagrange did not prove his theorem, but only discussed special cases in 1771. It was finally proved by Pietro Abbati and published in 1803. The proof of Fermat’s Little Theorem now follows since when is prime, the set of numbers

form a group under multiplication modulo .

Hidden Linear Functions

Dan and I began to look at what we called the hidden linear problem. A function from has period provided

for all . We say that is m-to-one provided at most integers in the set map to the same value in . Note, if is one-to-one, then it is m-to-one with .

Let be a function from the integers to some arbitrary set . Say that f has hidden linear structure over provided there are integers and some function of period so that

for all integers . Further we say that is m-to-one provided is.

Our conjecture, which eventually became a theorem is:

Theorem:
Suppose that is a function that has a hidden linear structure over which is at most m-to-one. Assume the conditions:

Let , so that and are at most polynomial in .

Let be the smallest prime divisor of , so that .

For such a function , in quantum polynomial time in we can recover the values of modulo from an oracle for .

Note that the added conditions are critical. If the function were constant then there would be no way to reconstruct the ‘s: there must be some limit on how much it collapses. The second condition makes the values of the ‘s make sense modulo .

Look at our paper for details. The above theorem is strong enough to prove both factoring and discrete logarithm are in polynomial quantum time. This already indicated that we had made some progress. In Shor’s paper he uses different but similar arguments to handle factoring and the discrete logarithm. I personally thought that our proof helped me have a greater understanding of Shor’s brilliant work.

By the way, proving this theorem was hard. We used the same type of quantum algorithm that Peter did. But the fact that was not one-to-one created serious difficulties for us. Sometimes relaxing from a constant to `polynomial’ is a walk in the park, but not this time… I recall many times that one of us wanted to give up, but eventually we found the key trick. In the one-to-one case—the case that occurs in Peter’s theorems—there is an exponential sum that must be estimated. This sum is quite easy to handle. However, in the poly-to-one case the sum that arose was much harder to bound. We finally found a trick: when trying to bound a sum, change the sum. More on this another time.

Quantum Detection of Periods

Dan and I could also prove the following theorem:

Theorem:
Suppose that has period , and no smaller period. Also let be m-to-one. Add the conditions:

Let , so that and are at most polynomial in .

Let be the smallest prime divisor of , so that .

For such a function , in quantum polynomial time in we can recover the the period .

In Shor’s paper this theorem is essentially proved with : the so called bijective case. Obviously our result is a modest generalization to the case where the function can fail to be one-to-one, but a value in the range can only have a polynomial number of pre-images. Since then there have been much better improvements to the period solving problem. The current best results can recover the period provided the function has small auto-correlation—see this for details. Indeed for quantum algorithms that only use as an oracle, the current results are essentially best possible.

SAT

There is another reason that there may be no quantum algorithm for solving the general period finding problem. That is, given a small circuit—not an oracle—for the function , find its period. The reason is that the following is a theorem:

Theorem: The following two problems are equivalent via a random reduction:

Finding an assignment to a instance.

Finding the period of a Boolean valued function given by a small circuit.

Note, the lower bounds of quantum complexity theory are mostly based on a black box argument—they assume that only oracle access is available to the function whose period you want. There is no way known to show that they apply when the function is given by a circuit. This follows since if one could guess the period.

This will be an example of “evolving” a theorem from a trivial beginning to more-meaningful forms. At issue first are: how small is `small’ and what is the time on the reduction? Then the main issue will become, how close are the resulting cases of period-finding to ones that quantum algorithms can handle?

First we prove a version where `small’ means polynomial, the reduction is deterministic polynomial time, and the cases of periodicity involved are trivial. Namely, given an instance formula for , let be the circuit that on input an assignment outputs . If is not satisfiable then the function computed by is identically zero and so has period . If is true on the all- assignment then we say yes. Else, is satisfiable, so but for some other , so does not have period . It may not have any period, which is what makes this trivial. But we certainly know has period .

Amplifying Periods

Now we want always to have a period, in the case where . This can be done by a simple padding trick. Let us use some number of variables defining a value , and define the circuit by . Then the function always has a period dividing —the question is whether that period is or something larger.

Thus finding the period of a periodic function given by poly-size circuits is -hard. Of course this is far from bijective—it is just 0-1 valued. If we could make bijective on this period, at least when the formula is satisfiable, then we would put into . How close can we come?

We start by arranging that when , with high probability at least one we run across has a period of some prime order , where finding for this is enough to say . Take the above circuit ; we wish to see if it has an input so that . We can assume that is unique, if it exists by using the famous result of Leslie Valiant and Vijay Vazirani.

A further simple random argument shows that we can also assume that the solution is a prime number. Here’s why: Consider flipping each input bit of independently: since the proportion of primes up to is order-of , this will map the to a prime number with reasonable probability, and we only need that our test succeeds on it once. In summary we can assume that is either unsatisfiable or has some prime as the unique solution.

Now let be the function

where is over all prime divisors of . Then it is easy to see the following: If is unsatisfiable, then is identically and has period . If there is a unique prime so that , then has period . Therefore we have reduced solving to period finding—where we can restrict attention to positive cases where the period has prime length and the function is except for a on the period start.

There is a small point, really a BIG point: the function does not have a poly-size circuit. In order to compute it we must factor into it prime factors and then check each one, , to see it makes . However, factoring, currently, has algorithms that run in ti[…]
P=NP  People  Proofs  Algorithms  new  periods  quantum  SAT  from google
september 2011 by josephzizys
Organic Separations
A whole-earth catalog of separation notions, including “fooling”

Jack Lutz of Iowa State is a complexity theorist who is an extremely original thinker. He is most famous for his creative notions that make concepts from continuous spaces make sense in discrete settings. Foremost is his work on a generalization of Lebesgue measure to complexity classes.

Today Ken and I want to talk about a recent paper on fooling formulas.

I still recall first hearing about Lutz’s work. It seemed amazing that notions like measure could be extend in a meaningful manner to complexity classes, to make precise statements like

Almost all sets in complexity class X have property Y.

Jack’s work has branched from measure into fractal-dimension theory, and continues as an active area that is especially useful in core complexity theory. Here we see how it furnished one seedbed for issues of de-randomization, and suggests how to plow the adjacent field of “fooling” lower-complexity objects.

The Farm

The central problem in complexity theory is comparing our imagination with reality. On the imagination end we can invent an almost endless supply of complexity classes. A class is often, but not always, defined by limiting some resource: time, space, random bits, and so on. These can be used individually to define , or in combination to define the class of problems solvable by algorithms that run in polynomial time and poly-log space—the latter is Steve’s Class named after Stephen Cook.

Things are even more involved. Some classes are defined by a protocol that can be as “simple” as and , the classes created brilliantly by Laci Babai, or they can be a multiple-step protocol such as .

Defining complexity classes is fun: take a little of this, some of that, and add a pinch of this, and you have a complexity class. New ones are still being defined.

Christos Papadimitriou once defined a class of functions called PPAD that is quite interesting, but it came to great prominence when Xi Chen and Xiaoti Deng proved that computing a Nash equilibrium for a 2-player game is complete for it.

The ability to define classes and be limited only by our imagination is cool. Of course there are two constraints. First, some classes are more important than others. The classes and are of central importance, while a class defined by an exotic combination of concepts may be of interest only to those who defined it.

The second is that classes must be grounded in reality. As they are defined and created the obvious question arises: Is this class the same as a previously defined class? This is the essence of the question. Many of the great results in complexity theory have been of the form: this class equals that class , or is contained (surprisingly) in some class .

.
.
, where the latter is more obviously closed under union and intersection than the former.

To paraphrase George Orwell’s Animal Farm, the trouble is that “some classes are more equal than others”—or at least pretend to be equal to others. Separating out the pretence is needed to tell them apart. A word on “pretence:” no pretense is meant, this is based on Ken’s British training—you can replace “pretence” by “pretense” if you wish. Or not.

Extrinsic and Organic Separation

There seem to be two fundamental ways to tell whether two differently-defined mathematical objects and are truly different. One is to show that and cannot have the same extension, i.e., cannot denote the same object. The other is to show that has some property that does not have. The former we call an “extrinsic” separation.

When and are sets, the former can mean exhibiting an element in that does not have, or vice-versa, or at least showing that having the same elements leads to a contradiction. What could be more natural than this? Several issues creep in, however. For one, we could have as sets but and are the same in some sense like isomorphism. For another, we encounter the problem of judging sameness recursively for the elements themselves. Third, especially when we obtain a bare contradiction, we may not learn very much from the fact of separation.

With the second way we learn something in terms of properties that speak directly to the idea of difference, generally without involving any kind of recursion. When the properties involve the ideas behind how and are defined, we call this latter way an organic separation.

An Example From Mathematics

Parts of mathematics also struggle with proving that one object is not the same as another object. It is “obvious” that Euclidean space is different as a topological space from provided , but proving this is not trivial. As sets they have the same cardinality, so there is a 1-1 correspondence between their members. In physics the holographic principle gives a sense in which they may encode the same information.

Here is a simple organic way of proving that is different from . The property is whether the space always stays connected after the removal of a single point. has it but does not. That connectedness supplies the difference is not obvious, but it is illuminating.

The best proof for versus in our opinion involves the organic notion of dimension. Dimension theory attaches a number to each topological space.

To see that dimension is organic, note that the covering dimension of a topological space is defined to be the minimum value of , such that every finite open cover of admits a finite open cover of in which no point is included in more than sets. If no such minimal exists, the space is said to be of infinite covering dimension. For versus we don’t need to try to generalize the proof for versus by removing an -dimensional subspace—the property of dimension already separates them.

Examples for Complexity Classes

In complexity theory there are classes that we can prove different by both intrinsic and extrinsic methods. For example, suppose that you wish to prove that regular languages are not the same as context free languages (CFL’s). The classic extrinsic proof is to show that a language such as

is a CFL but not regular. An organic proof is to note that the regular languages are closed under intersection and CFLs are not. One way to show the latter is that if CFL’s were they closed under intersection, then the emptiness problem for CFLs would be undecidable. The point is that one can encode any computation into the intersection of two CFLs.

We think at some level these proofs are fundamentally different:

One shows that regular languages are too weak to contain all CFLs.
The other shows that CFLs are too powerful to equal the regular languages.

Does this suggest any ideas on how possibly to separate and ? An “old-hat” but still mysterious result is that is different from linear space, . The organic difference is that is closed downward under polynomial-time reducibility, while is not. It is mysterious because the proof does not tell us a language in one class and not the other. Indeed neither nor has been disproved.

The technical-minded may note that every organic proof can be re-phrased as equality leading to a contradiction, which we allowed as “extrinsic,” and may prefer the issue be framed as whether or not a class separation is constructive in the sense of showing an explicit language in one class and not the other. However, this misses the emphasis on properties—ones that may not be obvious but that deepen understanding once their connection to the definitions is perceived.

Lutz’s Methods

The impact of Jack Lutz’s work is that he provided several new farms for organic properties of complexity classes. The main one is his original notion of resource-bounded measure, and its later companion is, yes, a resource-bounded notion of dimension. For example, he defined the following property:

Class has poly-time measure zero in class .

Note that if has a complete set, such as , exponential time, and if is a class like that is closed downward under polynomial-time reducibility, then having -measure zero in will imply an explicit separation, since any -complete set will lie outside .

However, the point is that the property comes first. Lutz and his co-authors showed that his hypothesis that does not have -measure zero in implies many other assertions that are commonly believed about . Russell Impagliazzo and Philippe Moser proved that the hypothesis de-randomizes , i.e., makes . Meanwhile Jin-Yi Cai, D. Sivakumar, and Martin Strauss had shown situations where a related hypothesis is false.

Note that the hypothesis being false—i.e., having measure zero in —is stronger than saying . It would say that is a really small subclass of —that it doesn’t put up much of a pretence of being equal to . Negating the hypothesis is stronger than saying an explicit -complete language does not belong to , and hence it can be used to derive more consequences. These on one hand, the above on the other hand—at least one side has to be true.

The point is that the organic hypothesis perhaps provides a more interesting dichotomy than the extrinsic ideas of versus .

Fooling

Another organic way to tell two complexity classes apart is to show that one can be “fooled” more easier than the other. By fool we mean that there is a pseudo-random source that one class thinks is essentially uniformly random and another thinks is not.

Andrej Bogdanov, Periklis Papakonstantinou, and Andrew Wan (BPW), in their paper to appear at FOCS 2011, give an explicit construction of a pseudorandom generator for read-once formulas whose inputs can be read in arbitrary order. For formulas in inputs and arbitrary gates of fan-in at most logarithmic the pseudorandom generator uses bits of randomness and produces an output that looks -pseudorandom to all such formulas. Recall that a f[…]
P=NP  People  Proofs  dimension  fooling  language  measure  separation  space  time  from google
september 2011 by josephzizys

Copy this bookmark:



description:


tags: