approximation 108
[1108.1320] Compressed Matrix Multiplication
9 weeks ago by Vaguery
"Motivated by the problems of computing sample covariance matrices, and of transforming a collection of vectors to a basis where they are sparse, we present a simple algorithm that computes an approximation of the product of two n-by-n real matrices A and B.…"
approximation
algorithms
nudge-targets
9 weeks ago by Vaguery
The Meaning of Omega
11 weeks ago by josephzizys
And the meaning of the recently-broken metric TSP approximation ratio
Andrew Stothers wrote in 2007 a first-year qualifying report for the University of Edinburgh PhD programme. In only eleven pages he presented the parts of the famous paper by Don Coppersmith and Shmuel Winograd (CW) that do enough to obtain for the exponent of matrix multiplication. In a few more pages he surveyed recent work on sufficient conditions for . Page 15, the last except for references, laid out “Possible Directions for Future Work” in five sentences, the fifth being:
Finally, it will also be possible to investigate bilinear algorithms further and use more complex means to determine whether or not “better” algorithms exist for matrix multiplication.
Today we wish to explain what means, why lower is always “better,” why we feel our field is better for the great efforts expended by Stothers and Virginia Vassilevska Williams, and why it is better when intellectual values guide our judgment. We also wish to defend ourselves and others for calling the results “breakthroughs,” comparing and contrasting to recent advances on algorithms for the Traveling Salesman Problem.
I (Ken) wrote a similar first-year qualifier in summer 1982 after my first year at Oxford—they are a staple of leading British universities. Like Andrew’s it was not obligated to have original research, just to demonstrate understanding and capability. One item found its way into a conference paper, but my 1986 dissertation ranged into various other topics.
What distinguishes Andrew’s Nov. 2010 dissertation is its singular focus on accomplishing the objective of his proposal. And doing it. The resulting paper with his advisor Sandy Davie has recently been submitted. Meanwhile in North America, Virginia had been thinking about the same problem since 2005. It may not be true as asserted here that “hundreds of people tried to improve” it after 1987, but we’ll guess about as many people thought about it. It’s enough to note that many researchers felt it worthy of pursuit, including Andrew’s advisor, and that progress took large concerted effort. The nature of this effort—including the use of computers—may be an important harbinger for science by itself.
What Does ω Mean?
The definition of is the infimum of all such that matrix multiplication has an algorithm that runs in time . This is in a classical model of computation where addition and multiplication of scalars have unit cost. Note that there might not be an algorithm that runs in time itself— may properly be a limit even if it turns out to have the least possible value 2. Currently all we know is that from Virginia’s work.
As many have noted, does not have immediate practical relevance because there is evidently a tradeoff with the constant hiding under the in . The tradeoff is so far steep enough that apparently only Volker Strassen’s relatively simple algorithm achieving an exponent of has wide use. So why do we care about values of less than that?
One of the central questions in physics is, why do the fundamental constants of nature have the values they do? The desire for why, in the face of evidence that the values could be arbitrary, has aroused such passion that string theorist David Gross channeled Winston Churchill in exhorting young physicists to “never, never, never give up.”
Now in mathematics it may seem nonsensical to ask, why do numbers like and have the values they do? But with , we are in a sense closer to physics: understanding it is akin to discovering a natural law, at least a law of information. And quantum mechanics has if anything magnified William Hamilton’s vision approaching 200 years ago that Nature computes with matrices, yes large ones. It is possible that detracts from a better statement of important laws, but knowing better brings us closer to them all the same.
Barriers and Breakthroughs
Hence also the interest in whether takes a value with known meaning, such as or (both falsified) or , or as some have mused, or , or as some have argued more likely for exponents, a logarithm of some higher and simpler number. Strassen himself has often stated his belief that is strictly greater than . If it has a value not previously seen in mathematics, then we could hope to discover new mathematical regularities as well; if it has a known value, then this may yield some more explanation about algorithms.
The value seemed a natural possible barrier, but the time interval from to in 1981–82 was very short. By contrast the “CW” bound of was not a natural-seeming number, but it withstood attempts to scale it for 23 years, including a year’s work from each of Virginia and Andrew. Reasons of intellectual judgment, expanded below, we feel are enough to justify calling their final results a “breakthrough.” But we offer a recent concrete case for comparison first.
The Salesman Always Rings 1.5
The Traveling Salesman Problem (TSP) seeks the shortest route to visit each of a given set of sites, coming back in a ring to the starting site. When the sites are in real space and “shortest” refers to Euclidean distance, finding the length of the absolute shortest route remains -hard even in the plane, as discussed here. However, Nicos Christofides in 1976 found a polynomial-time algorithm that finds a tour guaranteed to have total length at most . The method and proof apply to various other cases where the distances between sites obey inequalities that define some kind of metric, with the same bound.
This stood as a barrier for most of these cases, until this year. Shayan Oveis Gharan, Amit Saberi, and Mohit Singh (OSS) broke it for a wide subclass of these problems. Well they “broke” it by achieving a guaranteed ratio to the optimum of
Actually their original paper did not give a value—it merely proved the existence of an giving . The estimate for the magnitude of their breakthrough was supplied later.
Improvements and Predictions
It must be said that the OSS paper and its predecessor introduced techniques that were more novel to the problem than is evident for the new matrix-multiplication papers, and the predecessor won the SODA 2010 “Best Paper” award. However, a referee could have wondered, why bother moving heaven and earth in a restricted case to demonstrate an increment a thousand times smaller than “nano”? Indeed the version of OSS linked above still ends with “” on page 65.
What happened is that the change attracted interest on several continents, and the people in Europe who made a bankable improvement are not even in the euro zone. First Tobias Mömke and Ola Svensson of KTH in Stockholm obtained by using matchings in place of the more-complicated ingredients of OSS, also in time for FOCS 2011. Then Marcin Mucha of the University of Warsaw improved their analysis to obtain
It is noted by all that an even older paper by Michael Held and Richard Karp from 1970 had set a believable target for provable improvements of Christofides’ bound, namely a conjectured ratio of from linear programming relaxations. However, also has dueling conjectures, some supporting a believable target of . The real point of comparison with TSP is that suddenly there was progress on a barrier that had stood for much of the age of the field, and this attracted others to try for more.
We note that Markus Bläser has contended in a comment that the extension of CW used by both new papers has limitations, and we infer that some other experts concur. However, the paper by Vassilevska Williams in particular has two new ingredients: a framework for managing any tensor power as a base algorithm, and a computer implementation of the framework. She also employed an insight of Stothers to simplify the analysis. Though we can imagine a geometrical insight that higher powers bring returns that “shrink geometrically,” can we really constrain the likelihood that pursuing them might dislodge a different insight that changes the whole game? Is there a limitation theorem here? All we know is that the game has changed, and perhaps the game is afoot.
Worth and Values
The last issue we wish to raise is the reliability of judging the worth of past achievement by present assessment of what it may or may not lead to in the future. We have posted several times on surprises and the difficulty of predicting or guessing how things will go. Of course impetus into the future is necessarily part of claiming something now a “breakthrough.”
However, if we seek a reliable and consistent standard of judging worth, we should use a value that the community has understood for many years: intellectual substance and effort. This value, plus the simple salience of the goal, led our principals to invest the years of work in the first place. Depth of thinking is our gold standard, while applicability is our paper currency. The improvements of and on paper for amid other issues may be a poor return now, but the new computer-assisted vein to mine may parlay true value later.
Dick and I hope this explains why fundamental effort and easily-stated achievement, on a fundamental problem after nearly a quarter-century of stasis, elicited the reaction it has. We agree with, and have tried to extend, Timothy Gowers’ comments here. As for how these values can be invested, only time will be the teller.
Open Problems
Can we be more open about the value of pursuing problems?
What is ?
Suppose we know something special about two matrices and to be multiplied, something not so obvious like their being sparse. Suppose in particular that we have a tiny circuit that for any outputs the value , and similarly for , where those values come from a fixed finite set. Or suppose we know that and/or preserve a similar succinctness property from argument vectors to values—which[…]
All_Posts
Ideas
News
Open_Problems
People
Proofs
Results
Algorithms
approximation
matrix
multiplication
omega
TSP
from google
Andrew Stothers wrote in 2007 a first-year qualifying report for the University of Edinburgh PhD programme. In only eleven pages he presented the parts of the famous paper by Don Coppersmith and Shmuel Winograd (CW) that do enough to obtain for the exponent of matrix multiplication. In a few more pages he surveyed recent work on sufficient conditions for . Page 15, the last except for references, laid out “Possible Directions for Future Work” in five sentences, the fifth being:
Finally, it will also be possible to investigate bilinear algorithms further and use more complex means to determine whether or not “better” algorithms exist for matrix multiplication.
Today we wish to explain what means, why lower is always “better,” why we feel our field is better for the great efforts expended by Stothers and Virginia Vassilevska Williams, and why it is better when intellectual values guide our judgment. We also wish to defend ourselves and others for calling the results “breakthroughs,” comparing and contrasting to recent advances on algorithms for the Traveling Salesman Problem.
I (Ken) wrote a similar first-year qualifier in summer 1982 after my first year at Oxford—they are a staple of leading British universities. Like Andrew’s it was not obligated to have original research, just to demonstrate understanding and capability. One item found its way into a conference paper, but my 1986 dissertation ranged into various other topics.
What distinguishes Andrew’s Nov. 2010 dissertation is its singular focus on accomplishing the objective of his proposal. And doing it. The resulting paper with his advisor Sandy Davie has recently been submitted. Meanwhile in North America, Virginia had been thinking about the same problem since 2005. It may not be true as asserted here that “hundreds of people tried to improve” it after 1987, but we’ll guess about as many people thought about it. It’s enough to note that many researchers felt it worthy of pursuit, including Andrew’s advisor, and that progress took large concerted effort. The nature of this effort—including the use of computers—may be an important harbinger for science by itself.
What Does ω Mean?
The definition of is the infimum of all such that matrix multiplication has an algorithm that runs in time . This is in a classical model of computation where addition and multiplication of scalars have unit cost. Note that there might not be an algorithm that runs in time itself— may properly be a limit even if it turns out to have the least possible value 2. Currently all we know is that from Virginia’s work.
As many have noted, does not have immediate practical relevance because there is evidently a tradeoff with the constant hiding under the in . The tradeoff is so far steep enough that apparently only Volker Strassen’s relatively simple algorithm achieving an exponent of has wide use. So why do we care about values of less than that?
One of the central questions in physics is, why do the fundamental constants of nature have the values they do? The desire for why, in the face of evidence that the values could be arbitrary, has aroused such passion that string theorist David Gross channeled Winston Churchill in exhorting young physicists to “never, never, never give up.”
Now in mathematics it may seem nonsensical to ask, why do numbers like and have the values they do? But with , we are in a sense closer to physics: understanding it is akin to discovering a natural law, at least a law of information. And quantum mechanics has if anything magnified William Hamilton’s vision approaching 200 years ago that Nature computes with matrices, yes large ones. It is possible that detracts from a better statement of important laws, but knowing better brings us closer to them all the same.
Barriers and Breakthroughs
Hence also the interest in whether takes a value with known meaning, such as or (both falsified) or , or as some have mused, or , or as some have argued more likely for exponents, a logarithm of some higher and simpler number. Strassen himself has often stated his belief that is strictly greater than . If it has a value not previously seen in mathematics, then we could hope to discover new mathematical regularities as well; if it has a known value, then this may yield some more explanation about algorithms.
The value seemed a natural possible barrier, but the time interval from to in 1981–82 was very short. By contrast the “CW” bound of was not a natural-seeming number, but it withstood attempts to scale it for 23 years, including a year’s work from each of Virginia and Andrew. Reasons of intellectual judgment, expanded below, we feel are enough to justify calling their final results a “breakthrough.” But we offer a recent concrete case for comparison first.
The Salesman Always Rings 1.5
The Traveling Salesman Problem (TSP) seeks the shortest route to visit each of a given set of sites, coming back in a ring to the starting site. When the sites are in real space and “shortest” refers to Euclidean distance, finding the length of the absolute shortest route remains -hard even in the plane, as discussed here. However, Nicos Christofides in 1976 found a polynomial-time algorithm that finds a tour guaranteed to have total length at most . The method and proof apply to various other cases where the distances between sites obey inequalities that define some kind of metric, with the same bound.
This stood as a barrier for most of these cases, until this year. Shayan Oveis Gharan, Amit Saberi, and Mohit Singh (OSS) broke it for a wide subclass of these problems. Well they “broke” it by achieving a guaranteed ratio to the optimum of
Actually their original paper did not give a value—it merely proved the existence of an giving . The estimate for the magnitude of their breakthrough was supplied later.
Improvements and Predictions
It must be said that the OSS paper and its predecessor introduced techniques that were more novel to the problem than is evident for the new matrix-multiplication papers, and the predecessor won the SODA 2010 “Best Paper” award. However, a referee could have wondered, why bother moving heaven and earth in a restricted case to demonstrate an increment a thousand times smaller than “nano”? Indeed the version of OSS linked above still ends with “” on page 65.
What happened is that the change attracted interest on several continents, and the people in Europe who made a bankable improvement are not even in the euro zone. First Tobias Mömke and Ola Svensson of KTH in Stockholm obtained by using matchings in place of the more-complicated ingredients of OSS, also in time for FOCS 2011. Then Marcin Mucha of the University of Warsaw improved their analysis to obtain
It is noted by all that an even older paper by Michael Held and Richard Karp from 1970 had set a believable target for provable improvements of Christofides’ bound, namely a conjectured ratio of from linear programming relaxations. However, also has dueling conjectures, some supporting a believable target of . The real point of comparison with TSP is that suddenly there was progress on a barrier that had stood for much of the age of the field, and this attracted others to try for more.
We note that Markus Bläser has contended in a comment that the extension of CW used by both new papers has limitations, and we infer that some other experts concur. However, the paper by Vassilevska Williams in particular has two new ingredients: a framework for managing any tensor power as a base algorithm, and a computer implementation of the framework. She also employed an insight of Stothers to simplify the analysis. Though we can imagine a geometrical insight that higher powers bring returns that “shrink geometrically,” can we really constrain the likelihood that pursuing them might dislodge a different insight that changes the whole game? Is there a limitation theorem here? All we know is that the game has changed, and perhaps the game is afoot.
Worth and Values
The last issue we wish to raise is the reliability of judging the worth of past achievement by present assessment of what it may or may not lead to in the future. We have posted several times on surprises and the difficulty of predicting or guessing how things will go. Of course impetus into the future is necessarily part of claiming something now a “breakthrough.”
However, if we seek a reliable and consistent standard of judging worth, we should use a value that the community has understood for many years: intellectual substance and effort. This value, plus the simple salience of the goal, led our principals to invest the years of work in the first place. Depth of thinking is our gold standard, while applicability is our paper currency. The improvements of and on paper for amid other issues may be a poor return now, but the new computer-assisted vein to mine may parlay true value later.
Dick and I hope this explains why fundamental effort and easily-stated achievement, on a fundamental problem after nearly a quarter-century of stasis, elicited the reaction it has. We agree with, and have tried to extend, Timothy Gowers’ comments here. As for how these values can be invested, only time will be the teller.
Open Problems
Can we be more open about the value of pursuing problems?
What is ?
Suppose we know something special about two matrices and to be multiplied, something not so obvious like their being sparse. Suppose in particular that we have a tiny circuit that for any outputs the value , and similarly for , where those values come from a fixed finite set. Or suppose we know that and/or preserve a similar succinctness property from argument vectors to values—which[…]
11 weeks ago by josephzizys
[1111.5899] Sampling, Filtering and Sparse Approximations on Combinatorial Graphs
11 weeks ago by cshalizi
In this paper we address sampling and approximation of functions on combinatorial graphs. We develop filtering on graphs by using Schr"odinger's group of operators generated by combinatorial Laplace operator. Then we construct a sampling theory by proving Poincare and Plancherel-Polya-type inequalities for functions on graphs. These results lead to a theory of sparse approximations on graphs and have potential applications to filtering, denoising, data dimension reduction, image processing, image compression, computer graphics, visualization and learning theory.
to:NB
network_data_analysis
approximation
graph_theory
11 weeks ago by cshalizi
Another Annoying Open Problem
november 2011 by josephzizys
The k-server problem: some progress, still wide open
Aleksander Mądry gave a stellar talk at our ARC Theory Day a week ago Friday. He is an expert on algorithmic graph theory, among other areas of theory. Already he has multiple best-paper awards, including the paper of this talk from FOCS 2011, and I expect he will get more in the future. This is good going considering that basic LaTeX systems lack a macro for the Polish ogonek diacritic in his name—Ken installed a special package called tipa to get it while editing this post.
Today I would like to talk about his result and the open questions that remain.
He with Nikhil Bansal, Niv Buchbinder, and Seffi Naor (BBMN) have made a recent important contribution to our understanding of the complexity of on-line algorithms. Aleksander’s talk was wonderful and explained what is known, their new result, and what still remains to be done. As he said:
One of the remaining open questions is really annoying—it should be solved—but it continues to resist attacks.
Before I discuss their result I would like to thank the Theory Day organizers Prasad Tetali and Prasad Raghavendra. It was held on the magical day 11/11/11, which does not happen that often, and in fact Mądry’s talk spanned 11:11:11am. Ken wore a red shirt in Buffalo but could not find the WW I remembrance poppy pin he acquired during his sabbatical in Canada. The full scorecard of talks and titles was:
Thomas Dueholm Hansen: Subexponential lower bounds for randomized pivoting rules for the simplex algorithm.
Aleksander Mądry: Online algorithms and the k-server conjecture.
Mohit Singh: A Randomized Rounding Approach for Symmetric TSP.
Ryan Williams: Algorithms for Circuits and Circuits for Algorithms.
The Problem
The problem he spoke about is the now classic on-line server problem. The simplest case is the problem of managing a computer $. Note that this is the way to tell a theory talk on computer caches from an architecture talk on computer caches—in architecture often $ is used to denote a cache. Perhaps this is changing, given the current economic turmoil, perhaps not.
The key is that there are pages in the cache and total pages. Each time a page is requested, if it is not in the cache the task is to decide which page to “evict” from the cache. If the strategy is deterministic, then as Danny Sleator and Bob Tarjan proved back in 1985, the best strategy is only -competitive. This means that the best deterministic strategy could be as bad as times the best off-line strategy which is allowed to see all the page requests at once.
Randomization comes to the rescue. If the strategy for eviction can be random, then it is known that an on-line random strategy exists that is competitive. This is due to Amos Fiat, Dick Karp, Mike Luby, Lyle McGeoch, Danny Sleator, and Neal Young in 1991.
The paging problem is just a special case of the general -server problem. In the general case the simple cache mechanism is replaced by one based on an arbitrary finite metric space. The servers at any step are located somewhere on the points of the metric space. The requests are to “serve” a point in the space: a server must move to that point, and it incurs a cost of the distance in the metric.
In the 1990′s this problem was very “hot,” and there was a stream of results that tried to get good deterministic server strategies. The first results were exponential in but independent of : recall there are servers and the space has points. Finally—in a famous paper—Elias Koutsoupias and Christos Papadimitriou 1994 showed that there is a strategy achieving , which remains today the best known result.
The BBMN Result
The new result is:
Theorem 1 There exists an -competitive on-line strategy for the -server problem.
This is a great improvement, but it also changes the rules. It is great since the competitiveness bound is poly-log in . However, all the previous results we discussed had competitiveness bounds that did not depend on the size of the metric space, . Their result does.
See their paper for the proof. The key idea is that they: Reduce the -server problem over arbitrary metric to a (more difficult) problem over a very simple metric The very simple metric is a type of tree-like metric. The proof relies on the fact that any metric space on points can be well approximated by such a metric space. The cost of the approximation grows roughly as , so it is not surprising that their theorem has factors of in the competitiveness bound.
Open Problems
The -server problem has the following bounds and gaps in those bounds:
If the strategy is deterministic it must take and can be done in . There is a small, but annoying gap of two here. Which is correct?
If the strategy is randomized it must take . There is a huge and annoying gap to the known upper bounds here. There is no known strategy for even the simple case of the real line that is randomized but beats linear in . What is going on?
Finally, are the polylog factors of in their result necessary? In particular can one get rid of the dependence on altogether?
[fixed typo in Fiat et al. to O(log k)]
1
All_Posts
News
Open_Problems
Results
Algorithms
approximation
deterministic
lower_bounds
randomness
from google
Aleksander Mądry gave a stellar talk at our ARC Theory Day a week ago Friday. He is an expert on algorithmic graph theory, among other areas of theory. Already he has multiple best-paper awards, including the paper of this talk from FOCS 2011, and I expect he will get more in the future. This is good going considering that basic LaTeX systems lack a macro for the Polish ogonek diacritic in his name—Ken installed a special package called tipa to get it while editing this post.
Today I would like to talk about his result and the open questions that remain.
He with Nikhil Bansal, Niv Buchbinder, and Seffi Naor (BBMN) have made a recent important contribution to our understanding of the complexity of on-line algorithms. Aleksander’s talk was wonderful and explained what is known, their new result, and what still remains to be done. As he said:
One of the remaining open questions is really annoying—it should be solved—but it continues to resist attacks.
Before I discuss their result I would like to thank the Theory Day organizers Prasad Tetali and Prasad Raghavendra. It was held on the magical day 11/11/11, which does not happen that often, and in fact Mądry’s talk spanned 11:11:11am. Ken wore a red shirt in Buffalo but could not find the WW I remembrance poppy pin he acquired during his sabbatical in Canada. The full scorecard of talks and titles was:
Thomas Dueholm Hansen: Subexponential lower bounds for randomized pivoting rules for the simplex algorithm.
Aleksander Mądry: Online algorithms and the k-server conjecture.
Mohit Singh: A Randomized Rounding Approach for Symmetric TSP.
Ryan Williams: Algorithms for Circuits and Circuits for Algorithms.
The Problem
The problem he spoke about is the now classic on-line server problem. The simplest case is the problem of managing a computer $. Note that this is the way to tell a theory talk on computer caches from an architecture talk on computer caches—in architecture often $ is used to denote a cache. Perhaps this is changing, given the current economic turmoil, perhaps not.
The key is that there are pages in the cache and total pages. Each time a page is requested, if it is not in the cache the task is to decide which page to “evict” from the cache. If the strategy is deterministic, then as Danny Sleator and Bob Tarjan proved back in 1985, the best strategy is only -competitive. This means that the best deterministic strategy could be as bad as times the best off-line strategy which is allowed to see all the page requests at once.
Randomization comes to the rescue. If the strategy for eviction can be random, then it is known that an on-line random strategy exists that is competitive. This is due to Amos Fiat, Dick Karp, Mike Luby, Lyle McGeoch, Danny Sleator, and Neal Young in 1991.
The paging problem is just a special case of the general -server problem. In the general case the simple cache mechanism is replaced by one based on an arbitrary finite metric space. The servers at any step are located somewhere on the points of the metric space. The requests are to “serve” a point in the space: a server must move to that point, and it incurs a cost of the distance in the metric.
In the 1990′s this problem was very “hot,” and there was a stream of results that tried to get good deterministic server strategies. The first results were exponential in but independent of : recall there are servers and the space has points. Finally—in a famous paper—Elias Koutsoupias and Christos Papadimitriou 1994 showed that there is a strategy achieving , which remains today the best known result.
The BBMN Result
The new result is:
Theorem 1 There exists an -competitive on-line strategy for the -server problem.
This is a great improvement, but it also changes the rules. It is great since the competitiveness bound is poly-log in . However, all the previous results we discussed had competitiveness bounds that did not depend on the size of the metric space, . Their result does.
See their paper for the proof. The key idea is that they: Reduce the -server problem over arbitrary metric to a (more difficult) problem over a very simple metric The very simple metric is a type of tree-like metric. The proof relies on the fact that any metric space on points can be well approximated by such a metric space. The cost of the approximation grows roughly as , so it is not surprising that their theorem has factors of in the competitiveness bound.
Open Problems
The -server problem has the following bounds and gaps in those bounds:
If the strategy is deterministic it must take and can be done in . There is a small, but annoying gap of two here. Which is correct?
If the strategy is randomized it must take . There is a huge and annoying gap to the known upper bounds here. There is no known strategy for even the simple case of the real line that is randomized but beats linear in . What is going on?
Finally, are the polylog factors of in their result necessary? In particular can one get rid of the dependence on altogether?
[fixed typo in Fiat et al. to O(log k)]
november 2011 by josephzizys
[1111.0483] Optimally approximating exponential families
november 2011 by cshalizi
"This article studies exponential families $mathcal{E}$ on finite sets such that the information divergence $D(P|mathcal{E})$ of an arbitrary probability distribution from $mathcal{E}$ is bounded by some constant $D>0$. A particular class of low-dimensional exponential families that have low values of $D$ can be obtained from partitions of the state space. The main results concern optimality properties of these partition exponential families. Exponential families where $D=log(2)$ are studied in detail. This case is special, because if $D<log(2)$, then $mathcal{E}$ contains all probability measures with full support."
to:NB
exponential_families
probability
information_theory
approximation
november 2011 by cshalizi
[1107.0385] An algorithm for autonomously plotting solution sets in the presence of turning points
october 2011 by Vaguery
"Plotting solution sets for particular equations may be complicated by the existence of turning points. Here we describe an algorithm which not only overcomes such problematic points, but does so in the most general of settings. Applications of the algorithm are highlighted through two examples: the first provides verification, while the second demonstrates a non-trivial application. The latter is followed by a thorough run-time analysis. While both examples deal with bivariate equations, it is discussed how the algorithm may be generalized for space curves in $R^{3}$."
visualization
mathematics
graphics
approximation
algorithms
nudge-targets
october 2011 by Vaguery
Nondeterministic thoughts on nondeterminism
september 2011 by josephzizys
So if you've been hanging around lately, I've been writing posts where I think I'm talking about new ideas. (I'm not always correct.) This post, on the other hand, is definitely more a review-and-synthesis sort of post; mostly stuff I gleaned over the summer from reading up on Dijkstra's GCL, Ball et al.'s Automatic predicate abstraction of C programs, and a number of K. Rustan M. Leino's papers as synthesized for me by my advisors Aditya Nori and Sriram Rajamani at Microsoft Research India.
The first section represents my somewhat simplistic thoughts on other's people's work in the semantics of imperative programming languages, mostly thoughts I had this summer at MSR. I hope they are merely naive and simplistic and not actively wrong, and that you (gentle reader) will have patience to correct me where I am actively wrong. Then I have three short, mostly unrelated discussions that I needed to clear out of my head. The first discussion reviews a neat way of understanding loop invariants, due to Leino. The second talks about the algebraic properties of non-deterministic programs. The third discussion tries to relate nondeterminism to the work I'm doing on representing transition systems in linear logic, though it's mostly just speculative and may not make sense to anyone but me in its current half-baked form.
C programs and their abstractions
Our starting point will be, essentially, a generic while language that we'll write with C-like syntax. Here's a program:
/* Example 1 */
1. if(x > 0) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;
The thing about a (fully-defined) C program is that it's deterministic - for any initial assignments to variables, there's a unique behavior. So, for the above program, if the initial value of x is, say, 4, then the program will execute line 1 (with x=4), then line 2 (with x=4), then line 6 (with x=9), and the program will then return 9. If the initial value of x is -12, then the program will execute line 1 (with x=-12), then line 4 (with x=-12), then line 6 (with x=10), and the program will then return 10. I've now implicitly told you what what counts as a "behavior" - it's a stream of line numbers and states that may or may not end with a returned value. So the meaning of a program is a function from initial states to streams of line numbers and states.
We can even think of the line numbers as a convenient fiction: we could translate the program above into this form:
linenum = 1;
checkpoint();
if(x > 0) {
linenum = 2;
checkpoint();
x += 5;
} else {
linenum = 4;
checkpoint();
x = -x - 2;
}
linenum = 6;
checkpoint();
return x;
Then we say that the behavior of a program is a function from initial states to the stream of intermediate states as reported by the special checkpoint() function; the "line number" part of the stream is just handled by the value associated with linenum in the memory state.
So that's the meaning (the denotation) that I choose for deterministic C programs: they're functions from initial states to streams of states (where the state includes the line number). From here on out I'll just think of any given deterministic C program as a way of specifying one such function. There are also many such functions that can't be written down appropriately with the syntax of a C program; that's not my concern here.
Nondeterminism
Instead, I want to talk about non-deterministic C programs. The meaning of a nondeterministic C program is a function from initial states to sets of streams of states,1 and the syntax we'll use to write down nondeterministic C programs is an extension of the syntax of deterministic C programs. This means, of course, that there's a trivial inclusion of deterministic C programs into nondeterministic C programs.
The easiest way to come up with nondeterministic C programs (which represent sets of functions from initial states to streams) is to turn if branches into nondeterministic branches. The standard way that the C model checking folks represent nondeterministic choice is to write if(*). Here's an example:
/* Example 2 */
1. if(*) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;
The *, near as I can tell, was chosen to resemble the "wildcard" symbol, since any boolean expression we put in that if statement (such as x > 0 to recover our Example 1) results in a program that refines Example 2. (Terminology: a program refines another if, for every initial state, every stream in the meaning of the more-refined program also belongs to the meaning of the less-refined program.)
Assume/impossible
Nondeterministic choice allows us to enlarge the meaning of a nondeterministic program. Inserting assume() statements will cut down the set of streams in the meaning of a nondeterministic program. In particular, we have to exclude any streams that would, for any initial state, violate an assumption. We therefore have to be careful that we don't cut down the set of streams so far that there aren't any left: there are no deterministic programs that refine assume(false) - every initial state maps to the empty set of streams. For that matter, there also aren't any deterministic programs that refine Example 3:
/* Example 3 */
1. assume(x > 0);
2. return x;
For every initial state where the value of x is positive, the meaning of Example 3 is a set containing only the stream that goes to line one, then line two, then returns the initial value of x. For every initial state where the value of x is not positive, the meaning of the program has to be the empty set: any stream would immediately start by violating an assumption.
Assumptions were used in Ball et al.'s "Automatic predicate abstraction of C programs" paper, which explains part of the theory behind the SLAM software verifier. In that work, they got rid of all of the if statements as a first step, replacing them with nondeterministic choices immediately followed by assumptions.2
/* Example 4 */
1. if(*) {
assume(x > 0);
2. x += 5;
3. } else {
assume(x <= 0);
4. x = -x - 2;
5. }
6. return x;
The program in Example 4 is basically a degenerate nondeterministic program: its meaning is exactly equivalent to the deterministic Example 1.
On the other hand, if we remove the statement assume(x <= 0) after line 3 in Example 4, we have a nondeterministic program that is refined by many deterministic programs. For instance, the deterministic program in Example 1 refines Example 4 without the assume(x <= 0), but so do Examples 5 and 6:
/* Example 5 */
1. x = x;
4. x = -x - 2;
6. return x;
/* Example 6 */
1. if(x > 100) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;
Example 4 as it was presented shows how assume() together with nondeterministic choice can encode normal if statements. We could also define an statement impossible and say that a stream just cannot ever reach an impossible statement. Impossibility can be defined in terms of assume - impossible is equivalent to assume(false). Alternatively, we can use impossibility to define assumptions - assume(e) is equivalent to if(e) { impossible; }. So there's a bit of a chicken-and-egg issue: I'm not totally sure whether we should build in impossible/if combination and use it to define assume() or whether we should build in assume() and use it to define impossible and if statements. It probably doesn't matter.
Assert/abort
In "Automatic predicate abstraction of C programs," assert() statements are understood to be the things that are supposed to be avoided. However, in Leino's work, they have a meaning of absolute and unbounded nondeterminism, which is the interpretation I want to use. If the expression e in an assert(e) statement evaluates to false, anything can happen - it's as if we could jump to an arbitrary point in memory and start executing code; absolutely any deterministic program that refines a nondeterministic program up to the point where the nondeterministic program fails an assertion will definitely refine that nondeterministic program.
So assert() represents unbounded nondeterminism: and in the sense of "jump to any code," not just in the sense of "replace me with any code" - the program assert(false); while(true) {} is refined by every program, including ones that terminate. This interpretation is easy to connect to the SLAM interpretation where we say "assertion failures are to be avoided," since obviously one of the things you might prove about your C code is that it doesn't jump to arbitrary code and start executing it.
Analogy: assert() is to abort as assume() is to impossible - we can define assert(e); as if(e) { abort; }.
Abstracting loops
The three primitives we have discussed so far are almost enough to let us perform a fun trick that my advisors at MSR attribute to Leino. First, though, we need one more primitive, a "baby" form of assert/abort called havoc(x), which allows the value associated with the variable x to be changed in any way. In other words, a program with the statement havoc(x) is refined by the program where the statement havoc(x) is replaced by the statement x = 4, the statement x = x - 12, the statement x = y - 16, or even the statement if(z) { x = y; } else { x = w; }.
Given the havoc primitive, imagine we have a program with a loop, and no checkpoints inside the loop:
1. /* Before the loop */
while(e) {
... loop body,
which only assigns
to variables x1,...,xn ...
}
2. /* After the loop */
Say we know the following two things:
The loop will always terminate if the expression e_inv evaluates to true at line 1, and
From any state where e_inv and e both evaluate to true, after the loop body is run, e_inv will evaluate to true.
Then, we know the program above refines the following program:
1. /* Before the loop */
assert(e_inv);
havoc(x1); ... havoc([…]
approximation
predicate_abstraction
imperative_programming
from google
The first section represents my somewhat simplistic thoughts on other's people's work in the semantics of imperative programming languages, mostly thoughts I had this summer at MSR. I hope they are merely naive and simplistic and not actively wrong, and that you (gentle reader) will have patience to correct me where I am actively wrong. Then I have three short, mostly unrelated discussions that I needed to clear out of my head. The first discussion reviews a neat way of understanding loop invariants, due to Leino. The second talks about the algebraic properties of non-deterministic programs. The third discussion tries to relate nondeterminism to the work I'm doing on representing transition systems in linear logic, though it's mostly just speculative and may not make sense to anyone but me in its current half-baked form.
C programs and their abstractions
Our starting point will be, essentially, a generic while language that we'll write with C-like syntax. Here's a program:
/* Example 1 */
1. if(x > 0) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;
The thing about a (fully-defined) C program is that it's deterministic - for any initial assignments to variables, there's a unique behavior. So, for the above program, if the initial value of x is, say, 4, then the program will execute line 1 (with x=4), then line 2 (with x=4), then line 6 (with x=9), and the program will then return 9. If the initial value of x is -12, then the program will execute line 1 (with x=-12), then line 4 (with x=-12), then line 6 (with x=10), and the program will then return 10. I've now implicitly told you what what counts as a "behavior" - it's a stream of line numbers and states that may or may not end with a returned value. So the meaning of a program is a function from initial states to streams of line numbers and states.
We can even think of the line numbers as a convenient fiction: we could translate the program above into this form:
linenum = 1;
checkpoint();
if(x > 0) {
linenum = 2;
checkpoint();
x += 5;
} else {
linenum = 4;
checkpoint();
x = -x - 2;
}
linenum = 6;
checkpoint();
return x;
Then we say that the behavior of a program is a function from initial states to the stream of intermediate states as reported by the special checkpoint() function; the "line number" part of the stream is just handled by the value associated with linenum in the memory state.
So that's the meaning (the denotation) that I choose for deterministic C programs: they're functions from initial states to streams of states (where the state includes the line number). From here on out I'll just think of any given deterministic C program as a way of specifying one such function. There are also many such functions that can't be written down appropriately with the syntax of a C program; that's not my concern here.
Nondeterminism
Instead, I want to talk about non-deterministic C programs. The meaning of a nondeterministic C program is a function from initial states to sets of streams of states,1 and the syntax we'll use to write down nondeterministic C programs is an extension of the syntax of deterministic C programs. This means, of course, that there's a trivial inclusion of deterministic C programs into nondeterministic C programs.
The easiest way to come up with nondeterministic C programs (which represent sets of functions from initial states to streams) is to turn if branches into nondeterministic branches. The standard way that the C model checking folks represent nondeterministic choice is to write if(*). Here's an example:
/* Example 2 */
1. if(*) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;
The *, near as I can tell, was chosen to resemble the "wildcard" symbol, since any boolean expression we put in that if statement (such as x > 0 to recover our Example 1) results in a program that refines Example 2. (Terminology: a program refines another if, for every initial state, every stream in the meaning of the more-refined program also belongs to the meaning of the less-refined program.)
Assume/impossible
Nondeterministic choice allows us to enlarge the meaning of a nondeterministic program. Inserting assume() statements will cut down the set of streams in the meaning of a nondeterministic program. In particular, we have to exclude any streams that would, for any initial state, violate an assumption. We therefore have to be careful that we don't cut down the set of streams so far that there aren't any left: there are no deterministic programs that refine assume(false) - every initial state maps to the empty set of streams. For that matter, there also aren't any deterministic programs that refine Example 3:
/* Example 3 */
1. assume(x > 0);
2. return x;
For every initial state where the value of x is positive, the meaning of Example 3 is a set containing only the stream that goes to line one, then line two, then returns the initial value of x. For every initial state where the value of x is not positive, the meaning of the program has to be the empty set: any stream would immediately start by violating an assumption.
Assumptions were used in Ball et al.'s "Automatic predicate abstraction of C programs" paper, which explains part of the theory behind the SLAM software verifier. In that work, they got rid of all of the if statements as a first step, replacing them with nondeterministic choices immediately followed by assumptions.2
/* Example 4 */
1. if(*) {
assume(x > 0);
2. x += 5;
3. } else {
assume(x <= 0);
4. x = -x - 2;
5. }
6. return x;
The program in Example 4 is basically a degenerate nondeterministic program: its meaning is exactly equivalent to the deterministic Example 1.
On the other hand, if we remove the statement assume(x <= 0) after line 3 in Example 4, we have a nondeterministic program that is refined by many deterministic programs. For instance, the deterministic program in Example 1 refines Example 4 without the assume(x <= 0), but so do Examples 5 and 6:
/* Example 5 */
1. x = x;
4. x = -x - 2;
6. return x;
/* Example 6 */
1. if(x > 100) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;
Example 4 as it was presented shows how assume() together with nondeterministic choice can encode normal if statements. We could also define an statement impossible and say that a stream just cannot ever reach an impossible statement. Impossibility can be defined in terms of assume - impossible is equivalent to assume(false). Alternatively, we can use impossibility to define assumptions - assume(e) is equivalent to if(e) { impossible; }. So there's a bit of a chicken-and-egg issue: I'm not totally sure whether we should build in impossible/if combination and use it to define assume() or whether we should build in assume() and use it to define impossible and if statements. It probably doesn't matter.
Assert/abort
In "Automatic predicate abstraction of C programs," assert() statements are understood to be the things that are supposed to be avoided. However, in Leino's work, they have a meaning of absolute and unbounded nondeterminism, which is the interpretation I want to use. If the expression e in an assert(e) statement evaluates to false, anything can happen - it's as if we could jump to an arbitrary point in memory and start executing code; absolutely any deterministic program that refines a nondeterministic program up to the point where the nondeterministic program fails an assertion will definitely refine that nondeterministic program.
So assert() represents unbounded nondeterminism: and in the sense of "jump to any code," not just in the sense of "replace me with any code" - the program assert(false); while(true) {} is refined by every program, including ones that terminate. This interpretation is easy to connect to the SLAM interpretation where we say "assertion failures are to be avoided," since obviously one of the things you might prove about your C code is that it doesn't jump to arbitrary code and start executing it.
Analogy: assert() is to abort as assume() is to impossible - we can define assert(e); as if(e) { abort; }.
Abstracting loops
The three primitives we have discussed so far are almost enough to let us perform a fun trick that my advisors at MSR attribute to Leino. First, though, we need one more primitive, a "baby" form of assert/abort called havoc(x), which allows the value associated with the variable x to be changed in any way. In other words, a program with the statement havoc(x) is refined by the program where the statement havoc(x) is replaced by the statement x = 4, the statement x = x - 12, the statement x = y - 16, or even the statement if(z) { x = y; } else { x = w; }.
Given the havoc primitive, imagine we have a program with a loop, and no checkpoints inside the loop:
1. /* Before the loop */
while(e) {
... loop body,
which only assigns
to variables x1,...,xn ...
}
2. /* After the loop */
Say we know the following two things:
The loop will always terminate if the expression e_inv evaluates to true at line 1, and
From any state where e_inv and e both evaluate to true, after the loop body is run, e_inv will evaluate to true.
Then, we know the program above refines the following program:
1. /* Before the loop */
assert(e_inv);
havoc(x1); ... havoc([…]
september 2011 by josephzizys
[1107.4353] Upper Bounds for Markov Approximations of Ergodic Processes
july 2011 by cshalizi
"Chains of infinite order are generalizations of Markov chains that constitute a flexible class of models in the general theory of stochastic processes. These processes can be naturally studied using approximating Markov chains. Here we derive new upper bounds for the $bar{d}$-distance and the K"ullback-Leibler divergence between chains of infinite order and their canonical $k$-step Markov approximations. In contrast to the bounds available in the literature our results apply to chains of infinite order compatible with general classes of probability kernels. In particular, we allow kernels with discontinuities and null transition probabilities."" (Pedantry: Pretty sure Kullback did not spell his name with an umlaut!)
markov_models
stochastic_processes
re:AoS_project
to_read
in_NB
approximation
re:your_favorite_dsge_sucks
july 2011 by cshalizi
Distribution Approximations
june 2011 by davidar
Various approximations for distributions are studied, especially those involving the Binomial, Poisson, gamma, and Gaussian (normal) distributions. m-procedures are used to make comparisons. A simple approximation to a continuous random variable is obtained by subdividing an interval which includes the range (the set of possible values) into small enough subintervals that the density is approximately constant over each subinterval. A point in each subinterval is selected and is assigned the probability mass in its subinterval. The combination of the selected points and the corresponding probabilities describes the distribution of an approximating simple random variable. Calculations based on this distribution approximate corresponding calculations on the continuous distribution.
probability
distribution
approximation
binomial
poisson
gamma
normal
gaussian
june 2011 by davidar
Copy this bookmark: