josephzizys + predicate_abstraction   1

Nondeterministic thoughts on nondeterminism
So if you've been hanging around lately, I've been writing posts where I think I'm talking about new ideas. (I'm not always correct.) This post, on the other hand, is definitely more a review-and-synthesis sort of post; mostly stuff I gleaned over the summer from reading up on Dijkstra's GCL, Ball et al.'s Automatic predicate abstraction of C programs, and a number of K. Rustan M. Leino's papers as synthesized for me by my advisors Aditya Nori and Sriram Rajamani at Microsoft Research India.

The first section represents my somewhat simplistic thoughts on other's people's work in the semantics of imperative programming languages, mostly thoughts I had this summer at MSR. I hope they are merely naive and simplistic and not actively wrong, and that you (gentle reader) will have patience to correct me where I am actively wrong. Then I have three short, mostly unrelated discussions that I needed to clear out of my head. The first discussion reviews a neat way of understanding loop invariants, due to Leino. The second talks about the algebraic properties of non-deterministic programs. The third discussion tries to relate nondeterminism to the work I'm doing on representing transition systems in linear logic, though it's mostly just speculative and may not make sense to anyone but me in its current half-baked form.

C programs and their abstractions

Our starting point will be, essentially, a generic while language that we'll write with C-like syntax. Here's a program:

/* Example 1 */
1. if(x > 0) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;

The thing about a (fully-defined) C program is that it's deterministic - for any initial assignments to variables, there's a unique behavior. So, for the above program, if the initial value of x is, say, 4, then the program will execute line 1 (with x=4), then line 2 (with x=4), then line 6 (with x=9), and the program will then return 9. If the initial value of x is -12, then the program will execute line 1 (with x=-12), then line 4 (with x=-12), then line 6 (with x=10), and the program will then return 10. I've now implicitly told you what what counts as a "behavior" - it's a stream of line numbers and states that may or may not end with a returned value. So the meaning of a program is a function from initial states to streams of line numbers and states.

We can even think of the line numbers as a convenient fiction: we could translate the program above into this form:

linenum = 1;
checkpoint();
if(x > 0) {
linenum = 2;
checkpoint();
x += 5;
} else {
linenum = 4;
checkpoint();
x = -x - 2;
}
linenum = 6;
checkpoint();
return x;

Then we say that the behavior of a program is a function from initial states to the stream of intermediate states as reported by the special checkpoint() function; the "line number" part of the stream is just handled by the value associated with linenum in the memory state.

So that's the meaning (the denotation) that I choose for deterministic C programs: they're functions from initial states to streams of states (where the state includes the line number). From here on out I'll just think of any given deterministic C program as a way of specifying one such function. There are also many such functions that can't be written down appropriately with the syntax of a C program; that's not my concern here.

Nondeterminism

Instead, I want to talk about non-deterministic C programs. The meaning of a nondeterministic C program is a function from initial states to sets of streams of states,1 and the syntax we'll use to write down nondeterministic C programs is an extension of the syntax of deterministic C programs. This means, of course, that there's a trivial inclusion of deterministic C programs into nondeterministic C programs.

The easiest way to come up with nondeterministic C programs (which represent sets of functions from initial states to streams) is to turn if branches into nondeterministic branches. The standard way that the C model checking folks represent nondeterministic choice is to write if(*). Here's an example:

/* Example 2 */
1. if(*) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;

The *, near as I can tell, was chosen to resemble the "wildcard" symbol, since any boolean expression we put in that if statement (such as x > 0 to recover our Example 1) results in a program that refines Example 2. (Terminology: a program refines another if, for every initial state, every stream in the meaning of the more-refined program also belongs to the meaning of the less-refined program.)

Assume/impossible

Nondeterministic choice allows us to enlarge the meaning of a nondeterministic program. Inserting assume() statements will cut down the set of streams in the meaning of a nondeterministic program. In particular, we have to exclude any streams that would, for any initial state, violate an assumption. We therefore have to be careful that we don't cut down the set of streams so far that there aren't any left: there are no deterministic programs that refine assume(false) - every initial state maps to the empty set of streams. For that matter, there also aren't any deterministic programs that refine Example 3:

/* Example 3 */
1. assume(x > 0);
2. return x;

For every initial state where the value of x is positive, the meaning of Example 3 is a set containing only the stream that goes to line one, then line two, then returns the initial value of x. For every initial state where the value of x is not positive, the meaning of the program has to be the empty set: any stream would immediately start by violating an assumption.

Assumptions were used in Ball et al.'s "Automatic predicate abstraction of C programs" paper, which explains part of the theory behind the SLAM software verifier. In that work, they got rid of all of the if statements as a first step, replacing them with nondeterministic choices immediately followed by assumptions.2

/* Example 4 */
1. if(*) {
assume(x > 0);
2. x += 5;
3. } else {
assume(x <= 0);
4. x = -x - 2;
5. }
6. return x;

The program in Example 4 is basically a degenerate nondeterministic program: its meaning is exactly equivalent to the deterministic Example 1.

On the other hand, if we remove the statement assume(x <= 0) after line 3 in Example 4, we have a nondeterministic program that is refined by many deterministic programs. For instance, the deterministic program in Example 1 refines Example 4 without the assume(x <= 0), but so do Examples 5 and 6:

/* Example 5 */
1. x = x;
4. x = -x - 2;
6. return x;

/* Example 6 */
1. if(x > 100) {
2. x += 5;
3. } else {
4. x = -x - 2;
5. }
6. return x;

Example 4 as it was presented shows how assume() together with nondeterministic choice can encode normal if statements. We could also define an statement impossible and say that a stream just cannot ever reach an impossible statement. Impossibility can be defined in terms of assume - impossible is equivalent to assume(false). Alternatively, we can use impossibility to define assumptions - assume(e) is equivalent to if(e) { impossible; }. So there's a bit of a chicken-and-egg issue: I'm not totally sure whether we should build in impossible/if combination and use it to define assume() or whether we should build in assume() and use it to define impossible and if statements. It probably doesn't matter.

Assert/abort

In "Automatic predicate abstraction of C programs," assert() statements are understood to be the things that are supposed to be avoided. However, in Leino's work, they have a meaning of absolute and unbounded nondeterminism, which is the interpretation I want to use. If the expression e in an assert(e) statement evaluates to false, anything can happen - it's as if we could jump to an arbitrary point in memory and start executing code; absolutely any deterministic program that refines a nondeterministic program up to the point where the nondeterministic program fails an assertion will definitely refine that nondeterministic program.

So assert() represents unbounded nondeterminism: and in the sense of "jump to any code," not just in the sense of "replace me with any code" - the program assert(false); while(true) {} is refined by every program, including ones that terminate. This interpretation is easy to connect to the SLAM interpretation where we say "assertion failures are to be avoided," since obviously one of the things you might prove about your C code is that it doesn't jump to arbitrary code and start executing it.

Analogy: assert() is to abort as assume() is to impossible - we can define assert(e); as if(e) { abort; }.

Abstracting loops

The three primitives we have discussed so far are almost enough to let us perform a fun trick that my advisors at MSR attribute to Leino. First, though, we need one more primitive, a "baby" form of assert/abort called havoc(x), which allows the value associated with the variable x to be changed in any way. In other words, a program with the statement havoc(x) is refined by the program where the statement havoc(x) is replaced by the statement x = 4, the statement x = x - 12, the statement x = y - 16, or even the statement if(z) { x = y; } else { x = w; }.

Given the havoc primitive, imagine we have a program with a loop, and no checkpoints inside the loop:

1. /* Before the loop */
while(e) {
... loop body,
which only assigns
to variables x1,...,xn ...
}
2. /* After the loop */

Say we know the following two things:

The loop will always terminate if the expression e_inv evaluates to true at line 1, and
From any state where e_inv and e both evaluate to true, after the loop body is run, e_inv will evaluate to true.

Then, we know the program above refines the following program:

1. /* Before the loop */
assert(e_inv);
havoc(x1); ... havoc([…]
approximation  predicate_abstraction  imperative_programming  from google
september 2011 by josephzizys

Copy this bookmark:



description:


tags: