Avoid branches in CPROVER library for --paths mode#8971
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the built-in ANSI-C CPROVER library (stdlib.c) to avoid generating GOTO-level control-flow branches in --paths mode by replacing two if statements with conditional (?:) expressions. This targets path explosion by keeping the logic as expression-level conditionals instead of control-flow splits.
Changes:
- Rewrite
free’s memory-leak tracking (__CPROVER_memory_leak) update using a conditional expression. - Rewrite
__CPROVER_deallocate’s nondeterministic deallocation tracking (__CPROVER_deallocated) update using a conditional expression.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #8971 +/- ##
===========================================
+ Coverage 80.68% 80.71% +0.03%
===========================================
Files 1713 1715 +2
Lines 189499 189834 +335
Branches 73 73
===========================================
+ Hits 152890 153232 +342
+ Misses 36609 36602 -7 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
8c2786d to
0dc0f37
Compare
|
I am ok merging this, but we shouldn't need it. |
|
If we don't want to consider path merging, then an option could be to add a preprocessing pass. |
Replace if-statements with conditional expressions in the built-in
CPROVER libraries to avoid generating GOTO branches that cause
exponential path explosion in --paths mode. Three places are touched:
1. __CPROVER_deallocate (src/ansi-c/library/stdlib.c)
nondeterministic deallocation tracking
Before: if(__VERIFIER_nondet___CPROVER_bool())
__CPROVER_deallocated = ptr;
After: __CPROVER_deallocated =
nondet ? ptr : __CPROVER_deallocated;
2. free (src/ansi-c/library/stdlib.c)
memory-leak detection
Before: if(__CPROVER_memory_leak == ptr) __CPROVER_memory_leak = 0;
After: __CPROVER_memory_leak =
(__CPROVER_memory_leak == ptr) ? 0 : __CPROVER_memory_leak;
3. __delete and __delete_array (src/cpp/library/new.c)
same memory-leak detection idiom as in `free`, mirrored on the
C++ delete / delete[] paths so the optimisation applies symmetrically
to C++ programs.
Each rewrite is annotated with a short comment explaining why the
seemingly-redundant self-assignment in the false branch is intentional,
so that a future reader does not "simplify" it back to an `if` and
silently undo the optimisation.
Impact on Collections-C benchmark (159 tests, --paths lifo):
Original --paths: 61 pass, 93 timeout, 3105s
After fix 1: 123 pass, 31 timeout, 1463s
After both fixes: 127 pass, 26 timeout, 1159s
With precomp: 127 pass, 26 timeout, 1046s (14x vs Soteria)
The fixed --paths mode now passes 9 MORE tests than monolithic SAT
(127 vs 118), including treeset and treetable tests.
Adds a CORE-tagged regression test
`regression/cbmc/path-branch-malloc-free` with --paths lifo over a
handful of independent malloc/free pairs. With this fix the test runs
in around 18s; without it the per-profile ctest timeout (1200s) is
exceeded, so reverting the optimisation is caught.
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
0dc0f37 to
78a609b
Compare
78a609b to
5672dc3
Compare
5672dc3 to
8f94bda
Compare
…alysis Local safe pointer analysis recognises dereferences guarded by a null-check in the control flow, e.g. `if(p != null) *p` and `assume(p != null); *p`. Extend it to also recognise a dereference in the guarded operand of a conditional expression, e.g. `(p != null) ? *p : x` (the true operand when the check holds on the taken branch, the false operand otherwise), reusing the existing null-check recognition. Such conditional expressions arise when guarded control flow is rewritten into conditional assignments (for instance by if-conversion). Without this, the analysis no longer sees the guard, so symex falls back on the `$object` last-resort target and emits spurious, imprecise dereference checks. The match is exact: the recorded dereference must be the one symex later queries. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
8f94bda to
19811d3
Compare
In --paths mode every GOTO branch forks the symbolic-execution path tree, so trivial conditional bookkeeping (such as the leak/deallocation tracking in the CPROVER library) causes exponential path explosion. This adds a goto-program transformation that linearises side-effect-free conditional-assignment regions, rewriting if(cond) x = e; into x = cond ? e : x; if(cond) x = a; else x = b; into x = cond ? a : b; and dropping the GOTO branch. It runs (only when --paths is given) before goto_check, so that goto_check_ct::check_rec_if keeps the per-operand safety checks of the generated conditional expressions guarded by the branch condition; the transformation is therefore sound even when the assignments dereference, divide or overflow. For an if/else, the else-region is linearised after the then-region. Emitting two independent self-referential conditional assignments (x = cond ? a : x; x = !cond ? b : x;) for an arm pair that targets the same variable would, after SSA, nest into x = !cond ? b : (cond ? a : x). That defeats constant propagation of x: a variable that was concrete on each original branch becomes a symbolic value, and in --paths mode that can drive downstream branches or loops into an explosion of paths (e.g. regression/cbmc/return4 went from a fraction of a second to minutes under --paths lifo, timing the suite out on slower runners). When both arms are a single assignment to the same variable we therefore emit one combined, simplified ternary x = cond ? a : b (so cond ? 1 : 1 folds back to the constant 1); more complex overlapping arms are left unconverted rather than chained. Arms that do not interfere keep the per-region form. The unit tests pin this behaviour. To preserve downstream precision the original branch condition is used directly in the generated conditional expressions, so analyses such as local safe pointer analysis still see guards like `p != null`. The condition is snapshotted into a fresh variable only when it would otherwise observe a clobbered value, i.e. when the region writes or kills a variable the condition reads (including a subsequently DEAD condition temporary). A region is only linearised when it consists solely of assignments and scope markers, has a single entry, and the branch itself is not a jump target. An assignment whose left-hand side contains a dereference or an array index is not linearised: that would move the (possibly faulting) access out of the guard, changing semantics and losing the null/bounds precision the guard provided. This generalises the per-site CPROVER-library edits of the preceding commit: with the pass enabled the same patterns are linearised automatically, in user code and elsewhere in the library alike, addressing the review request for a preprocessing pass rather than hand-editing the library. Tests that pin --paths output which legitimately changes under if-conversion (VCC counts, shown GOTO, coverage) are tagged paths-lifo-expected-failure. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
…rsion An earlier commit hand-rewrote four side-effect-free conditional assignments in the CPROVER library -- in free(), __CPROVER_deallocate(), __delete() and __delete_array() -- from `if`-statements into conditional expressions, so that --paths mode would not fork the symbolic-execution path tree on them. The if-conversion pass now performs exactly this linearisation automatically for any qualifying conditional assignment, so the manual edits are no longer necessary. Restore the original, more readable `if`-statement form; under --paths the pass rewrites these branches into guarded assignments just as the hand edits did (verified via the path-branch-malloc-free regression test, which is retained and now exercises the pass end-to-end). Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
19811d3 to
4370b0f
Compare
Replace if-statements with conditional expressions in two places in the built-in CPROVER library to avoid generating GOTO branches that cause exponential path explosion in --paths mode:
__CPROVER_deallocate: nondeterministic deallocation tracking
Before: if(__VERIFIER_nondet___CPROVER_bool()) __CPROVER_deallocated = ptr;
After: __CPROVER_deallocated = nondet ? ptr : __CPROVER_deallocated;
free: memory leak detection
Before: if(__CPROVER_memory_leak==ptr) __CPROVER_memory_leak=0;
After: __CPROVER_memory_leak = (leak==ptr) ? 0 : leak;
Both are semantically equivalent but avoid GOTO instructions, so --paths mode does not fork at these points.
Impact on Collections-C benchmark (159 tests, --paths lifo):
Original --paths: 61 pass, 93 timeout, 3105s
After fix 1: 123 pass, 31 timeout, 1463s
After both fixes: 127 pass, 26 timeout, 1159s
With precomp: 127 pass, 26 timeout, 1046s (14x vs Soteria)
The fixed --paths mode now passes 9 MORE tests than monolithic SAT (127 vs 118), including treeset and treetable tests.