Skip to content

Avoid branches in CPROVER library for --paths mode#8971

Open
tautschnig wants to merge 4 commits into
diffblue:developfrom
tautschnig:library-branching
Open

Avoid branches in CPROVER library for --paths mode#8971
tautschnig wants to merge 4 commits into
diffblue:developfrom
tautschnig:library-branching

Conversation

@tautschnig

Copy link
Copy Markdown
Collaborator

Replace if-statements with conditional expressions in two places in the built-in CPROVER library to avoid generating GOTO branches that cause exponential path explosion in --paths mode:

  1. __CPROVER_deallocate: nondeterministic deallocation tracking
    Before: if(__VERIFIER_nondet___CPROVER_bool()) __CPROVER_deallocated = ptr;
    After: __CPROVER_deallocated = nondet ? ptr : __CPROVER_deallocated;

  2. free: memory leak detection
    Before: if(__CPROVER_memory_leak==ptr) __CPROVER_memory_leak=0;
    After: __CPROVER_memory_leak = (leak==ptr) ? 0 : leak;

Both are semantically equivalent but avoid GOTO instructions, so --paths mode does not fork at these points.

Impact on Collections-C benchmark (159 tests, --paths lifo):
Original --paths: 61 pass, 93 timeout, 3105s
After fix 1: 123 pass, 31 timeout, 1463s
After both fixes: 127 pass, 26 timeout, 1159s
With precomp: 127 pass, 26 timeout, 1046s (14x vs Soteria)

The fixed --paths mode now passes 9 MORE tests than monolithic SAT (127 vs 118), including treeset and treetable tests.

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Apr 27, 2026
Copilot AI review requested due to automatic review settings April 27, 2026 10:16

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the built-in ANSI-C CPROVER library (stdlib.c) to avoid generating GOTO-level control-flow branches in --paths mode by replacing two if statements with conditional (?:) expressions. This targets path explosion by keeping the logic as expression-level conditionals instead of control-flow splits.

Changes:

  • Rewrite free’s memory-leak tracking (__CPROVER_memory_leak) update using a conditional expression.
  • Rewrite __CPROVER_deallocate’s nondeterministic deallocation tracking (__CPROVER_deallocated) update using a conditional expression.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov

codecov Bot commented Apr 27, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.80952% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.71%. Comparing base (0139d01) to head (4370b0f).

Files with missing lines Patch % Lines
src/goto-programs/if_conversion.cpp 98.55% 3 Missing ⚠️
src/cbmc/cbmc_parse_options.cpp 85.71% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8971      +/-   ##
===========================================
+ Coverage    80.68%   80.71%   +0.03%     
===========================================
  Files         1713     1715       +2     
  Lines       189499   189834     +335     
  Branches        73       73              
===========================================
+ Hits        152890   153232     +342     
+ Misses       36609    36602       -7     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tautschnig tautschnig force-pushed the library-branching branch from 8c2786d to 0dc0f37 Compare May 26, 2026 07:52
@tautschnig tautschnig assigned kroening and unassigned tautschnig May 26, 2026
@kroening

Copy link
Copy Markdown
Collaborator

I am ok merging this, but we shouldn't need it.

@kroening

Copy link
Copy Markdown
Collaborator

If we don't want to consider path merging, then an option could be to add a preprocessing pass.

@kroening kroening assigned tautschnig and unassigned kroening Jun 14, 2026
Replace if-statements with conditional expressions in the built-in
CPROVER libraries to avoid generating GOTO branches that cause
exponential path explosion in --paths mode. Three places are touched:

1. __CPROVER_deallocate (src/ansi-c/library/stdlib.c)
   nondeterministic deallocation tracking
   Before: if(__VERIFIER_nondet___CPROVER_bool())
             __CPROVER_deallocated = ptr;
   After:  __CPROVER_deallocated =
             nondet ? ptr : __CPROVER_deallocated;

2. free (src/ansi-c/library/stdlib.c)
   memory-leak detection
   Before: if(__CPROVER_memory_leak == ptr) __CPROVER_memory_leak = 0;
   After:  __CPROVER_memory_leak =
             (__CPROVER_memory_leak == ptr) ? 0 : __CPROVER_memory_leak;

3. __delete and __delete_array (src/cpp/library/new.c)
   same memory-leak detection idiom as in `free`, mirrored on the
   C++ delete / delete[] paths so the optimisation applies symmetrically
   to C++ programs.

Each rewrite is annotated with a short comment explaining why the
seemingly-redundant self-assignment in the false branch is intentional,
so that a future reader does not "simplify" it back to an `if` and
silently undo the optimisation.

Impact on Collections-C benchmark (159 tests, --paths lifo):
  Original --paths:  61 pass, 93 timeout, 3105s
  After fix 1:      123 pass, 31 timeout, 1463s
  After both fixes: 127 pass, 26 timeout, 1159s
  With precomp:     127 pass, 26 timeout, 1046s (14x vs Soteria)

The fixed --paths mode now passes 9 MORE tests than monolithic SAT
(127 vs 118), including treeset and treetable tests.

Adds a CORE-tagged regression test
`regression/cbmc/path-branch-malloc-free` with --paths lifo over a
handful of independent malloc/free pairs. With this fix the test runs
in around 18s; without it the per-profile ctest timeout (1200s) is
exceeded, so reverting the optimisation is caught.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig

tautschnig commented Jun 14, 2026

Copy link
Copy Markdown
Collaborator Author

If we don't want to consider path merging, then an option could be to add a preprocessing pass.

d8e1ef3 now implements that preprocessing pass, and 19811d3 then reverts the selective workarounds in our library model.

@tautschnig tautschnig requested a review from a team as a code owner June 14, 2026 21:00
@tautschnig tautschnig requested a review from martin-cs as a code owner June 15, 2026 06:27
…alysis

Local safe pointer analysis recognises dereferences guarded by a null-check in
the control flow, e.g. `if(p != null) *p` and `assume(p != null); *p`. Extend
it to also recognise a dereference in the guarded operand of a conditional
expression, e.g. `(p != null) ? *p : x` (the true operand when the check holds
on the taken branch, the false operand otherwise), reusing the existing
null-check recognition.

Such conditional expressions arise when guarded control flow is rewritten into
conditional assignments (for instance by if-conversion). Without this, the
analysis no longer sees the guard, so symex falls back on the `$object`
last-resort target and emits spurious, imprecise dereference checks. The match
is exact: the recorded dereference must be the one symex later queries.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
tautschnig and others added 2 commits June 15, 2026 19:07
In --paths mode every GOTO branch forks the symbolic-execution path tree, so
trivial conditional bookkeeping (such as the leak/deallocation tracking in the
CPROVER library) causes exponential path explosion. This adds a goto-program
transformation that linearises side-effect-free conditional-assignment regions,
rewriting

  if(cond) x = e;             into  x = cond ? e : x;
  if(cond) x = a; else x = b; into  x = cond ? a : b;

and dropping the GOTO branch. It runs (only when --paths is given) before
goto_check, so that goto_check_ct::check_rec_if keeps the per-operand safety
checks of the generated conditional expressions guarded by the branch
condition; the transformation is therefore sound even when the assignments
dereference, divide or overflow.

For an if/else, the else-region is linearised after the then-region. Emitting
two independent self-referential conditional assignments (x = cond ? a : x;
x = !cond ? b : x;) for an arm pair that targets the same variable would, after
SSA, nest into x = !cond ? b : (cond ? a : x). That defeats constant
propagation of x: a variable that was concrete on each original branch becomes
a symbolic value, and in --paths mode that can drive downstream branches or
loops into an explosion of paths (e.g. regression/cbmc/return4 went from a
fraction of a second to minutes under --paths lifo, timing the suite out on
slower runners). When both arms are a single assignment to the same variable we
therefore emit one combined, simplified ternary x = cond ? a : b (so cond ? 1 :
1 folds back to the constant 1); more complex overlapping arms are left
unconverted rather than chained. Arms that do not interfere keep the per-region
form. The unit tests pin this behaviour.

To preserve downstream precision the original branch condition is used directly
in the generated conditional expressions, so analyses such as local safe
pointer analysis still see guards like `p != null`. The condition is
snapshotted into a fresh variable only when it would otherwise observe a
clobbered value, i.e. when the region writes or kills a variable the condition
reads (including a subsequently DEAD condition temporary).

A region is only linearised when it consists solely of assignments and scope
markers, has a single entry, and the branch itself is not a jump target. An
assignment whose left-hand side contains a dereference or an array index is not
linearised: that would move the (possibly faulting) access out of the guard,
changing semantics and losing the null/bounds precision the guard provided.

This generalises the per-site CPROVER-library edits of the preceding commit:
with the pass enabled the same patterns are linearised automatically, in user
code and elsewhere in the library alike, addressing the review request for a
preprocessing pass rather than hand-editing the library.

Tests that pin --paths output which legitimately changes under if-conversion
(VCC counts, shown GOTO, coverage) are tagged paths-lifo-expected-failure.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
…rsion

An earlier commit hand-rewrote four side-effect-free conditional assignments
in the CPROVER library -- in free(), __CPROVER_deallocate(), __delete() and
__delete_array() -- from `if`-statements into conditional expressions, so that
--paths mode would not fork the symbolic-execution path tree on them.

The if-conversion pass now performs exactly this linearisation automatically
for any qualifying conditional assignment, so the manual edits are no longer
necessary. Restore the original, more readable `if`-statement form; under
--paths the pass rewrites these branches into guarded assignments just as the
hand edits did (verified via the path-branch-malloc-free regression test,
which is retained and now exercises the pass end-to-end).

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants