Skip to content

feat: fail-closed driver execution — audit, deadlines, pooled HTTP client#238

Open
dgenio wants to merge 3 commits into
mainfrom
claude/issue-triage-grouping-hjztiy
Open

feat: fail-closed driver execution — audit, deadlines, pooled HTTP client#238
dgenio wants to merge 3 commits into
mainfrom
claude/issue-triage-grouping-hjztiy

Conversation

@dgenio

@dgenio dgenio commented Jun 25, 2026

Copy link
Copy Markdown
Owner

Closes #152, closes #191, closes #194, closes #197.

These four issues were selected as a coherent group: they share one code area (kernel/_invoke.py + drivers/http.py) and one fail-closed implementation path — any fault funnels through DriverError → fallback → failure trace → budget release, so the kernel's "controlled, audited execution" promise (invariant I-02) holds on every exit, not just the happy path.

What changed

#152 — Audit + budget release on any driver fault

  • execute_with_fallback now treats any exception a driver raises (not only DriverError) as a failed attempt, captured as last_error instead of escaping un-audited with the budget reservation leaked.
  • The post-driver pipeline in perform_invoke (handle creation, firewall transform, token counting) is wrapped in a fault-capturing try/except: a fault there releases the reservation exactly once and records a failure ActionTrace before re-raising.
  • Extracted execute_with_fallback into a new kernel/_driver_exec.py so _invoke.py stays within the documented 300-line module budget; its size-ratchet ceiling is lowered 400 → 382 (shrink-only respected).

#191 — Per-invocation deadline

  • Optional invoke_timeout_s token constraint bounds each driver attempt (single-shot and streaming, including a stream inactivity timeout) via asyncio.wait_for (3.10-compatible). A timeout becomes a synthetic DriverError, so the existing fallback + failure-trace paths apply unchanged.
  • Design note: the deadline is read from the signed token constraint rather than a Kernel.__init__ argument. This is the issue's explicitly-sanctioned "per-capability constraint" option, it is more secure (tamper-evident, bound to the grant), and it avoids growing kernel/__init__.py, which sits exactly at its shrink-only size-ratchet ceiling (541). A kernel-wide constructor default is a natural follow-up once Bring oversized modules back within the documented 300-line budget #188 splits that module.

#194 — Pooled HTTP client + response-size guard

  • HTTPDriver holds a single long-lived httpx.AsyncClient (connection pooling, configurable httpx.Limits) instead of building one per request, with an idempotent aclose() for shutdown. The http_driver_demo example now closes it in a finally.
  • Optional max_response_bytes streams the body and aborts oversized responses with a DriverError before they are fully buffered (the firewall budget only applies after a RawResult exists).

#197 — Defensive HTTP body parsing

  • A JSON endpoint returning a non-JSON body now raises a typed DriverError (content-type + redaction-safe snippet) instead of leaking json.JSONDecodeError.
  • New HTTPEndpoint.response_format ("json" | "text") supports text APIs deliberately.

Why

The audit found live I-02 gaps on the execution path: a non-DriverError exception, a firewall failure, or a hung driver could each leave an invocation un-traced and/or leak a budget reservation. The HTTP driver opened a fresh connection pool per call and could buffer an unbounded body or crash the agent loop on a non-JSON response. This PR closes all four conservatively (every behaviour defaults to its prior semantics; the deadline and size cap are opt-in).

How verified

Full repo gate (make cifmt-check → lint → type → test → example) in a clean venv — exit 0:

  • ruff format --check — 119 files already formatted
  • ruff check — All checks passed
  • mypy src/ (strict) — no issues in 62 source files
  • pytest789 passed, 1 skipped, branch coverage 93.91% (floor 90)
  • make example — all 13 examples complete, incl. http_driver_demo.py against a real local server

New/updated tests:

  • tests/test_driver_exec.py (new) — fallback, deadline, and non-DriverError fault capture
  • tests/test_drivers.py — rewritten HTTP tests for the pooled-client/streaming contract; new cases for non-JSON, text format, empty body, size-limit abort/allow, pooled reuse, aclose
  • tests/test_kernel.py — non-DriverError audited + budget released; firewall-failure audited + budget released; deadline aborts a slow driver; invalid invoke_timeout_s rejected

Tradeoffs / risks

  • Guarantee an audit trace and budget release on any driver exception #152 changes propagation semantics for non-DriverError exceptions (now treated as failed attempts and re-wrapped as DriverError after the route plan is exhausted). This is intentional and central to the fix, but it's a behaviour change worth a maintainer eye.
  • The HTTP driver now requires callers to own the client lifecycle (aclose()); the example demonstrates the pattern.

Scope notes

Limited to the four issues above. #170 (per-invocation rate-limit enforcement) was deliberately left out — it's a distinct rate-limiting/policy concern (policy.py/rate_limit.py) and would dilute this PR's focus; recommend it as a separate follow-up. A kernel-wide default deadline depends on the #188 module split.

🤖 Generated with Claude Code

https://claude.ai/code/session_018VdP3irZSbbPyyoVS1QQbD


Generated by Claude Code

…ient

Close four related gaps on the invocation path so the kernel's
"controlled, audited execution" promise (I-02) holds on every exit, not
just the happy path. Grouped because they share one code area
(kernel/_invoke.py + drivers/http.py) and one fail-closed implementation
path: any fault funnels through DriverError -> fallback -> failure trace
-> budget release.

- #152: capture *any* driver exception (not only DriverError) as a failed
  attempt, and wrap the post-driver pipeline (handle store, firewall
  transform, token counting) so a fault there records a failure trace and
  releases the reservation exactly once before re-raising. Extract
  execute_with_fallback into kernel/_driver_exec.py to keep _invoke.py
  within its module-size budget (lower its ratchet ceiling 400 -> 382).
- #191: optional per-invocation deadline via the signed `invoke_timeout_s`
  token constraint, enforced per driver attempt (single-shot and
  streaming, incl. a stream inactivity timeout) with asyncio.wait_for; a
  timeout becomes a synthetic DriverError. Sourced from the token (not a
  kernel ctor arg) so the deadline is tamper-evident and the kernel
  module stays within its size budget.
- #194: HTTPDriver holds one long-lived, pooled httpx.AsyncClient with
  configurable Limits and an aclose() lifecycle; optional
  max_response_bytes streams and aborts oversized bodies before buffering.
- #197: a non-JSON body from a JSON endpoint raises a typed DriverError
  (content-type + redaction-safe snippet); new HTTPEndpoint.response_format
  ("json"|"text") supports text APIs deliberately.

Tests: new tests/test_driver_exec.py (fallback, deadline, fault capture);
rewritten HTTP driver tests for the pooled-client/streaming contract;
kernel integration tests for non-DriverError audit, firewall-failure
audit, budget release, and deadline enforcement. Docs + CHANGELOG updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018VdP3irZSbbPyyoVS1QQbD
Copilot AI review requested due to automatic review settings June 25, 2026 05:33

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens the Weaver Kernel’s invocation path to fail closed: ensuring driver faults (including unexpected exceptions and timeouts) are treated as auditable failed attempts, with budget reservations released and HTTP execution made more robust and efficient.

Changes:

  • Extracted driver execution/fallback + per-invocation deadline resolution into kernel/_driver_exec.py, and wired per-grant invoke_timeout_s into single-shot and streaming invocation.
  • Hardened perform_invoke to capture and trace post-driver faults (handle creation, firewall transform, budget usage accounting) and to avoid budget leaks on those failures.
  • Reworked HTTPDriver to reuse a pooled httpx.AsyncClient, add optional response size limits, and raise typed DriverError for malformed/non-JSON bodies; updated docs/examples/tests accordingly.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/weaver_kernel/kernel/_driver_exec.py New extracted driver-execution helper implementing fallback, unexpected-exception capture, and per-attempt timeouts.
src/weaver_kernel/kernel/_invoke.py Integrates fail-closed execution and per-invocation deadlines; adds post-driver fault capture to ensure audit + budget release.
src/weaver_kernel/kernel/_stream.py Applies per-grant timeout to streaming (inactivity) and non-streaming fallback in invoke_stream.
src/weaver_kernel/drivers/http.py Adds pooled HTTP client lifecycle (aclose), response-size bounding, and defensive JSON/text response parsing.
tests/test_driver_exec.py New focused tests for fallback behavior, timeout resolution, and unexpected-exception capture.
tests/test_kernel.py Adds end-to-end tests for audit + budget release on non-DriverError faults, firewall failures, and invocation timeouts.
tests/test_drivers.py Updates HTTP driver tests to validate pooled-client behavior, size limits, and body parsing semantics.
tests/test_architecture.py Updates the _invoke.py shrink-only size ratchet ceiling to reflect extraction.
examples/http_driver_demo.py Demonstrates closing the pooled HTTP client on shutdown.
docs/integrations.md Documents HTTP pooling/lifecycle, response formats, size limits, and the invoke_timeout_s constraint.
CHANGELOG.md Records the grouped fail-closed execution changes and HTTP driver improvements.

Comment on lines +146 to 151
if response.is_error:
await response.aread()
raise DriverError(
f"HTTPDriver '{self._driver_id}': HTTP {response.status_code} "
f"from {endpoint.url}: {response.text[:200]}"
)
Comment thread src/weaver_kernel/kernel/_invoke.py Outdated
principal_id=principal.principal_id,
token_id=token.token_id,
args=args,
response_mode=response_mode,
Comment thread src/weaver_kernel/kernel/_invoke.py Outdated
Comment on lines +318 to +321
record_failure_trace(
action_id=action_id,
capability_id=token.capability_id,
principal_id=principal.principal_id,
Comment thread src/weaver_kernel/kernel/_invoke.py Outdated
Comment on lines 259 to 262
downgraded = effective_mode != response_mode
raw_result, used_driver_id, last_error, fell_back = await execute_with_fallback(
kernel._driver_map, plan, ctx=ctx, log_ctx=log_ctx
kernel._driver_map, plan, ctx=ctx, log_ctx=log_ctx, timeout=invoke_timeout
)
Comment on lines +131 to +135
except asyncio.TimeoutError as exc:
raise DriverError(
f"Driver '{fallback_driver_id}' timed out after {invoke_timeout}s "
f"for capability '{token.capability_id}'."
) from exc
…ty, cancellation

Resolve the 5 review findings on the fail-closed driver-execution PR:

- http.py: the error path called `response.aread()` unconditionally,
  buffering an arbitrarily large error body and bypassing
  `max_response_bytes`. Read a bounded snippet (≤512 bytes) for the
  message instead, so the size guard holds on the failure path too (#194).
- _invoke.py: failure traces now record the *effective* response mode
  (matching success and streaming traces) instead of the caller-requested
  mode, and carry the implicated `driver_id` (the driver that ran and
  failed downstream, or the last attempted when all failed) — previously
  hard-coded to "". `record_failure_trace` gained an optional `driver_id`;
  `execute_with_fallback` now returns the last-attempted driver on failure.
- _invoke.py: task cancellation during execution no longer leaks the
  budget reservation or skips the audit trace. The reservation is released
  in a single `finally` (covering `CancelledError`, which is not an
  `Exception`) and a "invocation cancelled" failure trace is recorded
  before propagating — honouring the PR's "audited on every exit" goal.
- _stream.py: the stream trace records the real failure reason (e.g. a
  timeout `DriverError`), redacted, instead of always None / the generic
  "stream produced no chunks".

Tests: cancellation-mid-invoke (budget released + audited), stream-timeout
error-reason trace, bounded error-body read; updated existing assertions
for the new driver_id attribution. `make ci` green (792 passed, 94.01%).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018VdP3irZSbbPyyoVS1QQbD

dgenio commented Jun 25, 2026

Copy link
Copy Markdown
Owner Author

Addressed all five review findings in b8555bc:

  1. max_response_bytes bypassed on errors (http.py) — the error path no longer does an unbounded response.aread(); it streams a bounded snippet (≤512 bytes, sliced to 200 chars) for the message, so the size guard holds on the failure path too.
  2. Failure trace recorded caller mode, not effective (_invoke.py) — failure traces now record effective_mode, matching success and streaming traces.
  3. Failure trace hard-coded driver_id="" (_invoke.py) — record_failure_trace takes an optional driver_id; the post-driver path passes the driver that ran, and execute_with_fallback now returns the last-attempted driver so the all-failed path attributes it too.
  4. CancelledError leaked budget / skipped audit (_invoke.py) — the reservation is released in a single finally (which covers CancelledError, since it isn't an Exception) and a "invocation cancelled" failure trace is recorded before propagating. Added a test that cancels mid-invoke and asserts the reservation is freed and the cancel is audited.
  5. Stream trace dropped the real error reason (_stream.py) — the trace now records the actual failure message (e.g. a timeout DriverError), redacted, instead of always None / "stream produced no chunks".

make ci is green (792 passed, 94.01% branch coverage).


Generated by Claude Code

Comment thread tests/test_kernel.py Fixed
CodeQL flagged `await task` (awaiting a cancelled task to re-raise
CancelledError) as a statement with no effect — a false positive on a
bare await expression statement. Bind the result to `_` so the analyzer
sees an effectful assignment; behaviour is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018VdP3irZSbbPyyoVS1QQbD
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants