feat: fail-closed driver execution — audit, deadlines, pooled HTTP client#238
Open
dgenio wants to merge 3 commits into
Open
feat: fail-closed driver execution — audit, deadlines, pooled HTTP client#238dgenio wants to merge 3 commits into
dgenio wants to merge 3 commits into
Conversation
…ient Close four related gaps on the invocation path so the kernel's "controlled, audited execution" promise (I-02) holds on every exit, not just the happy path. Grouped because they share one code area (kernel/_invoke.py + drivers/http.py) and one fail-closed implementation path: any fault funnels through DriverError -> fallback -> failure trace -> budget release. - #152: capture *any* driver exception (not only DriverError) as a failed attempt, and wrap the post-driver pipeline (handle store, firewall transform, token counting) so a fault there records a failure trace and releases the reservation exactly once before re-raising. Extract execute_with_fallback into kernel/_driver_exec.py to keep _invoke.py within its module-size budget (lower its ratchet ceiling 400 -> 382). - #191: optional per-invocation deadline via the signed `invoke_timeout_s` token constraint, enforced per driver attempt (single-shot and streaming, incl. a stream inactivity timeout) with asyncio.wait_for; a timeout becomes a synthetic DriverError. Sourced from the token (not a kernel ctor arg) so the deadline is tamper-evident and the kernel module stays within its size budget. - #194: HTTPDriver holds one long-lived, pooled httpx.AsyncClient with configurable Limits and an aclose() lifecycle; optional max_response_bytes streams and aborts oversized bodies before buffering. - #197: a non-JSON body from a JSON endpoint raises a typed DriverError (content-type + redaction-safe snippet); new HTTPEndpoint.response_format ("json"|"text") supports text APIs deliberately. Tests: new tests/test_driver_exec.py (fallback, deadline, fault capture); rewritten HTTP driver tests for the pooled-client/streaming contract; kernel integration tests for non-DriverError audit, firewall-failure audit, budget release, and deadline enforcement. Docs + CHANGELOG updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_018VdP3irZSbbPyyoVS1QQbD
There was a problem hiding this comment.
Pull request overview
This PR tightens the Weaver Kernel’s invocation path to fail closed: ensuring driver faults (including unexpected exceptions and timeouts) are treated as auditable failed attempts, with budget reservations released and HTTP execution made more robust and efficient.
Changes:
- Extracted driver execution/fallback + per-invocation deadline resolution into
kernel/_driver_exec.py, and wired per-grantinvoke_timeout_sinto single-shot and streaming invocation. - Hardened
perform_invoketo capture and trace post-driver faults (handle creation, firewall transform, budget usage accounting) and to avoid budget leaks on those failures. - Reworked
HTTPDriverto reuse a pooledhttpx.AsyncClient, add optional response size limits, and raise typedDriverErrorfor malformed/non-JSON bodies; updated docs/examples/tests accordingly.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/weaver_kernel/kernel/_driver_exec.py |
New extracted driver-execution helper implementing fallback, unexpected-exception capture, and per-attempt timeouts. |
src/weaver_kernel/kernel/_invoke.py |
Integrates fail-closed execution and per-invocation deadlines; adds post-driver fault capture to ensure audit + budget release. |
src/weaver_kernel/kernel/_stream.py |
Applies per-grant timeout to streaming (inactivity) and non-streaming fallback in invoke_stream. |
src/weaver_kernel/drivers/http.py |
Adds pooled HTTP client lifecycle (aclose), response-size bounding, and defensive JSON/text response parsing. |
tests/test_driver_exec.py |
New focused tests for fallback behavior, timeout resolution, and unexpected-exception capture. |
tests/test_kernel.py |
Adds end-to-end tests for audit + budget release on non-DriverError faults, firewall failures, and invocation timeouts. |
tests/test_drivers.py |
Updates HTTP driver tests to validate pooled-client behavior, size limits, and body parsing semantics. |
tests/test_architecture.py |
Updates the _invoke.py shrink-only size ratchet ceiling to reflect extraction. |
examples/http_driver_demo.py |
Demonstrates closing the pooled HTTP client on shutdown. |
docs/integrations.md |
Documents HTTP pooling/lifecycle, response formats, size limits, and the invoke_timeout_s constraint. |
CHANGELOG.md |
Records the grouped fail-closed execution changes and HTTP driver improvements. |
Comment on lines
+146
to
151
| if response.is_error: | ||
| await response.aread() | ||
| raise DriverError( | ||
| f"HTTPDriver '{self._driver_id}': HTTP {response.status_code} " | ||
| f"from {endpoint.url}: {response.text[:200]}" | ||
| ) |
| principal_id=principal.principal_id, | ||
| token_id=token.token_id, | ||
| args=args, | ||
| response_mode=response_mode, |
Comment on lines
+318
to
+321
| record_failure_trace( | ||
| action_id=action_id, | ||
| capability_id=token.capability_id, | ||
| principal_id=principal.principal_id, |
Comment on lines
259
to
262
| downgraded = effective_mode != response_mode | ||
| raw_result, used_driver_id, last_error, fell_back = await execute_with_fallback( | ||
| kernel._driver_map, plan, ctx=ctx, log_ctx=log_ctx | ||
| kernel._driver_map, plan, ctx=ctx, log_ctx=log_ctx, timeout=invoke_timeout | ||
| ) |
Comment on lines
+131
to
+135
| except asyncio.TimeoutError as exc: | ||
| raise DriverError( | ||
| f"Driver '{fallback_driver_id}' timed out after {invoke_timeout}s " | ||
| f"for capability '{token.capability_id}'." | ||
| ) from exc |
…ty, cancellation Resolve the 5 review findings on the fail-closed driver-execution PR: - http.py: the error path called `response.aread()` unconditionally, buffering an arbitrarily large error body and bypassing `max_response_bytes`. Read a bounded snippet (≤512 bytes) for the message instead, so the size guard holds on the failure path too (#194). - _invoke.py: failure traces now record the *effective* response mode (matching success and streaming traces) instead of the caller-requested mode, and carry the implicated `driver_id` (the driver that ran and failed downstream, or the last attempted when all failed) — previously hard-coded to "". `record_failure_trace` gained an optional `driver_id`; `execute_with_fallback` now returns the last-attempted driver on failure. - _invoke.py: task cancellation during execution no longer leaks the budget reservation or skips the audit trace. The reservation is released in a single `finally` (covering `CancelledError`, which is not an `Exception`) and a "invocation cancelled" failure trace is recorded before propagating — honouring the PR's "audited on every exit" goal. - _stream.py: the stream trace records the real failure reason (e.g. a timeout `DriverError`), redacted, instead of always None / the generic "stream produced no chunks". Tests: cancellation-mid-invoke (budget released + audited), stream-timeout error-reason trace, bounded error-body read; updated existing assertions for the new driver_id attribution. `make ci` green (792 passed, 94.01%). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_018VdP3irZSbbPyyoVS1QQbD
Owner
Author
|
Addressed all five review findings in b8555bc:
Generated by Claude Code |
CodeQL flagged `await task` (awaiting a cancelled task to re-raise CancelledError) as a statement with no effect — a false positive on a bare await expression statement. Bind the result to `_` so the analyzer sees an effectful assignment; behaviour is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_018VdP3irZSbbPyyoVS1QQbD
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #152, closes #191, closes #194, closes #197.
These four issues were selected as a coherent group: they share one code area (
kernel/_invoke.py+drivers/http.py) and one fail-closed implementation path — any fault funnels throughDriverError → fallback → failure trace → budget release, so the kernel's "controlled, audited execution" promise (invariant I-02) holds on every exit, not just the happy path.What changed
#152 — Audit + budget release on any driver fault
execute_with_fallbacknow treats any exception a driver raises (not onlyDriverError) as a failed attempt, captured aslast_errorinstead of escaping un-audited with the budget reservation leaked.perform_invoke(handle creation, firewall transform, token counting) is wrapped in a fault-capturingtry/except: a fault there releases the reservation exactly once and records a failureActionTracebefore re-raising.execute_with_fallbackinto a newkernel/_driver_exec.pyso_invoke.pystays within the documented 300-line module budget; its size-ratchet ceiling is lowered400 → 382(shrink-only respected).#191 — Per-invocation deadline
invoke_timeout_stoken constraint bounds each driver attempt (single-shot and streaming, including a stream inactivity timeout) viaasyncio.wait_for(3.10-compatible). A timeout becomes a syntheticDriverError, so the existing fallback + failure-trace paths apply unchanged.Kernel.__init__argument. This is the issue's explicitly-sanctioned "per-capability constraint" option, it is more secure (tamper-evident, bound to the grant), and it avoids growingkernel/__init__.py, which sits exactly at its shrink-only size-ratchet ceiling (541). A kernel-wide constructor default is a natural follow-up once Bring oversized modules back within the documented 300-line budget #188 splits that module.#194 — Pooled HTTP client + response-size guard
HTTPDriverholds a single long-livedhttpx.AsyncClient(connection pooling, configurablehttpx.Limits) instead of building one per request, with an idempotentaclose()for shutdown. Thehttp_driver_demoexample now closes it in afinally.max_response_bytesstreams the body and aborts oversized responses with aDriverErrorbefore they are fully buffered (the firewall budget only applies after aRawResultexists).#197 — Defensive HTTP body parsing
DriverError(content-type + redaction-safe snippet) instead of leakingjson.JSONDecodeError.HTTPEndpoint.response_format("json"|"text") supports text APIs deliberately.Why
The audit found live I-02 gaps on the execution path: a non-
DriverErrorexception, a firewall failure, or a hung driver could each leave an invocation un-traced and/or leak a budget reservation. The HTTP driver opened a fresh connection pool per call and could buffer an unbounded body or crash the agent loop on a non-JSON response. This PR closes all four conservatively (every behaviour defaults to its prior semantics; the deadline and size cap are opt-in).How verified
Full repo gate (
make ci→fmt-check → lint → type → test → example) in a clean venv — exit 0:ruff format --check— 119 files already formattedruff check— All checks passedmypy src/(strict) — no issues in 62 source filespytest— 789 passed, 1 skipped, branch coverage 93.91% (floor 90)make example— all 13 examples complete, incl.http_driver_demo.pyagainst a real local serverNew/updated tests:
tests/test_driver_exec.py(new) — fallback, deadline, and non-DriverErrorfault capturetests/test_drivers.py— rewritten HTTP tests for the pooled-client/streaming contract; new cases for non-JSON,textformat, empty body, size-limit abort/allow, pooled reuse,aclosetests/test_kernel.py— non-DriverErroraudited + budget released; firewall-failure audited + budget released; deadline aborts a slow driver; invalidinvoke_timeout_srejectedTradeoffs / risks
DriverErrorexceptions (now treated as failed attempts and re-wrapped asDriverErrorafter the route plan is exhausted). This is intentional and central to the fix, but it's a behaviour change worth a maintainer eye.aclose()); the example demonstrates the pattern.Scope notes
Limited to the four issues above. #170 (per-invocation rate-limit enforcement) was deliberately left out — it's a distinct rate-limiting/policy concern (
policy.py/rate_limit.py) and would dilute this PR's focus; recommend it as a separate follow-up. A kernel-wide default deadline depends on the #188 module split.🤖 Generated with Claude Code
https://claude.ai/code/session_018VdP3irZSbbPyyoVS1QQbD
Generated by Claude Code