perf(cache): static/volatile system split — cache the core across the whole session by QodeXcli · Pull Request #47 · QodeXcli/QodeX

QodeXcli · 2026-06-29T17:26:55Z

Why

Step 3 of the layered-context architecture (the "continue" after #44–#46). #44 caches the system block within a task, but across conversation turns the per-turn injections (memory, retrieval, dir-tree) change, so the whole system block cache-misses every turn. That drags the stable instruction core (~2k tokens of identity + base prompt, byte-identical every turn) down with it, and the core is re-billed each time.

What

Split the system prompt at a static/volatile boundary so the byte-stable core gets its own cache breakpoint and stays a HIT for the whole session, not just within one task.

- `Message.cacheBoundary` (transient, not persisted): char offset where the stable core ends. Set in `buildInitialMessages` right after identity + base prompt + provider guidance, before the volatile injections.
- Zero-leak by design: only the Anthropic provider reads `cacheBoundary`. `convertMessages` surfaces it, and `withCacheBreakpoints` splits system into [core (cached) | volatile (uncached)]. Every other provider just reads `content`, so there is no sentinel string in the prompt and no changes to them.
- Still ≤4 breakpoints (core + last tool + rolling last message). No boundary, or one out of range ⇒ the previous single-block behavior (safe fallback).
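
The split and its fallback can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SystemMessage`, `SystemBlock`, and `splitSystemAtBoundary` are hypothetical names, and treating a boundary equal to the content length as out of range is an assumption (it avoids emitting an empty volatile block).

```typescript
// Hypothetical shapes for illustration; the real QodeX types may differ.
interface SystemMessage {
  content: string;
  cacheBoundary?: number; // transient char offset where the stable core ends
}

// Shaped like an Anthropic system text block with an optional cache marker.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

function splitSystemAtBoundary(msg: SystemMessage): SystemBlock[] {
  const content = msg.content;
  const b = msg.cacheBoundary;

  // Fallback: no boundary, 0, non-integer, or at/past the end
  // => one cached block, as before the split existed.
  if (b === undefined || !Number.isInteger(b) || b <= 0 || b >= content.length) {
    return [{ type: "text", text: content, cache_control: { type: "ephemeral" } }];
  }

  // Only the core carries a breakpoint; the volatile tail stays unmarked.
  return [
    { type: "text", text: content.slice(0, b), cache_control: { type: "ephemeral" } },
    { type: "text", text: content.slice(b) },
  ];
}
```

Because the boundary travels as a field next to `content` rather than as a marker inside it, a provider that ignores the field sends exactly the same prompt text as before.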

Honest scope

This nails cross-turn caching of the core. It does not fully cache the conversation history across turns — volatile injections still sit before the message prefix in cache order, so the history re-caches when they change. Fully fixing that means moving volatile after the history (a behavior-affecting restructure) — deliberately deferred.
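
A toy model of why the history still re-caches: prefix caching can only reuse blocks up to the first one that differs from the previous request. The sketch below assumes each cacheable unit is a plain string and compares two consecutive turns; `sharedPrefixBlocks` and the sample block contents are made up for illustration.

```typescript
// Count how many leading blocks two requests have in common.
function sharedPrefixBlocks(prev: string[], next: string[]): number {
  let i = 0;
  while (i < prev.length && i < next.length && prev[i] === next[i]) i++;
  return i;
}

const priorHistory = ["user: hi", "assistant: hello"];

// Layout in this PR: core, then volatile injections, then history.
const currentTurn1 = ["core", "volatile@turn1", ...priorHistory];
const currentTurn2 = ["core", "volatile@turn2", ...priorHistory, "user: next"];

// Deferred layout: volatile injections moved after the history.
const deferredTurn1 = ["core", ...priorHistory, "volatile@turn1"];
const deferredTurn2 = ["core", ...priorHistory, "user: next", "volatile@turn2"];

const reusedNow = sharedPrefixBlocks(currentTurn1, currentTurn2); // core only
const reusedDeferred = sharedPrefixBlocks(deferredTurn1, deferredTurn2); // core + history
```

In this model the current layout reuses only the core, while the deferred layout would also reuse the earlier history, which is the restructure this PR leaves for later.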

Tests

+3 tests (split marks only the core block; fallback when boundary absent/0/past-end; ≤4 breakpoints). Full suite 1339 green, tsc clean.

… whole session

Step 3 of the layered-context architecture. #44 caches the system block within a task, but across conversation TURNS the per-turn injections (memory, retrieval, dir-tree) change, so the whole system block cache-misses every turn, taking the stable instruction core down with it.

This splits the system prompt at a static/volatile boundary so the byte-stable core gets its own cache breakpoint and stays a HIT for the entire session:

- Message gains a transient `cacheBoundary` (char offset where the stable core ends). Set in buildInitialMessages right after the identity + base prompt + provider guidance, BEFORE the volatile injections. Not persisted; rebuilt each run.
- Zero-leak by design: ONLY the Anthropic provider reads cacheBoundary (convertMessages surfaces it; withCacheBreakpoints splits system into [core (cached) | volatile (uncached)]). Every other provider just reads `content`: no sentinel string, no changes to them.
- Still ≤4 breakpoints (core + last tool + rolling last message). No boundary, or one out of range ⇒ the previous single-block behavior.

+3 tests (split marks only the core; fallback when boundary absent/out-of-range; ≤4 breakpoints). Full suite 1339 green; tsc clean.

…turn hit) (#48)

Follow-up to the static/volatile split (#47). #47 put the boundary right after the base prompt, so ALL injections, including session-STABLE ones (code-style profile, failure lessons), landed in the uncached volatile tail and re-billed every turn.

Now injections route into two buffers:

- stableTail (code style, failure lessons): byte-identical across turns → folded INTO the cached core, so they're a cache HIT for the whole session.
- volatileTail (auto-retrieval, dep-graph, episodic recall): genuinely query-dependent → stays after the boundary, uncached.

The cache boundary now lands between them. Pure content regrouping (no message bloat: volatile stays in the regenerated system prompt, never persisted to history); guidance simply precedes per-turn context now.

Also helps LOCAL backends: a larger byte-stable prefix means Ollama/llama.cpp KV prefix-cache hits more across turns (local "turbo cache").

Full suite 1339 green; tsc clean. The failure-lessons taskKey side effect is preserved.

Co-authored-by: Louise Lau <QodeXcli@users.noreply.github.com>
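
The two-buffer routing described in #48 might look roughly like this. It is a sketch under assumptions, not the merged code: the `Injection` record, its `stable` flag, the `assembleSystemPrompt` name, and the blank-line separator are all hypothetical.

```typescript
// Hypothetical injection record; `stable` marks content that is
// byte-identical across turns (e.g. code style, failure lessons).
interface Injection {
  text: string;
  stable: boolean;
}

function assembleSystemPrompt(
  base: string,
  injections: Injection[],
): { content: string; cacheBoundary: number } {
  const stableTail: string[] = [];
  const volatileTail: string[] = [];
  for (const inj of injections) {
    (inj.stable ? stableTail : volatileTail).push(inj.text);
  }

  // Stable guidance is folded into the core, so it sits before the boundary.
  const core = [base, ...stableTail].join("\n\n");
  // Query-dependent context follows the boundary and stays uncached.
  const volatile = volatileTail.map((t) => "\n\n" + t).join("");

  return { content: core + volatile, cacheBoundary: core.length };
}

const promptTurnA = assembleSystemPrompt("BASE", [
  { text: "retrieval for query A", stable: false },
  { text: "code style: 2-space indent", stable: true },
]);
const promptTurnB = assembleSystemPrompt("BASE", [
  { text: "retrieval for query B", stable: false },
  { text: "code style: 2-space indent", stable: true },
]);
```

Here the text before `cacheBoundary` is identical for both turns even though the retrieval text differs, which is the property that lets the core stay a cache hit.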

QodeXcli merged commit 6a5c4f0 into main Jun 29, 2026
2 checks passed

QodeXcli deleted the feat/static-volatile-split branch June 29, 2026 17:27

QodeXcli mentioned this pull request Jun 30, 2026

perf(cache): fold stable guidance into the cached core #48

Merged

QodeXcli merged 1 commit into main from feat/static-volatile-split
