feat(recall): receipt-aware ranking + timeline — ground-truth-weighted recall#102
Merged
…d recall

recall_approach now knows which past approaches PROVABLY worked, ranks with that, and shows the evolution on a timeline. Same pipeline everywhere (CLI tool + dashboard panel via the shared renderer).

- Episode.verified: stamped true at record time (episodes are written on the objective-success path: sandbox compiled + merged, verify/completion gates passed). Old episodes → unknown.
- New recall source kind 'receipt': maintain runs with receipts join the corpus, because verified autonomous work IS history worth recalling. opened → verified:true, blocked/failed → false.
- rankApproaches verifiedBoost: ✓ proven +0.08, ⛔ blocked/failed −0.04, unknown neutral. Applied to the effective rank only; the displayed score stays honest raw relevance. Composes with the existing recency tilt + MMR diversity + stemming.
- Visualization: outcome marks on every label (🎯 task ✓ / 🧾 receipt ⛔) + renderTimeline, which shows the last 5 dated approaches, oldest → newest, with ○/● glyphs and per-entry outcome marks; omitted when <2 matches carry a date. Appended by renderApproachDiffs, so the dashboard Recall panel and the CLI tool get it with zero extra wiring.
- Both gathers (recall_approach tool + dashboard recall.query) feed episodes' verified flag and the maintain-receipt source.
Advanced recall, exactly per the plan: search → receipt-aware ranking → visualization (diff + timeline) → dashboard/CLI.
Receipt-aware ranking (the piece recall never had)
- `Episode.verified`: stamped `true` at record time, since episodes are only written on the objective-success path (sandbox compiled + merged, verify/completion gates passed). Old episodes stay unknown-neutral.
- New recall source kind `receipt`: verified autonomous work is history worth recalling. `opened` → verified ✓, `blocked`/`failed` → ⛔.
- `verifiedBoost` in `rankApproaches`: ✓ +0.08, ⛔ −0.04, unknown 0. It is applied to the effective rank only; the displayed score stays honest raw relevance. Composes with recency tilt + MMR diversity + stemming.

Timeline + outcome marks
Every label now carries its ground-truth outcome (🎯 task ✓, 🧾 receipt ⛔), and the output ends with the evolution story: the last 5 dated approaches (per spec), omitted when <2 carry dates. It's appended inside `renderApproachDiffs`, so the dashboard Recall panel and the CLI tool inherit it with zero extra wiring.

Live demo output shows the point: a verified jwt approach at 41% raw relevance correctly outranks an unverified 45% one, and a failed oauth attempt surfaces with ⛔ as a warning, which recall could never say before.
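
The ordering in the demo follows from the boost arithmetic: 0.41 + 0.08 = 0.49 beats an unboosted 0.45. Below is a minimal sketch of that effective-rank idea, not the PR's actual `rankApproaches` code. The `Approach` shape, the helper names, and the 0.43 score for the failed oauth entry are assumptions made for illustration, and the recency tilt, MMR diversity, and stemming are left out.

```typescript
// Hypothetical shape; the real types in the PR may differ.
interface Approach {
  label: string;
  score: number;      // raw relevance, displayed unchanged
  verified?: boolean; // true = proven, false = blocked/failed, undefined = unknown
}

// Boost values as stated in the PR description.
const PROVEN_BOOST = 0.08;
const FAILED_PENALTY = 0.04;

function verifiedBoost(verified: boolean | undefined): number {
  if (verified === true) return PROVEN_BOOST;
  if (verified === false) return -FAILED_PENALTY;
  return 0; // unknown stays neutral
}

// Sort by effective rank (raw score + boost) without touching `score`,
// so whatever is displayed remains the raw relevance.
function rankByEffectiveScore(approaches: Approach[]): Approach[] {
  const effective = (a: Approach) => a.score + verifiedBoost(a.verified);
  return [...approaches].sort((a, b) => effective(b) - effective(a));
}

const ranked = rankByEffectiveScore([
  { label: "unverified approach", score: 0.45 },
  { label: "jwt approach", score: 0.41, verified: true },
  { label: "oauth attempt", score: 0.43, verified: false }, // score invented for the example
]);

for (const a of ranked) {
  const mark = a.verified === true ? " ✓" : a.verified === false ? " ⛔" : "";
  console.log(`${Math.round(a.score * 100)}% ${a.label}${mark}`);
}
```

The input array is copied before sorting and the `score` field is never modified, which is what keeps the displayed percentage honest while the order reflects the outcome.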
+5 tests (✓>unknown>⛔ ordering with honest displayed score; timeline order/cap/marks/undated-skip/omission; receipt tag). Full suite 1535 green, tsc clean.
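
The timeline rules exercised by those tests (order, cap, marks, undated-skip, omission) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's `renderTimeline`: the `DatedApproach` shape, the ISO date strings, the line layout, and the choice of ● for the newest entry and ○ for earlier ones are all hypothetical.

```typescript
// Hypothetical shape; the real entry type in the PR may differ.
interface DatedApproach {
  label: string;
  date?: string;      // assumed ISO date (YYYY-MM-DD); undated entries are skipped
  verified?: boolean; // true = ✓, false = ⛔, undefined = no mark
}

function outcomeMark(verified: boolean | undefined): string {
  if (verified === true) return "✓";
  if (verified === false) return "⛔";
  return "";
}

function renderTimelineSketch(approaches: DatedApproach[], cap = 5): string {
  // Keep only dated entries, oldest first (ISO dates sort lexicographically).
  const dated = approaches
    .filter((a): a is DatedApproach & { date: string } => typeof a.date === "string")
    .sort((a, b) => a.date.localeCompare(b.date));

  // Omit the timeline entirely when fewer than 2 entries carry a date.
  if (dated.length < 2) return "";

  // Cap at the most recent `cap` entries, still oldest → newest.
  const recent = dated.slice(-cap);
  return recent
    .map((a, i) => {
      const glyph = i === recent.length - 1 ? "●" : "○"; // assumed glyph meaning
      const mark = outcomeMark(a.verified);
      return `${glyph} ${a.date} ${a.label}${mark ? " " + mark : ""}`;
    })
    .join("\n");
}

const sample: DatedApproach[] = [
  { label: "session cookies", date: "2024-01-05" },
  { label: "oauth attempt", date: "2024-02-10", verified: false },
  { label: "undated note" },
  { label: "jwt approach", date: "2024-03-01", verified: true },
];
console.log(renderTimelineSketch(sample));
```

Returning an empty string for the omission case lets a caller append the result unconditionally, which matches the description of the timeline being appended inside the shared renderer.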