Add skill-evolver: closed-loop self-improvement system for skills#439

Draft
wzhipan wants to merge 10 commits into master from zhipan/skill-evolver

wzhipan (Contributor) commented Jun 16, 2026

Summary

Adds skill-evolver, a closed-loop self-improvement system for skills and AI tools, and integrates it with the existing skill lifecycle. It was built and refined entirely via its own retrospective loop during this session.

The loop: capture friction -> analyze -> propose reviewed edits -> validate -> log -> measure.

What's included

New system

  • .github/hooks/journal-utils.js — single-writer JSONL friction journal store + CLI (record, stats, skill-sizes, list, attribution markers). Store lives in ~/.skill-evolution/ (gitignored).
  • .github/skills/skill-evolver/SKILL.md + references (friction-schema, classification-rubric, edit-safety-rules, bloat-control).
  • .github/skill-evolution/evolution-log.md — auditable changelog of every applied change with rollback refs.
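
For reviewers who want a mental model of the journal store, here is a minimal sketch of an append-only JSONL journal with `record`, `list`, and `stats` style operations. It is not the code in journal-utils.js: the function signatures, the event fields (`skill`, `kind`, `note`, `ts`), and the `journal.jsonl` file name are all assumptions made for illustration, and the demo writes to a temporary directory rather than ~/.skill-evolution/.

```javascript
// Hypothetical sketch of a single-writer JSONL journal; not the PR's journal-utils.js.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Append one friction event as a single JSON line (field names are assumed).
function recordEvent(journalPath, event) {
  const entry = { ts: new Date().toISOString(), ...event };
  fs.mkdirSync(path.dirname(journalPath), { recursive: true });
  fs.appendFileSync(journalPath, JSON.stringify(entry) + "\n", "utf8");
  return entry;
}

// Read every event back, skipping blank lines.
function listEvents(journalPath) {
  if (!fs.existsSync(journalPath)) return [];
  return fs
    .readFileSync(journalPath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Count events per skill, roughly what a "stats" view would report.
function stats(journalPath) {
  const counts = {};
  for (const e of listEvents(journalPath)) {
    const key = e.skill ?? "(unattributed)";
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Usage against a throwaway directory rather than the real store.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "skill-evolution-"));
const journal = path.join(demoDir, "journal.jsonl");
recordEvent(journal, { skill: "skill-creator", kind: "tool-failure", note: "No module named 'yaml'" });
recordEvent(journal, { skill: "skill-evolver", kind: "retry", note: "description over 1024 chars" });
recordEvent(journal, { skill: "skill-evolver", kind: "retry", note: "second retry" });
console.log(stats(journal));
```

One JSON object per line keeps each append independent, so a reader can parse the file line by line without loading any index. That is presumably part of why a single-writer JSONL layout was chosen.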

Runtime-honest capture

  • friction-capture.js is marked dormant (Claude Code-style hook; the GitHub Copilot CLI runtime has no hooks, so it never fires here). Active, agent-driven capture is the primary mechanism on this runtime; the hook registrations were removed from orchestrator.json.

Anti-bloat guardrails

  • skill-sizes tripwire flags any SKILL.md over body/description budget; prune phase + consolidate-over-append + references-over-body rules counter the loop's natural addition bias.
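
A rough sketch of what such a tripwire might check, using the budgets quoted in the commit notes (body warn/limit at 400/500 lines, description warn/limit at 900/1024 characters). Only `DESC_WARN` appears in this PR. The other flag names, the function names, and the simplified single-line `description:` frontmatter parsing are assumptions, not the actual journal-utils.js implementation.

```javascript
// Hypothetical sketch of the size check. Thresholds come from the commit notes;
// function names, parsing, and all flag names except DESC_WARN are assumed.
const LIMITS = {
  body: { warn: 400, max: 500 }, // lines
  desc: { warn: 900, max: 1024 }, // characters
};

// Split a SKILL.md string into its frontmatter description and its body.
// Assumes the description sits on one "description:" line (no YAML multi-line).
function parseSkill(markdown) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(markdown);
  if (!match) return { description: "", body: markdown };
  const descLine = match[1].split(/\r?\n/).find((l) => l.startsWith("description:"));
  const description = descLine ? descLine.slice("description:".length).trim() : "";
  return { description, body: match[2] };
}

// Compare body line count and description length against both thresholds.
function checkSkillSize(markdown) {
  const { description, body } = parseSkill(markdown);
  const bodyLines = body === "" ? 0 : body.replace(/\r?\n$/, "").split(/\r?\n/).length;
  const flags = [];
  if (bodyLines > LIMITS.body.max) flags.push("BODY_OVER");
  else if (bodyLines > LIMITS.body.warn) flags.push("BODY_WARN");
  if (description.length > LIMITS.desc.max) flags.push("DESC_OVER");
  else if (description.length > LIMITS.desc.warn) flags.push("DESC_WARN");
  return { bodyLines, descChars: description.length, flags };
}

// A 1019-character description trips the warning but not the hard limit.
const sample = `---\nname: demo\ndescription: ${"x".repeat(1019)}\n---\n# Demo\nBody line\n`;
console.log(checkSkillSize(sample));
```

Having a warning band below the hard limit gives the prune phase room to act before an edit is actually rejected.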

Lifecycle integration with skill-creator

  • creator -> evolver: Step 6 points to skill-evolver for continuous iteration.
  • evolver -> creator: new "Needs a new skill" classification outcome (novel out-of-scope task, or splitting an over-budget skill).

Registration

  • copilot-instructions.md skills table updated.

Validation

  • skill-evolver and skill-creator pass quick_validate.py.
  • skill-sizes reports all skills within budget.
  • Three retrospectives were run against real friction events; all skill defects found were fixed, and the journal converged to a steady state.

Notes

  • Draft for team review.
  • Live friction journals are not committed (gitignored) — only curated artifacts (evolution-log) are tracked.

wzhipan and others added 10 commits June 16, 2026 13:03
Introduces a closed feedback loop (capture → analyze → propose → review →
apply → validate → measure) for evolving skills and AI tools.

- .github/hooks/journal-utils.js: single-writer JSONL friction journal store
  CLI + module (record, set-active, stats, list) under ~/.skill-evolution/
- .github/hooks/friction-capture.js: PostToolUse/Stop hook that auto-logs
  tool failures, attributes them to the active skill, clears attribution on Stop
- .github/hooks/orchestrator.json: register PostToolUse + Stop capture hooks
- .github/skills/skill-evolver/: SKILL.md + friction-schema, classification-rubric,
  edit-safety-rules references
- .github/skill-evolution/: evolution-log changelog + .gitignore for local journal
- .github/copilot-instructions.md: register skill-evolver in the skills table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Expand the description frontmatter (the skill's activation mechanism) with
more natural-language trigger phrases and a clearer proactive cue, so the
skill self-activates without the user naming it explicitly. Also broadens
scope wording to skills, prompts, and AI tools, and syncs the skills-table
row in copilot-instructions.md. Validated at 1019/1024 chars.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address the concern that always-on capture could feel intrusive:

- SKILL_EVOLUTION_DISABLE env var silences all capture (hook + journal
  recordEvent become no-ops); read paths (stats/list) still work so past
  data stays reviewable.
- friction-capture.js exits early when disabled, still returning
  {continue:true} so the tool flow is never blocked.
- journal-utils.js recordEvent no-ops when disabled; CLI `record` reports
  it cleanly instead of printing null.
- SKILL.md: add a "Non-intrusiveness & controls" section documenting the
  silent/non-blocking capture, the no-mid-task-edits guarantee, the off
  switch, and an explicit rule that proactive logging must be one-line and
  must never interrupt or question the user mid-task.
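
The off-switch behaviour described above could look roughly like the sketch below. The commit does not say which values of SKILL_EVOLUTION_DISABLE count as "disabled", so treating any non-empty value other than `0` or `false` as on is an assumption, as are the `captureDisabled` and `onToolResult` helpers and the in-memory `sink` that stands in for the journal.

```javascript
// Hypothetical sketch of the off switch. How the real code interprets the
// variable's value is not stated in the PR; this interpretation is assumed.
function captureDisabled(env = process.env) {
  const raw = (env.SKILL_EVOLUTION_DISABLE ?? "").trim().toLowerCase();
  return raw !== "" && raw !== "0" && raw !== "false";
}

// Write path: becomes a no-op when disabled. `sink` stands in for the journal.
function recordEvent(event, sink, env = process.env) {
  if (captureDisabled(env)) return null;
  sink.push(event);
  return event;
}

// Hook path: always lets the tool flow continue, whether or not it captured.
function onToolResult(result, sink, env = process.env) {
  if (!captureDisabled(env) && result.failed) {
    recordEvent({ kind: "tool-failure", tool: result.tool }, sink, env);
  }
  return { continue: true };
}

const sink = [];
console.log(onToolResult({ tool: "powershell", failed: true }, sink, { SKILL_EVOLUTION_DISABLE: "1" }));
console.log(onToolResult({ tool: "powershell", failed: true }, sink, {}));
console.log(sink.length); // only the second call recorded an event
```

Only the write path checks the switch. Read paths such as stats/list are left alone, which matches the stated goal of keeping past data reviewable.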

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Retrospective over the friction journal (3 active-captured events from the
build session). Each fix individually approved by the developer.

1. skill-creator: document the PyYAML prerequisite for the validation/
   packaging scripts (fixes ModuleNotFoundError: No module named 'yaml').
2. skill-evolver: clarify that automatic hook capture is best-effort and
   active capture is the PRIMARY path (this runtime didn't fire PostToolUse).
3. skill-evolver: state the 1024-char description limit explicitly and add a
   length-check command in edit-safety-rules (cost 2 retries this session).

Logged all three in evolution-log.md. Both edited skills pass quick_validate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… hook

The GitHub Copilot CLI runtime has no hooks system, so the Claude Code-style
PostToolUse/Stop registrations in orchestrator.json never fired. Per the
developer's choice (Option A: Copilot CLI only), stop pretending capture is
automatic and make active (agent-driven) capture the primary mechanism.

- orchestrator.json: remove the inert friction-capture.js registrations
  (PostToolUse, Stop, and the duplicate SubagentStop entry); keep the
  orchestrator's own subagent hooks.
- friction-capture.js: mark DORMANT with a header banner explaining it is
  Claude Code-only and how to enable it via .claude/settings.json.
- skill-evolver/SKILL.md: reframe Architecture + Capture so active capture is
  the primary/only reliable path on this runtime; fix non-intrusiveness and
  off-switch wording that implied a background hook runs here.
- evolution-log.md: record the change with rollback ref.

Validated: quick_validate passes; orchestrator.json no longer references
friction-capture; CLI record/stats still work (active capture intact).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
edit-safety-rules.md Workflow step 1 was ambiguous about which tool to use
for branch creation. I used gitkraken-git_checkout which doesn't support -b,
requiring an unnecessary two-step workaround. Verified: git checkout -b works
correctly via the powershell tool with native git 2.52.0.

Clarified in one line: 'via the powershell tool (not gitkraken-git_checkout,
which does not support -b)'.

Retro #2 summary: 7 journal events (4 carried/confirmed-fixed, 3 new).
1 skill defect fixed; 2 environmental (no action). All 4 prior fixes hold.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
User feedback: retrospective proposals showed diffs but didn't make clear
which skill each fix targeted. Since skill-evolver evolves many skills, that
ambiguity makes per-skill review decisions hard.

SKILL.md section 3 now mandates:
- a per-proposal header: 'Target: <skill> -> <file> . <root-cause> . <severity>'
- a summary table (# . Target skill . File . Root cause . Severity) when
  proposing multiple fixes
- naming the target skill in per-fix approval questions
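
As an illustration of the mandated format, a small formatter might look like the following. The header layout and table columns follow the wording above. The proposal object shape and the sample root-cause and severity values are made up for the example and are not taken from SKILL.md.

```javascript
// Hypothetical formatter for retrospective proposals. The header layout and the
// " . " separator mirror the commit note; the proposal object shape is assumed.
function proposalHeader(p) {
  return `Target: ${p.skill} -> ${p.file} . ${p.rootCause} . ${p.severity}`;
}

// Build the multi-fix summary table with the columns named in the commit note.
function summaryTable(proposals) {
  const header = "| # | Target skill | File | Root cause | Severity |";
  const rule = "|---|---|---|---|---|";
  const rows = proposals.map(
    (p, i) => `| ${i + 1} | ${p.skill} | ${p.file} | ${p.rootCause} | ${p.severity} |`
  );
  return [header, rule, ...rows].join("\n");
}

// Sample values are illustrative only.
const proposals = [
  { skill: "skill-creator", file: "SKILL.md", rootCause: "missing prerequisite", severity: "medium" },
  { skill: "skill-evolver", file: "references/edit-safety-rules.md", rootCause: "ambiguous instruction", severity: "low" },
];
console.log(proposalHeader(proposals[0]));
console.log(summaryTable(proposals));
```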

quick_validate passes; logged in evolution-log.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ate + references)

Counters the loop's addition bias so skills don't grow into caveat-soup:

- #2 tripwire: new 'journal-utils.js skill-sizes' command scans every
  SKILL.md and flags body >400/500 lines and description >900/1024 chars.
- #1 prune: SKILL.md section 4 is now 'Measure & prune' - run skill-sizes
  each retro; every ~5th retro (or when flagged) propose removals, not just
  additions.
- #3 + #4: new edit-safety rule 6 (consolidate over append; references over
  body; don't add to an over-budget skill without pruning).
- New references/bloat-control.md holds budgets + prune procedure, kept out
  of the always-loaded body (practicing #4).

Validated: quick_validate passes; skill-sizes runs and already flags
skill-evolver's own description (1019/1024, DESC_WARN). Body grew only
104->111 lines because detail went into the reference.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The skill-sizes tripwire flagged skill-evolver's own description at
1019/1024 chars. Removed redundant trigger phrasings (overlapping 'didn't
work' wording, a duplicate example) and tightened the global-lessons clause;
strongest triggers preserved.

Now 887 chars (under the 900 warn). quick_validate passes and skill-sizes
reports all skills within budget. Demonstrates the anti-bloat loop end to
end: tripwire flagged, prune cleared.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… outcome

Integrate the build-time (skill-creator) and run-time (skill-evolver) halves
of the skill lifecycle via lightweight cross-references (not a merge), and
close a real gap: the evolver had no path to recommend creating a NEW skill.

- creator -> evolver: Step 6 'Iterate' now points to skill-evolver for
  continuous, evidence-based iteration after a skill is in use (Step 6 still
  covers immediate in-authoring tweaks).
- evolver -> creator: new 'Needs a new skill' classification outcome for a
  substantial out-of-scope task, or splitting an over-budget skill that's
  doing two jobs -> hand off to skill-creator. Added to the rubric table,
  SKILL.md target-decision list, and bloat-control prune procedure.

Kept separate by design (distinct triggers, freedom levels, 1024-char
description ceiling). Both skills pass quick_validate; skill-sizes reports
all within budget.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

❌ Work item link check failed. Description does not contain AB#{ID}.
