Skip to content

feat: consolidate tracer, fees & CRM into the midaz v4 monorepo #2159

Open
fredcamaral wants to merge 428 commits into develop from feat/monorepo-consolidation



@fredcamaral

Member

What

Consolidates tracer, plugin-fees, and CRM into the midaz v4 monorepo — unified components/ledger binary (onboarding + transaction + CRM + fees on :3002) plus co-located components/tracer (:4020). Single root go.mod (github.com/LerianStudio/midaz/v4), no go.work.

Supersedes the closed #2156 / #2154. The load-bearing difference from those PRs:

⭐ The reporter is NO LONGER part of this consolidation

Earlier attempts folded the reporter in too. That decision was reversed — the reporter is a separately-sellable product coupled to the ledger only over the wire (RabbitMQ + API), not in-process, so it belongs in its own repo. It has been extracted back to LerianStudio/reporter (PR #696) and removed from this monorepo (626 files).

A direct consequence: the private fetcher/pkg/engine dependency is gone from midaz's go.mod — restoring a clean source-available build for external clones (the ledger core never imported it, but a single root module made it a build-time dep for everyone).

Scope

  • Fees embedded in ledger: engine pkg/fee, shared types pkg/feeshared, use cases internal/services/fees; applied at the transaction_create.go fee seam.
  • CRM folded into components/ledger/internal/crm (a plain package tree with no cmd/ or internal/ of its own); routes register under the midaz authz namespace, and the tenant-manager policy migration is the X1 release gate (docs/auth/RBAC-NAMESPACES.md, docs/runbooks/v4-x1-rbac-migration-and-rollback.md).
  • Tracer co-located at components/tracer; ledger↔tracer seam over gRPC + mTLS (docs/architecture/ledger-tracer-topology.md).
  • Reporter removed — see LerianStudio/reporter#696.

Verification

https://claude.ai/code/session_01RzpM5cJt1wAqEZQ1mHL63n

…ysis

Redis backup_queue hash is the durable WAL of authorized transactions
(atomic seed in the Lua script, no TTL, AOF everysec, post-persist HDel),
so the RabbitMQ DLQ is flow-control only; the financial fix is Epic 4.4
quarantine-then-delete for poison backup records. F21 remediation
reframed accordingly (delete-only would destroy the last copy).

X-Lerian-Ref: 0x1
Nine tasks across five epics per D5-v2: shared retry engine in
pkg/rabbitmq, DLQ as flow-control, Postgres quarantine for poison
backup records, panic conversions, HMAC hard-fail + PARTIAL status.

X-Lerian-Ref: 0x1
…rdening

Epic 4.1: reporter retry machinery generalized into pkg/rabbitmq
(classifier interface, RetryManager engine, republish hook seam,
header-name lock test); reporter re-pointed, behavior tests unchanged.
Epic 4.4 (D5-v2 layer 2): transaction_backup_quarantine table +
repository; poison backup records quarantined after 3 cycles with
insert-before-delete invariant guarded by tests; backup-queue depth/age
+ quarantine + tenant-skip metrics. Epic 4.5 (D7): HMAC hard-fail with
typed 401/0310 permanent error on both pipeline and reconciler paths;
PARTIAL report status with per-section classified error codes. Task
4.3.1: newValidator panic converted to error return.

X-Lerian-Ref: 0x1
transaction.dlx/dlq topology mirroring reporter (TTL 7d, max-len 10k);
the three blanket Nack(requeue=true) sites route through the shared
pkg/rabbitmq retry engine (DefaultClassifier, maxRetries 3, exponential
backoff, ST republish hook with lazy channel resolution); constructor
panic converted to propagated error. DLQ is flow-control only - the
durable copy lives in the Redis backup hash. Also lands files missed
by the previous Phase 4 commit: PARTIAL status constant, metric
definitions, quarantine/metrics test files, go mod tidy artifacts.

X-Lerian-Ref: 0x1
Six parallel territory passes applying T3/T5/T6/T7/T8/T13/E10:
~830 fmt.Sprintf-in-logger sites converted to constant messages with
typed libLog fields; ~400 per-request Info narration lines deleted or
demoted to Debug (Info reduced to the sanctioned milestone list);
~150 leaf child-span ctx rebinds flipped to non-rebinding form (one
live mis-nesting bug fixed in reporter template find_list); ~135
business-class span records switched to HandleSpanBusinessErrorEvent
via pkg.IsBusinessError (CRM 0->24, fees 46, ledger-http 65 boundary
conversions with class-checked helper); ~330 duplicate inner-layer
error logs dropped (single-point logging); all 90+ reflect.TypeOf
entity sites moved to constant.Entity*; import aliases normalized
repo-wide; SQL args dropped from Debug logs. Span-status contract
test added (business failure = UNSET, infra = Error). Absorbs the
pre-existing format-pass noise entangled in swept files.

X-Lerian-Ref: 0x1
domain_operations_total{component,operation,result} +
domain_operation_duration_ms{component,operation} emitted at every
public use-case exit boundary: ledger 45 ops, crm 10, fees 11,
tracer 15, reporter 19 - catalog documented under T11. Shared
RecordDomainOperation helper classifies result via pkg.IsBusinessError;
nil factory is a no-op. Epic 5.6 verified already-satisfied: tracer
has been on MetricsFactory via the OTel-Prometheus bridge since before
this plan; allowlist cardinality model intact - no migration needed.

X-Lerian-Ref: 0x1
make ci = lint + check-telemetry + test-unit (pre-existing test-matrix
target renamed ci-tests); five rg enforcement gates (Sprintf-in-logger,
SetSpanAttributesFromValue, prefixed wire codes, Info narration,
reflect entity names) each proven to fire on a planted violation;
dogsled max-blank-identifiers=3 for the lib's 4-return tracking API;
~135 mechanical lint fixes across components. Docs synced to the
post-normalization reality: standards file:line refs re-resolved,
future-tense process language removed, PROJECT_RULES T5/T8
contradictions fixed, tracer CLAUDE.md rewritten, stale transitional
docstrings corrected. AGENTS.md includes entangled pre-existing edits.

X-Lerian-Ref: 0x1
Update the LerianStudio/github-actions-shared-workflows references from
v1.27.5 and v1.28.9 to v1.33.0 across all GitHub Actions workflows to
leverage the latest CI/CD pipeline features and fixes. Also, bump the
golangci_lint_version from v2.4.0 to v2.12.2 in the Go analysis workflow
to apply updated linting rules and improve code quality checks.

Locked decisions + before/after architecture + engine port→reporter infra
mapping + phase breakdown for embedding fetcher's pkg/engine in-process,
replacing the remote-HTTP fetcher coupling.
…engine

Phase 1 of the reporter→engine migration: the critical-path spike.

- pkg/reporter/engine/: ConnectorRegistry adapter over the embedded
  github.com/LerianStudio/fetcher/pkg/engine (standalone module, empty
  require — zero third-party dep inheritance; lib-commons stays v5.5.0).
- TenantResolver seam with multi/single-tenant implementations. MT resolves
  per-tenant PG/Mongo handles from engine TenantContext via lib-commons
  tenant managers, validates tenant shape (tmcore.IsValidTenantID), and
  rejects empty/malformed tenant in MT mode — no cross-tenant read path.
- sqlQuerier seam (satisfied by *sql.DB and dbresolver.DB) collapses the
  former DirectProvider/FetcherProvider split into one tenant-aware path.
- Streaming bounded-memory cursors (one row at a time, table-by-table)
  for Postgres and Mongo; ctx-cancel honored with category-correct errors.
- Filters fail closed (CategoryValidation) until WHERE translation lands.
- bson.Binary/UUID conversion mirrors the reporter's existing semantics.

Unit + testcontainers integration tests (tenant isolation, projection,
ctx-cancel, 50k-row bounded-memory). Engine require is tidy-stable/direct.
…MT-postgres

Phase 2 of the reporter→engine migration: construct the engine in the
reporter-worker bootstrap with fail-fast validation. The engine is built
but not yet driven by the job handler (that is Phase 3).

Adapters (pkg/reporter/engine/):
- ConnectionStore: read-mostly over the env-configured datasources;
  FindConnection stamps the tenant via WithTenantID (the load-bearing link
  ExecuteExtraction depends on); write methods return unsupported.
- Observability: over the worker's OTel tracer; nil-tracer no-op.
- SchemaCache: optional, Redis-backed, tenant-scoped keys; a Redis fault
  degrades to cache-miss (fresh discovery), never fails extraction.

Bootstrap (components/reporter-worker/internal/bootstrap/):
- config_engine.go assembles engine.New(WithConnectorRegistry/ConnectionStore/
  Observability/Limits); a nil/typed-nil required port aborts boot (fail-fast).
  No CredentialProtector, no encrypted persistence — transport-S3 is dead.
- Real multi-tenant PostgreSQL: tenantPostgresAdapter over lib-commons
  tenant-manager/postgres.Manager (GetDB→dbresolver.DB satisfies SQLQuerier),
  mirroring the tmmongo wiring. Per-tenant pools drain on shutdown.
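
The "nil/typed-nil required port aborts boot" rule exists because of a Go gotcha: an interface holding `(*T)(nil)` compares `!= nil`. A minimal sketch of such a check, where `requirePorts`, `ConnectorRegistry`, and `registry` are illustrative names rather than the engine's API:

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// isNilPort reports whether a required port is unusable, including the
// typed-nil case where the interface itself is non-nil.
func isNilPort(p any) bool {
	if p == nil {
		return true
	}
	switch v := reflect.ValueOf(p); v.Kind() {
	case reflect.Pointer, reflect.Interface, reflect.Map, reflect.Slice, reflect.Func, reflect.Chan:
		return v.IsNil()
	default:
		return false
	}
}

// requirePorts is a hypothetical pre-construction check: any nil port aborts boot.
func requirePorts(ports map[string]any) error {
	var errs []error
	for name, p := range ports {
		if isNilPort(p) {
			errs = append(errs, fmt.Errorf("required port %s is nil", name))
		}
	}
	return errors.Join(errs...)
}

// ConnectorRegistry and registry are illustrative stand-ins.
type ConnectorRegistry interface{ Name() string }

type registry struct{}

func (*registry) Name() string { return "postgres+mongo" }

func main() {
	var typedNil *registry
	var port ConnectorRegistry = typedNil
	fmt.Println(port != nil) // true: the interface holds a (*registry)(nil)
	if err := requirePorts(map[string]any{"ConnectorRegistry": port}); err != nil {
		fmt.Println("boot aborted:", err)
	}
}
```

A plain `port == nil` check would have let this registry through and failed later, on the first extraction; checking the underlying value at boot turns that into an immediate startup error.
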

Tenant isolation invariant: an MT request resolves ONLY its tenant's pool.
GetDB errors fail closed (CategoryUnavailable); a nil connection is guarded
at both the adapter and resolver seams; there is no fallback to the
single-tenant pool or another tenant's DB. Production was confirmed to use
MT-postgres reporting, so the earlier fail-closed stub is removed entirely.

ExecutionStore deferred (the reporter already persists report status).
Phase 1 seam types SQLQuerier/SingleTenantDatasources exported via type
alias so the bootstrap adapter (a different package) can name them.

Unit + bootstrap tests (tenant forwarding, error propagation without
shared-pool fallback, fail-fast on nil registry, no encrypted persistence).
Build, CI-version lint (v2.12.2), and tests green; lib-commons stays v5.5.0,
no new third-party module entered the graph.
…6.4)

The Makefile pins fell out of sync with .github/workflows/go-combined-analysis.yml
(which they are commented to track): golangci-lint was v2.4.0 vs CI's v2.12.2, and
GO_VERSION was 1.26.3 vs CI's 1.26.4 (go.mod's go directive is already 1.26.4).

Under the stale v2.4.0, `make lint` reported prealloc/wsl_v5 false-positives that
do not exist under the v2.12.2 the CI gate actually runs, so local lint disagreed
with CI. Bumping both pins makes `make lint` and `make`-driven builds reproduce CI.
…e 3)

Replace the remote-HTTP fetcher extraction path with the embedded engine.
Reports now extract synchronously through pkg/reporter/engine instead of
dispatching to fetcher over RabbitMQ + transport-S3.

Engine-driven extraction:
- generate-report-extraction.go drives the engine, decodes planner filters,
  and re-keys engine output (dot-notation schema.table) to the Pongo2
  renderer contract (schema__table) via resolveTableKeys autodiscovery.
- connector_{postgres,mongo}.go gain filter translation with legacy parity
  (Equals single->Eq / multi->IN, Between date upper-bound expansion to
  end-of-day), replacing the fail-closed rejectUnsupportedFilters stub.
- pkg/reporter/engine/filters.go decodes the planner's map[string]any back
  into typed datasource filters with arity validation.

plugin_crm parity (routed OUTSIDE the generic engine, which queries literal
collection names only):
- plugincrm/ module: TransformFilters (document->search.document hash
  pre-transform), FanOutOrgCollections (holders_* org fan-out +
  organization_id injection), DecryptRecords (field decryption).
- uc.extractPluginCRM composes them; tenant context is MT-aware ("default"
  placeholder applies in single-tenant mode only — never substituted under
  multi-tenant, which would resolve a real tenant's DB).
- worker-level integration test proves decryption round-trip, org fan-out +
  org-id injection, hash-filter subset selection, and fail-closed on missing
  key against a live testcontainers MongoDB.

Deletions (HTTP path now dead): notification_consumer, reconciler,
process-notification, extraction-request dispatch, data-pipeline/decrypt/hmac,
config_fetcher, generate-report-data, and their tests. The FETCHER_ENABLED
code gate is removed; the env var + Helm values are cleaned up in a later
phase. redisRequired is now gated on MultiTenantEnabled only.

BREAKING CHANGE: the reporter CRM datasource is now named "crm" everywhere
the legacy "plugin_crm" identifier was used. CRM lives inside the ledger
component; "plugin" was legacy residue from the standalone-plugin era.

Renamed across the reporter subsystem (worker + manager + pkg/reporter + tests):
- Package components/reporter-worker/internal/services/plugincrm -> .../crm.
- Datasource config-name literal "plugin_crm" -> "crm" (crm.DatasourceName,
  crmDataSourceID in pkg/reporter/datasource + reporter-manager, e2e DSCRM).
- Go identifiers: PluginCRM -> CRM fragment (CryptoHashSecretKeyCRM,
  CryptoEncryptSecretKeyCRM, GetDatabaseSchemaForCRM, extractCRM, etc.).
- Env vars: CRYPTO_HASH_SECRET_KEY_PLUGIN_CRM -> CRYPTO_HASH_SECRET_KEY_CRM,
  CRYPTO_ENCRYPT_SECRET_KEY_PLUGIN_CRM -> CRYPTO_ENCRYPT_SECRET_KEY_CRM,
  DATASOURCE_PLUGIN_CRM_* -> DATASOURCE_CRM_* (struct tags, .env.example,
  validation, error-message strings, e2e env map).
- The crm extraction parity integration test tracks the new literals and
  stays green as the regression guard.

"crm" is now a RESERVED datasource token: the handler routes a section to the
crm decrypt + org-fan-out path when the datasource name Is("crm"). A generic
datasource may no longer be named "crm" (the multi-datasource parity test's
generic mongo source was renamed to "mongo" accordingly).

Out of scope (left verbatim): the ledger authz namespace "plugin-crm" (hyphen,
the X1 migration), dated historical docs, and docker-compose container names.

Operator migration (owned outside this commit): existing report templates
referencing {{ plugin_crm.* }} and the datasource registered as plugin_crm
must move to crm; deployed secrets must move to the *_CRM / DATASOURCE_CRM_*
env names.
…e nolint

The five Go component Makefiles each defined their own
GOLANGCI_LINT_VERSION := v2.4.0, which the root var does not export. So
`make lint`/`make ci` linted the components at v2.4.0 while the root tree
(tests/, pkg/) and CI (go-combined-analysis.yml) used v2.12.2 — local CI was
weaker than the real gate. Bump all five pins (plus the two hardcoded
go install lines) to v2.12.2 so make ci faithfully mirrors CI.

Under v2.12.2 the prealloc linter no longer flags the nil-kept overdraft
items slice, so its //nolint:prealloc directive became unused (nolintlint).
Drop the directive; the explanatory comment above the declaration already
documents why the slice must stay nil.
…ant, retire fetcher HTTP path

Phase 4 (Option B) of the reporter→engine migration. The reporter-manager's
schema discovery and validation now run in-process and resolve per-tenant
connection pools through lib-commons tenant managers, mirroring the worker's
Phase 2b wiring. The remote FetcherProvider HTTP path is deleted.

What changed:
- New tenant_schema_source.go resolves the per-tenant pool via
  tmpostgres.Manager (GetDB→dbresolver.DB) and tmmongo.Manager
  (GetDatabaseForTenant→*mongo.Database) behind narrow TenantPostgresManager/
  TenantMongoManager seams, then feeds the existing DirectProvider schema/
  validation/CRM logic from the tenant-scoped snapshot. CRM prefix-grouping,
  org-suffix filtering, postgres schema-ambiguity detection and the D7
  unavailable→warning behavior are preserved unchanged in single-tenant mode.
- DirectProvider gains NewMultiTenantDirectProvider; the four schema-read paths
  dispatch to the tenant source when MT, bypassing the env-pool lazy-connect.
- Bootstrap initManagerSchemaTenantManagers builds both managers off one shared
  Tenant Manager client; factory.go drops the FetcherEnabled gate.
- NewDataSourceRepositoryFromDatabase injects a *mongo.Database without pool
  ownership for the tenant-scoped repository.

Deleted (manager-side fetcher HTTP retirement):
- pkg/reporter/fetcher (whole package), pkg/reporter/auth (M2M + credential
  providers), datasource/fetcher_provider.go, readyz FetcherChecker, the
  service-layer FetcherEnabled flag + isFetcherMode gate, and the dead
  FETCHER_*/M2M_* config block (+ .env.example entries).

Tenant-isolation invariant (third-rail) upheld at every seam: tenant ID read
from context, never substituted under MT; resolution/nil errors fail closed
with no shared- or cross-tenant fallback; the MT validation dispatch sits
before the D7 softening so a resolution failure surfaces as a hard error, not
a masked Valid:true warning.

BREAKING CHANGE: MULTI_TENANT_ENABLED=true no longer requires FETCHER_ENABLED.
The FETCHER_URL, FETCHER_ENABLED and M2M_* environment variables are removed
from the reporter-manager; manager schema discovery is now always in-process.
…mponents/reporter, RUN_MODE)

Phase 5 of the reporter→engine migration. The two reporter deploy units —
reporter-manager (REST API, :4005) and reporter-worker (RabbitMQ consumer +
health server, :4006, PDF/Chromium) — are now ONE Go component at
components/reporter, with the active surface selected at runtime by
RUN_MODE=api|worker|all. Production still deploys SPLIT (two Deployments, one
image); RUN_MODE=all is dev-only.

Structure (history preserved via git mv):
- reporter-manager/internal → components/reporter/internal/manager
- reporter-worker/internal   → components/reporter/internal/worker
- reporter-manager/api       → components/reporter/api
- new internal/app/app.go orchestrator: ParseRunMode (default all, rejects
  typos fast), InitService gating, Service.Run registering each selected
  surface's runnable in ONE libCommons launcher.
- new cmd/app/main.go reading RUN_MODE.
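
The RUN_MODE handling described above (default `all`, typos rejected at boot, unselected surfaces left nil) can be sketched like this. `parseRunMode` and `surfaces` are hypothetical stand-ins for ParseRunMode and the InitService gating; the real signatures may differ.

```go
package main

import (
	"fmt"
	"os"
)

// RunMode selects which surfaces the single reporter binary starts.
type RunMode string

const (
	RunModeAPI    RunMode = "api"
	RunModeWorker RunMode = "worker"
	RunModeAll    RunMode = "all"
)

// parseRunMode: an empty value defaults to all, and any unknown value fails,
// so a typo such as "wroker" stops the process instead of starting nothing.
func parseRunMode(raw string) (RunMode, error) {
	switch m := RunMode(raw); m {
	case "":
		return RunModeAll, nil
	case RunModeAPI, RunModeWorker, RunModeAll:
		return m, nil
	default:
		return "", fmt.Errorf("invalid RUN_MODE %q: want api, worker, or all", raw)
	}
}

// surfaces reports which surfaces to construct; an unselected one stays nil
// and opens no connections.
func (m RunMode) surfaces() (api, worker bool) {
	return m == RunModeAPI || m == RunModeAll, m == RunModeWorker || m == RunModeAll
}

func main() {
	mode, err := parseRunMode(os.Getenv("RUN_MODE"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	api, worker := mode.surfaces()
	fmt.Printf("mode=%s api=%t worker=%t\n", mode, api, worker)
}
```

Each image bakes its own default (`api` for the manager image, `worker` for the worker image), so a split deployment never relies on the `all` fallback.
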

Two deliberate design calls (documented so they aren't mistaken for accidents):

1. The two surfaces keep SEPARATE bootstrap trees composed by a thin
   orchestrator, rather than ledger's single merged bootstrap. Each surface
   owns its own lib-commons tenant managers; collapsing them into one bootstrap
   would risk cross-tenant manager scoping. The orchestrator gates construction
   by RUN_MODE and runs both runnables under one launcher — same single-binary,
   single-launcher, split-deploy outcome with the tenant-isolation invariant
   preserved by construction. An unselected surface stays nil and opens no
   connections.

2. The old components/reporter-manager and components/reporter-worker dirs
   survive as Dockerfile-ONLY image-name anchors. The shared CI build workflow
   derives the published image name from the build-context directory basename,
   so keeping these dirs (each with just a Dockerfile that builds the unified
   binary) keeps the midaz-reporter-manager / midaz-reporter-worker image names
   and the .manager / .worker Helm value keys stable — leaving the Helm chart
   untouched for devops. A header comment in each Dockerfile explains this.

Worker graceful shutdown is preserved verbatim: the full ordered teardown
(reconciler cancel, health checker, health server, PDF pool, event listener,
multi-tenant resources, RabbitMQ, MongoDB, telemetry flush) moved into
worker bootstrap Service.Shutdown(), invoked by both the standalone Run() and
the orchestrator, so SIGTERM drains identically either way.

Build: root Makefile gains a single `reporter` component (one build target →
.bin/reporter); the two Dockerfiles build the same binary differing only by
base image + default RUN_MODE; CI build.yml adds components/reporter to
shared_paths (a source change rebuilds both images) while filter_paths, image
names, and Helm key mappings are unchanged; go-combined-analysis and
pr-security-scan filter_paths repointed.

BREAKING CHANGE: the reporter is now a single binary selected by RUN_MODE.
A deployment that previously ran the reporter-worker binary must set
RUN_MODE=worker, and the reporter-manager deployment must set RUN_MODE=api
(baked as the default in each respective image). Helm charts are unchanged;
devops applies the per-Deployment RUN_MODE.
…ync docs to unified binary

Phase 6/7 cleanup of the reporter→engine migration. With the remote fetcher
HTTP path gone, this removes the worker-side remnants and reconciles docs to
the unified components/reporter (RUN_MODE=api|worker|all) reality.

Code (components/reporter/internal/worker/bootstrap):
- Remove 10 dead fetcher/M2M config fields (FetcherEnabled, FetcherURL,
  AppEncKey, FetcherStorageBucket, FetcherStorageEndpoint, M2M client/secret,
  M2MTargetService, M2M cache TTLs) — each verified to have zero read sites.
- Narrow the SaaS-TLS Redis dependency gate from
  `FetcherEnabled || MultiTenantEnabled` to `MultiTenantEnabled` only. The
  reconciler (the sole single-tenant Redis consumer) was deleted at cutover, so
  Redis is now required only under multi-tenancy — this matches the already
  MT-only gate in BuildWorkerCheckers.
- Drop the dead reconcilerCancel stub (field + shutdown branch) and the
  obsolete fetcher-gated TLS tests, keeping the live MT-Redis cases.

readyz: scrub the stale FETCHER_ENABLED operator-facing reason string and
self-probe/aggregation fixtures (the worker dep set is five, no fetcher); add a
NotContains regression guard.

Docs: STRUCTURE.md, AGENTS.md, CLAUDE.md, docs/PROJECT_RULES.md, and the
worker .env.example now describe one reporter binary deployed split via
RUN_MODE rather than two services; the migration plan is marked complete.

Load-bearing items left untouched: the ModuleManager/ModuleWorker tenant-scope
constants, datasource/factory.go, the WorkerContainer integration-test default,
and the Dockerfile image-name anchor stubs.
…umer test narrative

Final cleanup of the reporter→engine migration. With the remote fetcher HTTP
path, its worker consumer/reconciler, and the manager FetcherProvider already
gone, this removes the last unreferenced remnants of the skeleton. Each target
was grep-proven to have zero live callers repo-wide (excluding the file being
deleted and its own tests) before removal.

Code deleted (pkg/reporter):
- mongodb/extraction/ — whole package (7 files): the jobID→reportID mapping
  subsystem for the deleted async fetcher path. No non-self callers.
- crypto/ — whole package (key_deriver.go): fetcher TRANSPORT key derivation
  (HMAC verify + S3 blob decrypt). Distinct from CRM crypto, which uses
  lib-commons libCrypto.Crypto via CryptoEncryptSecretKeyCRM — untouched.
- storage/fetcher_adapter.go (+ test): the transport download adapter. Only
  these two files; the rest of storage (s3-client, seaweedfs, ports, config)
  is live and kept.
- datasource/types.go: remove the ExtractionJobRequest and ExtractionMapping
  payload structs (+ companion test), the last consumers of which were the
  deleted extraction package. Ripple: dropped the now-orphaned `import "time"`.
- constant/mongo.go: remove MongoCollectionExtractionMapping ("extraction_mapping").

Test narrative (components/reporter/internal/worker/bootstrap/retry_guard_test.go):
- Exorcise the dead Consumer-2 (fetcher notification) narrative left behind by a
  prior phase: rename TestNotificationHandler_* → TestReportHandler_*, repoint
  the comment from the deleted ProcessFetcherNotification to the live
  handlerGenerateReport, and relabel the scenarios/fixtures that named the dead
  consumer (extraction_mapping → report, "parse notification" → "parse report
  request", "stale extraction" → report generation). Pure rename/relabel —
  every error value and assertion is preserved, no coverage change.

Load-bearing item left untouched: ErrExtractionJobFailed (0287) — the engine's
in-process extraction-failure sentinel, live in five sites (retry_guard,
generate-report-data, pkg/errors, rabbitmq classifier, datasource alias). It is
NOT a fetcher vestige; the in-process engine still extracts datasource rows.

Verified green: go build ./..., go vet (reporter + pkg/reporter), the full
reporter unit suites, and golangci-lint v2.12.2 (0 issues). Repo-wide grep
confirms zero surviving references to any deleted symbol.

The remote fetcher is fully retired; the reporter→engine migration is complete.
…eck-docs guardrail (Phase 1)

Phase 1 of the OpenAPI documentation quality plan
(docs/plans/2026-06-10-openapi-doc-quality.md), resolving the audit's
pipeline + parity findings (docs/openapi/AUDIT-2026-06-10.md). All edits are
swag annotations, generator tooling, and the regenerated specs they produce.

Pipeline (H5, H6):
- generate-docs.sh + sync-postman.sh: COMPONENTS "reporter-manager" -> "reporter".
  The dead reporter-manager (Dockerfile-only CI anchor, no cmd/app/main.go) made
  `make generate-docs` fail at the reporter step on a clean tree; it now resolves
  the real components/reporter binary.
- convert-openapi.js: the COMPONENT_PORTS key was still "reporter-manager", so
  after the rename the reporter port fell through to the 3002 ledger default.
  Renamed the key to "reporter" -> reporterPort now correctly resolves to 4005.
  Fixed three stale reporter-manager comments alongside.
- Retired the stale, drifted postman/specs/reporter-manager/ (git mv -> reporter/);
  the old hub copy had already lost the Partial status enum.

General-info parity (M4, M5, M7, L11, L14, L15, L16) across the three
cmd/app/main.go headers, now byte-identical on the shared info fields:
- @Version -> 4.0.0 (dropped ledger's v-prefix).
- @title -> "Midaz {Ledger,Tracer,Reporter} API" (added the Midaz prefix).
- @termsofservice -> the Elastic License URL (was the swagger.io scaffold).
- @schemes -> "http https" (ledger gained https; reporter gained the line).
- @contact + @license -> added to tracer and reporter (were contact:{}, license:null).
- reporter Bearer description aligned to ledger's canonical wording; reporter
  @description now states REST serves only in api/all mode (worker is health-only).
- tracer @description enriched to name its bounded contexts.
- Deliberately untouched: tracer's ApiKeyAuth/X-API-Key scheme is correct
  (lib-auth v2 API Key), not a parity defect.

Guardrail (L17):
- postman/generator/check-docs.sh: parity half (always) asserts the shared
  info fields are identical across the three swagger.json via jq; drift half
  (CHECK_DOCS_REGEN=1) regenerates and asserts git-clean against committed specs.
- `make check-docs` target at the repo root (next to generate-docs, per the
  repo's docs-target convention); wired as a "Check Docs" job in pr-validation.yml.
- postman/README.md documents the parity fields.

Includes the regenerated specs for all three components, the refreshed Postman
collection/environment, and the governing audit + plan docs.

Verified: `make generate-docs` exit 0; `make check-docs` parity green; jq
confirms the six parity fields identical; reporterPort=4005; regeneration is
idempotent (second run byte-identical, so the drift gate will pass); all three
cmd/app binaries build.

Phase 2 of the OpenAPI doc-quality initiative. Brings annotation hygiene
to quality parity across ledger, tracer and reporter.

@name sweep (M-series): 22 swag @name directives relocated onto each
struct's closing brace (`} // @name X`) — swag v1.16.6 ignores @name when
placed as a leading comment above `type X struct`. Resolves package-dotted
definition keys for reporter (23 -> 5 dotted) and tracer api types
(40 -> 36 dotted); the remainder are intentionally deferred to Phase 5
(feeshared billing, tracer pkg/model, reporter unexported types, HTTPError).

Tag taxonomy + groups (M8): @tags normalized to Title-Case plural across
all three components; @router HTTP methods lowercased for consistency.
@tag.name/@tag.description group blocks added to each general-info header
AND relocated BEFORE @securityDefinitions — swag drops @tag.* directives
that follow the security-scheme @description. Emitted .tags now
populated: ledger 21, tracer 7, reporter 6.

Text fixes: commit/cancel 400 descriptions corrected (were "cannot be
reverted"); report status example capitalized to match the persisted
constant; "plugin" wording replaced with "reporter"; single-quote array
examples converted to swag comma form; stale retired TRC- prefixes in two
tracer files replaced with canonical sentinel constant names.

Param examples: 15 query-parameter `example(...)` tokens removed. The
Swagger 2.0 Parameter Object does not support `example`; emitting it broke
the openapi-generator conversion. These were never present in the
generated spec before (swag silently ignored the prior malformed tokens),
so this is zero-regression. Param-level examples belong to a future
OpenAPI 3.0 migration.

Specs regenerated; parity guardrail green.

Resolve audit finding C1: the ledger declared a BearerAuth securityDefinition
that zero operations referenced (dangling auth). Apply per-operation
`@Security BearerAuth` to all 111 ledger operations and drop the 111 ad-hoc
optional `@Param Authorization` header lines.

Mechanism: model (a) was framed as a global security requirement, but swag
v1.16.6 does not emit a top-level `.security` from a general-info `@security`
directive — it only honors per-operation `@Security`, emitting
`[{"BearerAuth":[]}]` per op. This is the pattern tracer (28/31) and reporter
(22/22) already use, so per-op delivers identical secure-by-default behavior
plus true cross-component source-style parity. C1 was ledger-only.

- 25 handler files: in-place swap `@Param Authorization` -> `@Security
  BearerAuth`; `@Param X-Request-Id` tracing header preserved; path/body/query
  params untouched.
- Regenerated ledger specs: 111/111 operations carry
  `.security == [{"BearerAuth":[]}]`, 0 Authorization params remain,
  securityDefinitions byte-identical, definitions/summaries unchanged.
- check-docs.sh: new always-on security-coverage guard (ledger-only) that fails
  listing any ledger operation lacking `.security`. tracer's public
  /health,/readyz,/version and reporter (already fully secured) are out of scope.

Verified: 111/111 secured, ledger builds, parity green, spec diff provably
security-only (definitions + non-Authorization params identical to HEAD).
…B resolver

T1: buildTracerReserver fails fast when MULTI_TENANT_ENABLED && TRACER_BASE_URL is set but no M2M auth provider is wired (none exists yet) — refuses to ship unauthenticated, tenant-less reserve calls on the transaction hot path. F1: inject the fees Mongo manager into TransactionHandler so the fee seam resolves the tenant fee DB (test in transaction_fee_tenant_test.go).
…ient seams

T4: InjectHTTPContext in do() so all five reservation ops continue the ledger trace instead of starting orphaned roots. T5/T6: remove the orphaned circuit-breaker seam (test-only, never wired). T7: remove dead WithHTTPClient option.
…econ-epic41)

Slice Epic 4.1: wave 4.1a (rewire generate-docs/check-docs onto the Huma OAS
3.1 dumps + de-risk the redocly join/lint, swaggo fully intact/additive) then
wave 4.1b (retire swaggo annotations + runtime wiring + generated files,
preserving tracer/api/types.go; go mod tidy; delete pkg.HTTPError + fix the 5
compile-breakers). Anchors from recon-epic41 baked in; version test->4.0.0.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
Promote the native Huma OAS 3.1 dump version from the placeholder "test"
to the contract version "4.0.0" on both server planes, then regenerate
the two committed goldens.

- tracer: buildTracerHumaAPI openapi.Config.Version test -> 4.0.0
- ledger: buildUnifiedHumaAPI openapi.Config.Version test -> 4.0.0
- regenerated components/{tracer,ledger}/api/openapi.huma.yaml goldens

The value stays hardcoded (never os.Getenv) so the golden dump remains
hermetic and drift-deterministic. This lets the docs pipeline switch its
source to the Huma dump: check-docs.sh requires .info.version to match
^4.0.0$, currently satisfied by the swaggo main.go @Version that will be
retired in wave 4.1b. Additive: swaggo annotations and generated artifacts
are untouched.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
Repoint postman/generator/generate-docs.sh spec-gen at the native Huma
OAS 3.1 dumps instead of swag+openapi-generator:

- generate_openapi_spec now runs the golden TestOpenAPISpecDump with
  -update per plane (regenerates components/<c>/api/openapi.huma.yaml);
  resolve_swag_bin, generate_openapi_yaml (Docker), and SWAG_BIN removed.
- publish_specs copies openapi.huma.yaml (was the swagger triplet).
- consolidate_openapi joins the two openapi.huma.yaml inputs (ledger
  first, --prefix-tags-with-info-prop title, same output). Version-parity,
  security-scheme, and orphan-ref guards preserved; both dumps are
  3.1.0 and the tracer dump declares BearerAuth + ApiKeyAuth. Stale
  "ApiKeyAuth (tracer)" comment corrected: the tracer declares both.
- Drop the now-unmaintained tracked swagger triplet under
  postman/specs/<c>/, publish the Huma dumps in their place, and refresh
  the consolidated specs + Postman collection.

Swaggo (annotations, generated api/*, /swagger routes, go.mod) untouched;
retirement is a later wave.

Verified: make generate-docs runs with no swag/Docker, emits
postman/specs/midaz.openapi.yaml (openapi: 3.1.0), deterministic across
three runs.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
Point the docs guardrail at components/<c>/api/openapi.huma.yaml instead of
the swaggo swagger.json. jq cannot read YAML, so read_field/read_field_raw
and security_coverage_check now project the dump to JSON via the same bundled
js-yaml the generator uses.

Parity check drops swaggo-era fields that OAS 3.1 / the Huma dump no longer
carry: .schemes (absent in 3.1) and .info.contact/.license/.termsOfService
(Huma emits only title + version). The ^Midaz title assertion is dropped too:
title is per-plane, not shared metadata, and the ledger dump still carries the
contract-spec golden-test placeholder title. Parity now asserts .info.version
is byte-identical across planes and matches ^4.0.0$.

Security coverage (ledger 113/113) and the redocly consolidated lint are
unchanged in intent — only the source file and its YAML->JSON read path move.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
The Huma rewire (2bd6f3e) switched publish_specs to copy each plane's
openapi.huma.yaml into postman/specs/<c>/, but sync-postman.sh still fed
convert-openapi.js the dead swagger.json path. Every component hit the
'spec not found' branch, ledger came back SKIPPED, and convert_to_postman
failed the whole generate-docs run. One line: point the converter at the
published openapi.huma.yaml (convert-openapi.js already reads YAML natively).

Regenerated MIDAZ.postman_collection.json is the resulting artifact (28
folders). The joined midaz.openapi.{yaml,json} were committed in the prior
wave and reproduce byte-identically (drift check green), so no delta there.

Verification (both makes green):
- make generate-docs: full pipeline to Postman collection, no failures.
- CHECK_DOCS_REGEN=1 make check-docs: parity (info.version 4.0.0 identical
  across ledger/tracer), security coverage (113/113 ledger ops secured),
  redocly lint on joined spec EXECUTED (81ms, valid, 29 inherited warnings,
  not skipped) and PASSED, drift check reproduces committed artifacts.
- Joined spec: openapi 3.1.0, 141 ops (ledger 113 + tracer 28),
  components.schemas.Error present (RFC 9457), no raw Detail schema.
- Swaggo intact: authoritative components/*/api/{docs.go,swagger.json,
  swagger.yaml} + 49 annotated source files untouched by the wave.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
The postman collection generation used wall-clock timestamps and random
UUIDs, so every 'make generate-docs' run produced a different collection —
the committed artifact never sat clean and any drift check over the full
generator output would flag spurious drift on every run.

- convert-openapi.js: replace new Date().toISOString() at the three
  date-time example sites with a fixed EXAMPLE_DATE_TIME constant.
- lib/workflow-processor.js: replace random uuidv4() for Postman element
  ids (event script ids and the workflow folder _postman_id) with a
  content-seeded uuidv5(), stable across runs.
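The generator itself is JavaScript, but the property it relies on is language-neutral: a name-based UUIDv5 is a SHA-1 over namespace bytes plus name, so identical inputs give identical IDs. A Go sketch of that RFC 4122 construction follows; the DNS namespace is used only because it has a well-known test vector, and a real generator would pick its own fixed namespace and seed (both assumptions here):

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// uuid is a 16-byte RFC 4122 UUID.
type uuid [16]byte

func (u uuid) String() string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

// nsDNS is the RFC 4122 DNS namespace, 6ba7b810-9dad-11d1-80b4-00c04fd430c8.
var nsDNS = uuid{0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8}

// uuidV5 hashes namespace||name with SHA-1, keeps the first 16 bytes, and
// forces the version (5) and variant bits. No clock or randomness is involved,
// which is what makes regenerated artifacts byte-identical across runs.
func uuidV5(ns uuid, name string) uuid {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)

	var u uuid
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant (10xx)

	return u
}

func main() {
	// Hypothetical seed: stable element content such as a folder name.
	fmt.Println(uuidV5(nsDNS, "folder:Transactions"))
}
```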

The regenerated collection is now byte-identical across consecutive runs
(sha eff7b002... == eff7b002..., 0 diff lines). check-docs.sh passes with
'Regeneration reproduces committed docs artifacts (no drift)'. Swaggo and
the drift-gated spec dumps (components/*/api, postman/specs) are untouched.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
…y guard

The ledger golden fixture buildUnifiedHumaAPI (contract_spec_routes_test.go)
set info.title to "contract-spec", diverging from the production humaMount
closure (unified-server.go:127), which serves "Midaz Ledger API". Wave 4.1a
wired the docs pipeline onto that golden dump, so the fixture placeholder
leaked into the published consumer-facing spec: postman/specs/midaz.openapi.{yaml,json}
carried the title "contract-spec" and, because the ledger-first redocly join
uses --prefix-tags-with-info-prop title, all 22 ledger tags were prefixed
"contract-spec_" instead of the swaggo-baseline "Midaz_Ledger_API_".

- contract_spec_routes_test.go:116: Title "contract-spec" -> "Midaz Ledger
  API", so the fixture mirrors production and the regenerated dump/join carry
  the runtime title + baseline-identical tag prefixes.
- check-docs.sh parity_check: re-enable the ^Midaz title assertion (now that
  no fixture placeholder can leak), asserted per-plane so each keeps its own
  "Midaz ..." name. contact/license/termsOfService/schemes stay honestly
  dropped (Huma emits only info.{title,version}; OAS 3.1 has no .schemes).
- Regenerated ledger golden + joined specs; zero residual "contract-spec".
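The real guard is a shell function over jq; the rule it now enforces can be sketched in Go as below. `planeInfo` and `parityCheck` are hypothetical, and the tracer title "Midaz Tracer API" is an assumed value (the text only requires that each title start with "Midaz"). The dots are escaped here, so the regex is slightly stricter than a literal `^4.0.0$`:

```go
package main

import (
	"fmt"
	"regexp"
)

// planeInfo holds the only info fields the Huma dumps emit.
type planeInfo struct {
	Plane   string
	Title   string
	Version string
}

var (
	versionRe = regexp.MustCompile(`^4\.0\.0$`)
	titleRe   = regexp.MustCompile(`^Midaz`)
)

// parityCheck mirrors the parity assertions: info.version is byte-identical
// across planes and matches ^4.0.0$, and each plane's own title starts with
// "Midaz" (titles are per-plane, so they are never compared to each other).
func parityCheck(planes []planeInfo) error {
	for _, p := range planes {
		if p.Version != planes[0].Version {
			return fmt.Errorf("%s: info.version %q differs from %s (%q)",
				p.Plane, p.Version, planes[0].Plane, planes[0].Version)
		}

		if !versionRe.MatchString(p.Version) {
			return fmt.Errorf("%s: info.version %q does not match ^4.0.0$", p.Plane, p.Version)
		}

		if !titleRe.MatchString(p.Title) {
			return fmt.Errorf("%s: info.title %q does not start with Midaz", p.Plane, p.Title)
		}
	}

	return nil
}

func main() {
	err := parityCheck([]planeInfo{
		{Plane: "ledger", Title: "contract-spec", Version: "4.0.0"},
		{Plane: "tracer", Title: "Midaz Tracer API", Version: "4.0.0"},
	})
	fmt.Println(err) // the leaked fixture title is reported
}
```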

Determinism (uuidv5 + frozen example date) preserved; swaggo untouched.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
…ecorded

Wave 4.1a (pipeline → Huma 3.1 native, additive) marked Done. Records the
supervisor gate over the HEALED_NEEDS_REVERIFY return: L1 determinism healed
at root (uuidv5 + frozen example date) and reverified clean; the orphaned Medium
title leak closed by aligning the ledger golden fixture title to the runtime
title ("Midaz Ledger API") and re-enabling the ^Midaz parity guard. Wave 4.1b
(swaggo retirement + Epic 3.3) is now the current wave.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
… general-info

Wave 4.1b swaggo retirement. Strips all swaggo annotation comments
(@Summary/@Router/@Tags/@Param/@Success/@Failure/@Accept/@Produce/
@Security/@ID/@description) from the 27 ledger HTTP handlers under
components/ledger/internal/adapters/http/in and the general-info block
(@title/@version/@host/@BasePath/@securityDefinitions/...) from
components/ledger/cmd/app/main.go.

Genuine Go doc-comments are preserved; only annotation lines and their
now-orphan `//` separators (those sitting immediately before the func)
are removed. The Huma-native OAS 3.1 dumps
(components/ledger/api/openapi.huma.yaml) are the sole spec source now;
swaggerEnabled() -> openapi.ServeSpec is untouched. No runtime behavior
change.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
… general-info

Retire swaggo from tracer now that the docs pipeline consumes the native
Huma dump (components/tracer/api/openapi.huma.yaml). Delete the @Summary/
@Router/@Param/@Success/@Failure/@Security/@ID/@Tags/@Accept/@Produce/
@description godoc blocks from the 8 http/in handlers, the two struct-level
//@name comments in transaction_validation_handler.go, and the general-info
block (title/version/tags/contact/license/securityDefinitions) in
cmd/app/main.go. Real Go doc-comments are preserved.

No runtime behavior change. readyz.go keeps its tracer/api import and all
executable code (api.ReadyzResponse/ReadyzCheck) — only its swaggo
annotations are removed. swaggerEnabled()+openapi.ServeSpec (ledger) and the
Huma OAS dumps are untouched.

Verified: go build ./components/tracer/... exits 0; go vet on http/in is clean;
handlers and main carry zero swaggo annotations; an @Router/@Security grep over
components/tracer returns nothing.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
…rgets

Retire swaggo across both planes now that the docs pipeline consumes the
native Huma OAS 3.1 dumps (components/{ledger,tracer}/api/openapi.huma.yaml).

Ledger:
- unified-server.go: drop the blank api import, the fiber-swagger import, and
  the legacy /swagger + /swagger/* mount; keep swaggerEnabled() gating the Huma
  openapi.ServeSpec surface.
- delete bootstrap/swagger.go (WithSwaggerEnvConfig/initSwaggerFromEnv) and the
  generated api/{docs.go,swagger.json,swagger.yaml,openapi.yaml}.

Tracer:
- routes.go: drop the fiber-swagger import and the /swagger/* mount; keep
  SwaggerEnabled gating openapi.ServeSpec.
- delete adapters/http/in/swagger.go, scripts/verify-api-docs.sh, and the
  generated api/{docs.go,swagger.json,swagger.yaml,openapi.yaml}. api/types.go
  survives (LIVE in readyz.go).

Build tooling: go mod tidy drops swaggo/{fiber-swagger,swag,files} directs (and
the now-unused go-openapi/swag/* indirects); remove swag install steps and the
tracer verify-api-docs target; refresh root/component Makefile doc-comments.

Also refresh now-stale swaggo references left in handler and test doc-comments
(the annotations they describe were already removed), retire the DC-3
swagger.json route-diff gate and the /swagger UI-asset unit test (both asserted
the deleted swaggo surface), and drop the /swagger cases from the tracer auth
integration test — preserving the shared buildUnifiedHumaAPI seam the Huma
golden dump depends on.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
pkg.HTTPError had zero runtime constructors; its 10 swaggo @Failure
annotations were already removed in wave 4.1.5-ledger. Delete the struct
and its Error() method, drop the HTTPError-only TestHTTPError_Error, and
remove the four dead `err.(*pkg.HTTPError)` type-assert branches in fee
tests (ValidateParameters returns *pkg.ValidationError, never HTTPError).
Prune the now-orphaned pkg import in httputils_test.go.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
Remove residual swaggo/go-swagger doc-comment annotations left behind by the
swaggo retirement wave. The contrarian pass flagged 49 inert `@Description`
comment-directives that violated the wave's empty-annotation assertion; a
full sweep also found orphaned `swagger:model`/`swagger:response`/`@name`
directives and go-swagger `in: body` markers across 27 files.

These comments are dead: the repo has no swag parser (no swaggo import, no
docs.go/swagger.json, no comment-parsing tooling), so nothing consumes them.
The Huma OAS 3.1 pipeline reads struct-field tags (swaggertype/enums/example/
format), which are left untouched — `CHECK_DOCS_REGEN=1 make check-docs`
reproduces the committed dumps with zero drift.

Preserved: swaggerEnabled()/openapi.ServeSpec wiring, tracer/api/types.go
hand-written types (ErrorResponse/VersionResponse/ReadyzResponse/ReadyzCheck)
and their /readyz usage, both openapi.huma.yaml dumps, all struct-field tags.
No runtime behavior change.

Verification: go build ./... EXIT 0, go vet clean, gofmt clean, existing
suites green, check-docs no-drift, absence greps empty.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
Wave 4.1b's swaggo retirement orphaned three items its handler-scoped tasks
did not cover; this closes them so the retirement is clean.

- components/tracer/api/types.go: delete ErrorResponse and VersionResponse.
  Both were documented "Used in Swagger documentation" and referenced ONLY by
  the swaggo @Failure/@Success annotations + generated docs.go, all deleted
  this wave. Zero live consumers (the Version handler emits ad-hoc JSON; the
  error path uses lib-commons RFC 9457 problem). Deleting them honors the
  file's own ReadyzCheck manifesto against phantom-documentation types.
  ReadyzResponse/ReadyzCheck stay (live in readyz.go — invariant B).
- components/ledger/pkg/feeshared/nethttp/httputils_test.go: the errCode table
  field was asserted only via the deleted `err.(*pkg.HTTPError)` branch (dead:
  pkg.HTTPError was never constructed), so 13 populated codes went unchecked.
  Re-point the assertion at the LIVE error: ValidateParameters surfaces failures
  via ValidateBusinessError -> pkg.ValidationError{Code: constant.Error()}, so
  errors.As + assert on .Code restores the per-code coverage the table intended.
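The repointed assertion can be sketched as a table-driven check. `ValidationError` here is a hypothetical stand-in for pkg.ValidationError (modeled as a value type, which may differ from the real one), and `validateLimit` and its error codes are invented for illustration; the point is that errors.As actually reaches the live error, where a type assertion on a never-constructed type silently skipped every row:

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// ValidationError is a hypothetical stand-in for pkg.ValidationError; the
// assertion only needs its Code field.
type ValidationError struct {
	Code    string
	Message string
}

func (e ValidationError) Error() string { return e.Code + ": " + e.Message }

// validateLimit is a hypothetical validator with invented codes. It wraps the
// ValidationError to show that errors.As still finds it through the chain.
func validateLimit(raw string) error {
	n, err := strconv.Atoi(raw)
	if err != nil {
		return fmt.Errorf("validate parameters: %w", ValidationError{Code: "ERR_INVALID_LIMIT", Message: err.Error()})
	}

	if n > 100 {
		return fmt.Errorf("validate parameters: %w", ValidationError{Code: "ERR_LIMIT_TOO_LARGE", Message: "limit above 100"})
	}

	return nil
}

// codeOf returns the ValidationError code in err's chain, or "" if none.
func codeOf(err error) string {
	var ve ValidationError
	if errors.As(err, &ve) {
		return ve.Code
	}

	return ""
}

func main() {
	cases := []struct {
		raw     string
		errCode string
	}{
		{"10", ""},
		{"abc", "ERR_INVALID_LIMIT"},
		{"500", "ERR_LIMIT_TOO_LARGE"},
	}

	for _, tc := range cases {
		fmt.Printf("%s -> got %q, want %q\n", tc.raw, codeOf(validateLimit(tc.raw)), tc.errCode)
	}
}
```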

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
Records the 4.1b supervisor gate over HEALED_NEEDS_REVERIFY: L4 contrarian
defect (both swaggo + go-swagger annotation dialects) swept by self-heal; 3
orphan Lows closed by the supervisor (dead ErrorResponse/VersionResponse
deleted, errCode test repointed to the live pkg.ValidationError). Invariants
A/B/C verified. Epic 4.2 (parity lock + redocly re-enable + DC-3 route-diff
gate reinstatement) is now the current wave.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
Redocly rule re-enablement (empirical), joined-spec Error lock complementing
the Go closure test, DC-3 route-diff gate reinstated against openapi.huma.yaml,
make ci verify.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
The redocly lint config relaxed 8 rules that predated the swaggo -> Huma
migration. With the native Huma OAS 3.1 dumps now the sole source, re-verify
each rule against the current joined spec:

Re-enabled (0 findings on the Huma dumps):
- no-server-trailing-slash
- no-server-example.com
- security-defined
- no-unused-components
- no-invalid-schema-examples

Kept off (still trip on the Huma output; comments rewritten to the real
Huma-era cause, no longer 'swag emits ...'):
- no-empty-servers: join artifact; root servers are emptied by design
- operation-4xx-response: 71 ledger+tracer ops without an explicit 4xx entry
- no-ambiguous-paths: 2 structural ledger balances/operations sub-paths

CHECK_DOCS_REGEN=1 make check-docs passes green (parity, version 4.0.0,
security-coverage 113, redocly lint, no drift).

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
The Go test in tests/openapi (error_schema_parity_test.go) is the primary
lock on the Error closure, but it reads the per-plane Huma dumps, not the
joined artifact. The joined spec (postman/specs/midaz.openapi.json,
consumed by the Plan B SDK) is the output of redocly join; if the join
ever collides two non-identical Error schemas, redocly de-dups by
suffixing (Error, Error2). That would slip past the Go test.

Add error_schema_singleton_check to check-docs.sh: assert the joined
json has components.schemas.Error, no dedup-suffixed siblings
(^Error[-_]?[0-9]+$, so ErrorDetail is unaffected), and the RFC 9457
problem fields the SDK relies on. Skip-with-warning when the artifact is
absent, mirroring consolidated_lint_check.
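The singleton rule reduces to three checks over the joined spec's schema map. A Go sketch, assuming the map is already decoded into name → property names (the real check runs in check-docs.sh). Because the commit does not list which RFC 9457 members the SDK needs, `requiredFields` is a caller-supplied assumption:

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// dedupSibling matches redocly-style de-dup suffixes (Error2, Error_2,
// Error-2) but not legitimate names such as ErrorDetail.
var dedupSibling = regexp.MustCompile(`^Error[-_]?[0-9]+$`)

// errorSingletonCheck asserts the joined spec has exactly one Error schema and
// that it declares every required problem field.
func errorSingletonCheck(schemas map[string][]string, requiredFields []string) error {
	props, ok := schemas["Error"]
	if !ok {
		return fmt.Errorf("components.schemas.Error missing")
	}

	var siblings []string
	for name := range schemas {
		if dedupSibling.MatchString(name) {
			siblings = append(siblings, name)
		}
	}

	if len(siblings) > 0 {
		sort.Strings(siblings) // stable message despite map iteration order
		return fmt.Errorf("de-dup siblings of Error found: %v", siblings)
	}

	have := make(map[string]bool, len(props))
	for _, p := range props {
		have[p] = true
	}

	for _, f := range requiredFields {
		if !have[f] {
			return fmt.Errorf("schema Error lacks problem field %q", f)
		}
	}

	return nil
}

func main() {
	joined := map[string][]string{
		"Error":       {"type", "title", "status", "detail", "instance"},
		"ErrorDetail": {"field", "message"},
		"Error2":      {"type", "title"},
	}
	fmt.Println(errorSingletonCheck(joined, []string{"type", "title", "status", "detail"}))
}
```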

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
Restore the DC-3 contract gate removed in eca60ed, now reading the
generated Huma OAS 3.1 dump (api/openapi.huma.yaml) instead of the deleted
swaggo swagger.json. TestContractSpecMatchesRoutes asserts the Fiber-mounted
route surface equals the dump's published (method, path) set in both
directions.

Adaptations from the base swaggo version:
- collectSpecRoutes parses YAML via gopkg.in/yaml.v3 and prefixes /v1 to
  each server-relative path, since the Huma spec carries the base path in
  its servers block, not in the path keys.
- canonicalizePath normalizes both Fiber ':param' and OpenAPI '{param}' to a
  positional token so the surfaces compare on structure, not label.
- Locked exempts (const, one comment each): GET /health, /version, /readyz
  public probes, and the intentional Fiber-only multipart
  POST .../transactions/dsl route (sunset 2026-08-01). No /swagger* — retired.

mounted=113, spec=113, zero divergence.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
…iber

Task 4.2.4 (verify make ci) surfaced a latent `unparam` lint failure that
predates Epic 4.2: createTransactionFiber declared an `isRevert bool` param
that every one of its five callers passes as false. The revert path reaches
the money-write orchestrator independently — createRevertTransaction calls
executeCreateTransaction(..., true, ...) directly — so the helper's param was
speculative dead surface (YAGNI) from the Wave-4 transaction migration.

Remove the param; hardcode false at the executeCreateTransaction call site.
Behavior-preserving: all callers already passed false, so the value reaching
executeCreateTransaction is unchanged, and the isRevert=true revert semantics
(applyFees skip, reverse-transaction branch) remain wired through
createRevertTransaction untouched.

Surfaced-by: make ci lint stage (golangci unparam), which the earlier
targeted-test gates (4.1a/4.1b) did not exercise.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
Second latent lint failure surfaced by task 4.2.4 (verify make ci), masked
behind the createTransactionFiber unparam failure until that was fixed:
make lint iterates scopes fail-fast (components -> tests -> pkg), so the pkg
scope was never reached while ledger still failed.

classifyForProblem's ten consecutive `if errors.As(...)` type-dispatch blocks
had no blank line between them; wsl_v5 flags each `if` that follows a closing
statement block. Add a blank line between each arm. Pure formatting — the
error classification order and RFC 9457 status mapping are byte-for-byte
unchanged.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
Task 4.2.4 (verify make ci) exposed that tests/openapi, the offline
cross-plane locks over the committed Huma OAS 3.1 dumps (including the
byte-identical RFC 9457 Error closure the Plan B SDK consumes), never
ran in the gate. test-unit discovers packages with `go list ./...` then
drops everything under ./tests (that path is otherwise integration-only,
needing Docker), and nothing re-added the offline openapi package. So the
Go closure lock that 4.2.2's joined-spec singleton check was written to
COMPLEMENT was itself unenforced.

Add a test-openapi-locks target (offline: yaml only, no server/DB/Docker)
and invoke it from ci after check-docs, so the locks read the freshly
regenerated-and-drift-verified dumps. The joined-spec singleton check
(check-docs) covers the published artifact; this covers per-plane closure
byte-identity. Both now gate.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
The LLM-judge review of Plan A found a latent fourth false-green: test-unit
discovers packages via `go list ./...`, which exits non-zero with empty stdout
in a git worktree (VCS-stamp error). The old empty-pkgs branch then printed
"No unit test packages found" and exited 0 — silently skipping all ~13.9k unit
tests while make ci reported success. Same fail-open class as the tee-pipe and
poisoned-lint-cache traps found earlier this session.

Two-part fix:
- Self-sufficient: export GOFLAGS="-buildvcs=false ${GOFLAGS}" in the recipe so
  `go list` and `go test` work in a worktree without an external flag (prepended
  to preserve any caller GOFLAGS). Also add -buildvcs=false to test-openapi-locks.
- Fail-CLOSED: empty discovery is now an error (exit 1) with a diagnostic, not a
  vacuous pass. The repo root always has unit packages; empty means discovery
  broke, which must fail loudly rather than green-wash.

Production CI (normal clone) was never affected; this hardens local runs and
any future discovery breakage.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
The swaggo retirement (Epic 4.1) swept handlers + postgres adapters + some
model packages, but missed pkg/mmodel and pkg/mtransaction — the shared model
types the SDK consumes. The LLM-judge review of Plan A flagged ~700 inert
annotation lines still contradicting the "fully retired" criterion at source
level.

Remove them all: `// swagger:model|response`, go-swagger `} // @name X`
trailing markers (reduced to bare `}`), `// @description|@example|@type|@format`
blocks, and orphaned `// in: body` field markers. Comment-only across 25 files
(132 insertions / 881 deletions) — verified no struct field, tag, or type
changed: every removed line is a comment/`} // ...`, every added line a bare
`}`. Provably inert: CHECK_DOCS_REGEN=1 make check-docs regenerates both Huma
OAS dumps byte-identically (no drift) — nothing parsed these annotations.

Also rewrite postman/README.md from the retired `swag init` + Docker
openapi-generator flow to the live TestOpenAPISpecDump + redocly-join pipeline.

pkg/ now carries zero swaggo/go-swagger annotation residue.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
Flip 4.2.1-4.2.4 to Done with landed commits + mutation-proof outcomes;
Epic 4.2 + Phase 4 → Complete. Record the gate deviations (2 latent lint
defects the make-ci verify surfaced, 3 false-greens neutralized, the parity
lock wired in, the annotation sweep) and the independent LLM-judge verdict
(CONDITIONAL PASS → PASS after the test-unit fail-closed fix; zero blockers;
accepted documented residuals). Plan A is closed; the handoff to Plan B stands.

Claude-Session: https://claude.ai/code/session_01P4Zy5DofM3BxGLwivcRJi4
The shared-engine KMS merge (c3e2be4) changed three contracts that only
the //go:build integration tag exercises, so `make ci` stayed green while
the integration suite was red (8 build errors + 24 failures):

- crm Mongo repos now take encryption.FieldEncryptor (which needs
  DecryptField); 4 tests still passed the raw *lib-commons/crypto.Crypto.
  Wrap it via NewEncryptionService -> NewFieldEncryptorAdapter, mirroring
  the passing holder integration test.
- OrganizationKeyset.Validate now requires Version >= 1; createValidKeyset
  omitted it. Set Version: 1 to match production provisioning.
- The registry revision-conflict test asserted LegacyReadable stayed false,
  but NewOrganizationRegistryRecord defaults it true, so the assertion never
  held. Flip the mutation so the test actually proves a revision-rejected
  update does not persist the change (production Update was already correct).
- The keyset unique index is compound (tenant_id, organization_id, version);
  the dup-insert test never set TenantID, so it dodged the index. Stamp the
  saved tenant onto the direct insert.

Test-only; no production code changed. Each affected package re-run green.

Claude-Session: https://claude.ai/code/session_01SH3ADYS6hGU91Dy67B96oG
637730d added a Node `require("js-yaml")` to postman/generator/check-docs.sh
(Huma OAS dump -> JSON conversion) but never wired its install into CI. Since
postman/generator/node_modules is gitignored, the check-docs job's `node -e`
fails with "Cannot find module 'js-yaml'" — the job passed before only on
runners that happened to have the module ambient. Add `npm ci --prefix
postman/generator` before the verify step so the gate is hermetic.

Claude-Session: https://claude.ai/code/session_01SH3ADYS6hGU91Dy67B96oG
The PR-validation golangci-lint pin (v2.4.0) trailed the local Makefile pin
(v2.12.2), so the merge gate enforced an older ruleset than developers ran.
Align the CI gate to v2.12.2; the tree already passes it locally.

Claude-Session: https://claude.ai/code/session_01SH3ADYS6hGU91Dy67B96oG
Reconcile the doc surface with two large merges that landed on this branch:
the completed swaggo->Huma / RFC 9457 error-envelope migration, and the CRM
field-encryption + Vault-Transit KMS subsystem.

- Canonical agent docs (CLAUDE.md, AGENTS.md, PROJECT_RULES.md): fix the stale
  streaming API block (ToEvent->ToEmitRequest, Builder-owned source), dependency
  versions (lib-commons v5.8.0, lib-observability v1.1.0, lib-streaming v1.6.2),
  the Huma+problem+json HTTP layer, CRM error count (16->28, CRM-0006..CRM-0041),
  the CI workflow table, and swaggo->Huma; add a CRM Field Encryption / KMS section.
- Standards: rewrite error-handling E13 to the RFC 9457 problem+json wire contract
  (code/status tuple preserved); add the crm_protection_* metrics to telemetry D6;
  fix line-rot and dead plan links (replaced with git-history notes).
- Runbook: add the KMS envelope-encryption rollback one-way door + Vault env
  surface (the data-safety claim was mode-dependent).
- New docs/architecture/crm-field-encryption.md documenting the subsystem.
- New components/ledger and components/infra READMEs (tracer parity).
- Godoc on the encryption/crypto packages (comment-only, no behavior change).
- Rewrite llms.txt/llms-full.txt (root + tracer) to match.
- Archive LEDGER.md -> docs/branch-review-campaign.md.

Known gap (documented, not fixed): pkg/net/http/withRecover.go panic path still
emits the legacy {code,title,message} envelope, diverging from the RFC 9457
WithError path.

Claude-Session: https://claude.ai/code/session_01SH3ADYS6hGU91Dy67B96oG
…lope

The WithRecover middleware hand-built a legacy fiber.Map{code,title,message}
body on panic, diverging from every other error path which serializes as RFC
9457 application/problem+json via WithError. A client parsing problem+json
mis-parsed a panic response. Route the recovered panic through WithError as an
internal-server error (pkg.ValidateInternalError) so both paths emit the
identical envelope — one producer, one shape. Status stays 500 and the panic
message / stack frames remain scrubbed (verified green by the CRMCollapse
panic integration test). Updates error-handling.md E9/E13 to match.

Claude-Session: https://claude.ai/code/session_01SH3ADYS6hGU91Dy67B96oG

Labels

- area: build (Makefile, Dockerfile and local-stack definitions)
- area: ci/cd (GitHub Actions workflows and release configuration)
- area: crm (CRM component)
- area: dependencies (Go module dependencies)
- area: documentation (Documentation and markdown content)
- area: infra (Infrastructure component)
- area: ledger (Ledger component)
- area: migrations (SQL migrations)
- area: pkg (Reusable public packages)
- area: scripts (Build and tooling scripts)
- area: tests (Unit, integration and end-to-end tests)
- area: tracer
- size/XL (PR changes >= 1000 lines)


3 participants