[Experimental] repeat and %rowIndex implementation (staged SPEC_hybrid emitter)#7
Open
piotrszul wants to merge 19 commits into
Open
[Experimental] repeat and %rowIndex implementation (staged SPEC_hybrid emitter)#7piotrszul wants to merge 19 commits into
piotrszul wants to merge 19 commits into
Conversation
Adds a specialized skill for SQL on FHIR v2 specification guidance, with modular reference files covering ViewDefinition structure, FHIRPath subset, operations, and comprehensive examples. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the analysis of implementing the SQL on FHIR v2 repeat element for recursive traversal of nested FHIR structures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds two failing test cases (repeat inside forEach, repeat inside repeat) that surface the scope-leak limitations of the hoisted JSON CTE strategy, and documents Solution G — inlining the recursive CTE inside the value expression — which composes naturally inside lambdas and resolves those limitations without template changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Add Solution F: chained CTE with baked-in context for repeat Document a new repeat strategy that bakes non-repeat columns and forEach context into the recursive CTE anchor, eliminating the correlated subquery and fixing forEach/repeat nesting issues from Solution B. Verified working in DuckDB with unnest and list_transform in the recursive member. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The old `duckdb` package has no pre-built binaries for Node 25+ and was not released for DuckDB v1.5+. Migrates to `@duckdb/node-api` (Node Neo) throughout src and tests, converting callbacks to async/await and using getRowObjectsJS() for plain JS type coercion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The runtime was migrated to @duckdb/node-api in 0175209 but the benchmark script still imported the legacy duckdb package (pinned at 1.4.1, where nested correlated WITH RECURSIVE subqueries are buggy). Switch to the same client the rest of the project uses so benchmarks reflect the production engine. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lower a ViewDefinition into staged CTEs per SPEC_hybrid — carry-forward chain stages with keyed recombination only at independent fan-out forks — while reusing the existing typed FHIRPath engine for every leaf. - ddb-sql-builder: astToSql gains an optional outer root-var (default `el`) so a staged scope element can be aliased `node` without colliding with internal list_transform `el` lambdas; struct emitter unchanged. - staged-sql-builder (new): tree-walk + CHAIN/FORK classification, lazy integer keys (rid + WITH ORDINALITY ordinals at nested forks), forEach/ forEachOrNull (LEFT JOIN UNNEST, no padding), unionAll, repeat rejection. - query-builder: backend: 'staged' | 'struct' selector; `AS result` folded into the expanded transform variable (literal dropped from all struct templates: ndjson/csv/parquet/dbt_model/explore + test-util). - tests: spec.staged.test.js runs the official reference suites unmodified through the staged backend (repeat excluded) + a repeat-rejection test; spec-tests/staged.json adds fork coverage (nested composite key, root cross product, forEachOrNull-in-fork) passing on both backends. Staged backend: all non-repeat official suites pass (96 tests green). Full suite: 290 pass / 15 fail (the 15 are repeat on the struct backend, pre-existing baseline — no regression). OpenSpec change add-staged-sql-emitter archived; capability spec synced. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the always-on blocking row_number() OVER () window at root forks in the staged (SPEC_hybrid) emitter with the natural resource key (getResourceKey()/id) as the default recombination key, plus an opt-in uuid() mode for inputs where resource-id uniqueness can't be assumed. - staged-sql-builder: rootKey "natural"|"uuid" option; natural projects the resource key AS rid, uuid projects uuid() AS rid and signals srcMaterialized; lazily exposes resourceKeyAst so the typed read schema includes id even when the view doesn't project it. - query-builder: thread rootKey through buildQuery/templateToQuery; fold resourceKeyAst into the schema paths; add fq_staged_src_materialized. - cli: surface --backend and --root-key. - templates (test-util, gen-staged): emit WITH src AS MATERIALIZED in uuid mode. - tests: run the staged spec harness in both modes; new staged-rootkey test proves multiset identity vs the struct oracle (EXCEPT ALL = 0 both directions) and the no-id root-fork case. Archives the root-fork-resource-key OpenSpec change. Tasks 4.3/5.1 (benchmark regen, SPEC_hybrid docs) deferred — those artifacts are not present in this worktree. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The staged backend previously rejected `repeat` and fell back to the struct emitter. `repeat` is the one unbounded-depth directive, so it cannot live on the typed substrate: lower it by oscillating the binding between typed and JSON — the recursive descent runs in JSON via a WITH RECURSIVE CTE (the fhir_list fold), while every column/forEach/where is still evaluated typed against a per-scope from_json bridge element. - JSON descent: WITH RECURSIVE CTE; seed = typed navigation of the listed paths off the enclosing element, recursive leg = fhir_list JSON fold (single step, multi-step flatten, list_concat across paths). New fhir_list macro. - Typed bridge: from_json(node, <structure>) AS el (lenient — NULL-fills absent FHIR fields, keeps nested-repeat seeds raw as JSON[]); factored into shared fq_cast_* macros. - One schema rule: parseVd surfaces a repeat seed childless -> JSON[]; pathsToJsonStruct renders the from_json structure off the same tree; forceJson makes "repeat wins" over a sibling forEach (shared field). - Chain/fork/union: lone repeat = chain; repeat among siblings = fork (JOIN USING the resource/composite key); repeat inside unionAll unions keyed branches; shared-field forEach reads via the bridge (json mode). - WITH RECURSIVE lead threaded through both templates and gen-staged. All 15 official repeat.json cases pass on the staged backend in both rootKey modes, plus a new typed-bridge choice-type test. Unbounded depth verified. Spec synced; change archived. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A repeat scope binds its element via the from_json typed bridge aliased
`AS el` (ref:"el"). The leaf compiler hardcoded the list_transform /
list_filter lambda parameter as `el` too, so a column *directly in a
repeat scope* that navigates an array compiled to
(el.answer).list_transform(el -> el.valueString)
where the lambda `el` collides with the bridge column `el` — DuckDB
resolves `el.valueString` against the bridge struct (keys linkId/answer),
raising "Binder Error: Could not find key valuestring in struct". The
official repeat.json suite never hit it (its repeat-scope columns are
scalar, or reach arrays only through a forEach, which uses ref:"node").
Give every list_transform/list_filter lambda the reserved variable `__el`
(constant L in ddb-sql-builder.js), which no scope element ref can ever
be, so a lambda parameter can never collide with the element it is
applied to. The lambda bodies are compiled with rootVar=L to match, and
the inLambda element prefixes in _splitPath/_forEach now use rootVar
(correct for both backends; previously hardcoded "el" worked only because
the struct backend always uses "el" inside lambdas).
Regression test added in tests/staged-repeat.test.js (a collection column
navigating an array directly inside a repeat scope). Full suite: the only
remaining failures are the pre-existing struct-backend repeat.json cases.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lower `repeat` on the struct backend so the default emitter — the oracle the staged emitter is validated against — passes all 15 official repeat.json cases. A repeat becomes a forEach over a bounded list_reduce accumulator: the JSON frontier is descended via the fhir_list fold, each frontier is bridged to typed elements with an inline from_json, and the body is the existing two-phase machinery (re-rooted at the repeat element type). The accumulator computes the descent once per step (frontier in `cur`, `acc` collects the previous frontier), accumulates truncated typed elements, and projects the body once after the reduce. Recursion depth is bounded and configurable (--repeat-depth, default 10). - src/repeat-lowering.js: shared helpers (assertSimplePath, jsonFold, typedSeed, repeatStructure, childElemOf) extracted from the staged emitter, plus the struct reduceSource builder. - src/view-parser.js: structRepeat mode emits a _repeat fan-out (and _jsoneach for the repeat-beside-forEach shared-field case) instead of the _rseed shim; validateVd rejects repeat+forEach. - src/ddb-sql-builder.js: _repeat/_jsoneach astToSql cases; repeat table type cross-joins; nav outputType carries schemaPath. - src/query-builder.js: orchestrates the schema-aware repeat context, forces repeat seeds to JSON[], threads repeatDepth. - src/cli.js: --repeat-depth option. - tests/struct-repeat.test.js: struct/staged equivalence on repeat.json, choice-type ofType() and lambda-collision guards, bounded-depth coverage. - openspec: new struct-sql-emitter capability spec; change archived. Known limitation (tracked in #5): nested repeat builds a large per-resource intermediate that spills to disk at scale; staged WITH RECURSIVE is the streaming option. scripts/gen-struct.js generates struct SQL for benchmarking. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the SQL-on-FHIR %rowIndex environment variable on the struct backend: the 0-based position within the nearest enclosing iteration (forEach/forEachOrNull/repeat), or 0 at the resource root and in non-iterating unionAll branches. - fhirpath-parser: recognize %rowIndex as a contextual segment rather than routing it through the --var lookup (no longer throws "not defined"). - ddb-sql-builder: thread a rowIndexSql expression; real forEach/repeat emit an indexed list_transform lambda binding ((__idx-1)::INTEGER); source-side ifnull2 padding for forEachOrNull so the null row reports index 0; new _project directive for column projections that pass the enclosing index through instead of opening a new scope. - view-parser: emit _project (not _forEach) for projection wrappers (root, select, unionAll branch). - tests: exclude row_index from the staged suite (staged parity is a follow-up); update view parser-output assertions for _project. 8/9 row_index sub-tests pass. The repeat sub-case is left failing: the struct repeat lowering accumulates breadth-first while the spec pins depth-first pre-order numbering (documented as a Non-Goal). Change archived under openspec/changes/archive/2026-06-05-add-rowindex-struct. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brings the staged backend to parity with the struct backend on the SQL-on-FHIR %rowIndex environment variable (the 0-based position within the nearest enclosing iteration). The shared FHIRPath front end already recognizes %rowIndex as a contextual segment, and astToSql already takes a rowIndexSql parameter; the staged compiler just never supplied it, so %rowIndex silently compiled to "0" everywhere (hence the suite exclusion). The staged emitter realizes forEach as UNNEST(...) WITH ORDINALITY, which already yields a 1-based ordinal, so %rowIndex is (ord - 1)::INTEGER. - staged-sql-builder: thread rowIndexSql on the element descriptor (rootElem -> "0"); compilePath/compileColumn/arrayize pass it to astToSql. Emit the iteration ordinal when the scope forks OR uses %rowIndex, adding it to the recombination childKey only for forks (the index-only case reads it in the immediate stage). forEachOrNull binds (coalesce(ord,1)-1) so the null row reports 0. unionAll: forEach branches get their own ordinal (independent numbering); columns-only branches inherit the enclosing scope's index. scopeUsesRowIndex detects usage by parsing for the rowIndex segment. - tests: un-exclude row_index.json from the staged suite. - specs: add the %rowIndex requirement to the staged-sql-emitter capability. 8/9 row_index sub-tests pass on staged. The repeat sub-case is left failing (WITH RECURSIVE descends breadth-first; the spec pins depth-first pre-order), documented as a Non-Goal and a follow-up (add-rowindex-repeat-preorder), exactly as the struct suite already carries it. Change archived under openspec/changes/archive/2026-06-06-add-rowindex-staged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the deferred follow-up (`add-rowindex-repeat-preorder`) from the `add-rowindex-staged` change: `%rowIndex` over a `repeat` now evaluates to the SQL-on-FHIR depth-first pre-order position (`linkId 1, 1.1, 1.2, 2 -> 0, 1, 2, 3`), restarting per the repeat's enclosing-scope instance. Previously the staged `repeat` fan-out never threaded a row-index expression, so `%rowIndex` over a repeat compiled to the default `0`, and the `row_index.json` repeat sub-test was left failing on the staged suite (mirroring struct). Mechanism (SPEC_hybrid section 11, and the alternative flagged in the add-rowindex-staged design D2): - staged-sql-builder: when a repeat body uses `%rowIndex`, carry a materialized integer descent `path` (`[ord]::BIGINT[]` seed leg, `list_append(path, ord)` recursive leg, each ord from an added WITH ORDINALITY) through the WITH RECURSIVE descent, then assign `(row_number() OVER (PARTITION BY <key> ORDER BY path) - 1)::INTEGER` at the typed bridge stage and bind it as the body's rowIndexSql. Ordering the int[] path element-wise yields pre-order; the index is a scalar assigned before any fork recombination, so it survives joins. - Force the partition key: generalize the lazy fork look-ahead so the resource key (rid) and enclosing forEach ordinals are minted on the spine down to a `%rowIndex` repeat even in fork-free views (subtreeHasRepeatUsingRowIndex alongside subtreeHasFork). A repeat nested in a repeat chains its partition by appending the outer repeat's own index ((rid, outer_rn)) -- never PARTITION BY a list. - emitJsonEach now binds `%rowIndex` to its ordinal, and the fork jsonMode forEach branch passes its ordinal through (the shared-field cases). - Fully gated on usage: a repeat whose body does not reference `%rowIndex` emits no ordinality, no path, no window, and no forced rid -- byte-for-byte unchanged. Tests: - tests/spec-tests/row_index_repeat.json (new, skip:true so the generic auto-discovery suites ignore it) + tests/row-index-repeat.test.js: a staged-only runner over five extended cases the official row_index.json does not reach -- fork-free multi-resource, repeat-under-forEach, repeat-in-repeat, repeat-in-unionAll, repeat-at-a-fork -- in both natural and uuid root-key modes. - spec.staged.test.js: the official row_index.json repeat sub-test now passes (comment updated; EXCLUDE was already null). - struct backend unchanged; its row_index.json repeat red remains a separate documented follow-up. - specs/staged-sql-emitter: the `%rowIndex` requirement now covers repeat pre-order; the "out of scope" carve-out removed. Change archived under openspec/changes/archive/2026-06-06-add-rowindex-repeat-preorder. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Vendor the staged (SPEC_hybrid / "Spec D") emitter specification into the repo as `docs/SPEC_hybrid.md` so it is browsable on the PR and travels with the code it specifies. Removes the superseded `docs/repeat-implementation-investigation.md` exploratory note. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bring the built-in template set to parity for the staged (SPEC_hybrid) backend. Previously every shipped @template only referenced struct vars, so --backend staged --template @csv could not emit a real output query. - cli.js: resolve @name templates per --backend, preferring templates/staged/<name>.sql and falling back to templates/<name>.sql - templates/staged/{csv,parquet,ndjson,explore,dbt_model}.sql: new parallel set wrapping the staged CTE skeleton in each output stage, emitting both base and view-specific macros; explore limits source rows inside the src CTE to match struct - scripts/gen-staged.js: removed (superseded by the shipped templates) - README.md: document --backend/--root-key/--repeat-depth, the Backends section, backend-aware resolution, and the fq_staged_* variables - openspec: add the templates capability spec; archive the change Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Brings the staged (SPEC_hybrid) SQL emitter to full
%rowIndexparity, including overrepeat— the deferred follow-up (add-rowindex-repeat-preorder) from the earlieradd-rowindex-stagedchange.%rowIndexover arepeatnow evaluates to the SQL-on-FHIR depth-first pre-order position (linkId 1, 1.1, 1.2, 2 → 0, 1, 2, 3), restarting per the repeat's enclosing-scope instance. Previously the stagedrepeatfan-out never threaded a row-index expression, so%rowIndexover a repeat compiled to the default0and therow_index.jsonrepeat sub-test was left failing on the staged suite.What changed
Mechanism follows
docs/SPEC_hybrid.md§11 and the alternative flagged in theadd-rowindex-stageddesign D2:%rowIndex, carry a materialized integer descentpath([ord]::BIGINT[]seed leg,list_append(path, ord)recursive leg, eachordfrom an addedWITH ORDINALITY) through theWITH RECURSIVEdescent, then assign(row_number() OVER (PARTITION BY <key> ORDER BY path) - 1)::INTEGERat the typed bridge stage and bind it as the body's row-index. Ordering theint[]path element-wise yields pre-order; the index is a scalar assigned before any fork recombination, so it survives joins.subtreeHasRepeatUsingRowIndexalongsidesubtreeHasFork) so the resource key (rid) and enclosingforEachordinals are minted on the spine down to a%rowIndexrepeat even in fork-free views. A repeat nested in a repeat chains its partition by appending the outer repeat's own index ((rid, outer_rn)) — neverPARTITION BYa list.emitJsonEachnow binds%rowIndexto its ordinal, and the forkjsonModeforEachbranch passes its ordinal through.%rowIndexemits no ordinality, no path, no window, and no forcedrid— byte-for-byte unchanged.Tests
tests/spec-tests/row_index_repeat.json(skip:trueso generic auto-discovery ignores it) +tests/row-index-repeat.test.js: a staged-only runner over five extended cases the officialrow_index.jsondoesn't reach — fork-free multi-resource, repeat-under-forEach, repeat-in-repeat, repeat-in-unionAll, repeat-at-a-fork — in bothnaturalanduuidroot-key modes. 10/10 pass.row_index.jsonrepeat sub-test now passes on the staged suite (was the documented red).structbackend'srow_index.jsonrepeat sub-test, which is pre-existing and out of scope (struct unchanged; verified it fails identically with this change stashed). Struct pre-order remains a separate documented follow-up.Spec
docs/SPEC_hybrid.md: the staged ("Spec D") emitter specification, vendored into the repo (replacing the earlierdocs/repeat-implementation-investigation.mdexploratory note).openspec/specs/staged-sql-emitter/spec.md: the%rowIndexrequirement now coversrepeatpre-order (materialized integer path + path-ordered window, forced enclosing-scope partition, repeat-in-repeat chaining); the prior "out of scope" carve-out is removed. Change archived underopenspec/changes/archive/2026-06-06-add-rowindex-repeat-preorder.🤖 Generated with Claude Code