Skip to content

Support terraform references in direct engine#5392

Open
denik wants to merge 46 commits into
mainfrom
schema-map1
Open

Support terraform references in direct engine#5392
denik wants to merge 46 commits into
mainfrom
schema-map1

Conversation

@denik
Copy link
Copy Markdown
Contributor

@denik denik commented Jun 1, 2026

Changes

  • New package bundle/terraform_dabs_map that contains generator for a mapping between terraform and DABs schema as well as utilities to map from terraform and DABs resource references.
  • In direct engine, we now use this mapper to support terraform references. We also explicitly error when a reference is using terraform-only field that has no equivalent.
  • In terraform, we also use this mapper to warn about references that are not going to work on direct engine. When we encounter such reference, we also set a flag in telemetry.
  • The package also contains a function that maps DABs references to Terraform, that is not used currently but will be needed for new state migration implementation.

Why

Enables seamless and unattended migration to direct engine.

Tests

New acceptance and unit tests.,

@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:18 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:18 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:29 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:29 — with GitHub Actions Inactive
denik added 26 commits June 1, 2026 10:31
The script takes the DABs fields file and Terraform provider schema JSON,
applies key renames (plural→singular) matching the tfdyn converter logic,
and produces a bidirectional map showing which fields match, are renamed,
or exist only in one schema.

Co-authored-by: Isaac
…irst

Previously, dabs_only entries showed the expected tf_path which was
misleading (e.g., volumes.full_name appeared to match but doesn't
exist in the TF schema). Now dabs_only entries show ? for tf_path.

Also reorder columns to: status, dabs_path, tf_resource, tf_path.

Co-authored-by: Isaac
Previously every descendant of a renamed field (e.g., tasks→task) was
also marked "renamed". Now only the field where the rename occurs is
printed. Inner renames (e.g., tasks.*.libraries→task.*.library) are
still shown.

Co-authored-by: Isaac
…cle, grants)

These are DABs additions that don't exist in Terraform schema.
Suppressing them reduces noise in dabs_only output (782 → 632 entries).

Co-authored-by: Isaac
- Add bundle/terraform_dabs_map package that builds a bidirectional map
  between DABs resource fields and Terraform resource fields
- Add 'databricks bundle debug schema-map' command that prints the mapping
- Extend TF schema codegen to emit resource_schemas.go: a typed probe struct
  (ResourceSchemas) used to enumerate all TF resource struct types via
  reflection without a manual registry
- Add acceptance test that records command output

Co-authored-by: Isaac
- Add bundle/terraform_dabs_map/cmd/generate/main.go: generator that
  computes DABs↔Terraform field mapping at build time using a stemming
  algorithm (ies→y, s→"", git_→"") instead of hardcoded rename rules.
  Conflict detection warns when multiple DABs paths stem to the same TF path.

- Add bundle/terraform_dabs_map/generated.go: three precomputed maps
  (TerraformToDABsFieldMap, DABsOnlyFields, TerraformOnlyFields) plus
  MatchCounts, all keyed by DABs group name. Only records renames where the
  leaf field name itself differs; child paths of renamed parents count as
  matches (same leaf name, different ancestor).

- Remove bundle/terraform_dabs_map/map.go and renames.go: the runtime
  Build() function and hardcoded renameRules are replaced by the generator.

- Update cmd/bundle/debug/schema_map.go: print one summary line per
  resource instead of a full field listing.

- Add generate-schema-map task to Taskfile.yml.

- Update acceptance test output to match new summary format.

Co-authored-by: Isaac
…to test

- Add modified_status to dabsKnownFields (DABs-internal deployment
  tracking field with no Terraform concept) and provider_config to
  tfKnownFields (Terraform provider metadata with no DABs concept),
  removing noise from the dabs-only and tf-only maps.

- Replace bundle/terraform_dabs_map/cmd/generate/main.go with
  bundle/terraform_dabs_map/generate_test.go: TestGenerateSchemaMap
  recomputes generated.go and either overwrites it (with -update) or
  fails if the file is stale. Stats (matches/renames/dabs-only/tf-only
  per group) are printed via t.Logf when -update is passed.

- Add Groups var to generated.go listing the processed 2-level groups;
  the CLI command now iterates over Groups instead of GroupToTerraformName
  to avoid printing the 3-level groups (permissions, grants, secret_acls).

- Drop MatchCounts from generated.go; match counts are printed during
  generation only (via t.Logf -v) and not needed at runtime.

- Update Taskfile generate-schema-map to use the test-based approach.

Co-authored-by: Isaac
Replace the hand-rolled flag + assert.Equal with testdiff.OverwriteMode
and testdiff.AssertEqualTexts so failures show a unified diff and the
-update flag is consistent with the rest of the test suite.

Co-authored-by: Isaac
Groups was only needed to distinguish 2-level groups (with adapters)
from 3-level groups (permissions, grants, secret_acls) in
GroupToTerraformName. The three generated maps encode this already:
groups without adapters produce no entries in any map, so the union of
all map keys is exactly the set of processed 2-level groups.

Groups with all-zero entries (e.g. registered_models) disappear from
the CLI output — a line reading "0 renames, 0 dabs-only, 0 tf-only"
carries no information.

Co-authored-by: Isaac
…level suppression

- allStems: replace O(2^n) cartesian product with O(n+1) approach: try each
  segment independently, then try all-at-once as fallback for multi-renames
  like tasks.libraries → task.library.
- Add tfKnownSegments for any-level TF path suppression (vs. tfKnownFields
  which only suppresses at the top level). Add provider_config to it since
  it appears nested at many depths and is never a DABs concept.
- Add hasKnownSegment helper to check any segment in a path against a set.

Co-authored-by: Isaac
matchToTF processes one segment at a time: segmentVariants() is called on
a single field name at each level, and once the head segment is mapped the
tail is resolved recursively against the filtered sub-tree.

This is strictly more correct than the O(n+1) allStems approach. allStems
tried each segment independently and added an all-at-once fallback, but the
fallback applied all stems simultaneously — so tasks.libraries.maven.coordinates
would get stemmed to task.library.maven.coordinate, missing the TF field
task.library.maven.coordinates (where coordinates is spelled the same on
both sides). matchToTF finds this correctly because it stops stemming a
segment once a prefix match is found, rather than stemming all segments
together.

Co-authored-by: Isaac
segmentVariants returned up to 4 forms to cover combinations of git_-stripping
and ies/s suffixing. In practice those intermediate forms (just-stripped or
just-singularized) are never the correct TF name; the fully-stemmed form always
suffices. stem() applies both transformations in sequence and returns a single
string. matchToTF tries the original and stem(original) — at most two candidates
per segment, no slice allocation needed in the common case.

Co-authored-by: Isaac
Replace the flat dotted-path maps with nested Go types:

  FieldSet map[string]FieldSet
    Each key is a single segment; empty child marks a leaf.
    Used for DABsOnlyFields and TerraformOnlyFields.

  RenameTree map[string]RenameNode
  RenameNode struct { DABs string; Children RenameTree }
    Navigate with TF segments; DABs is the corresponding DABs name when it
    differs. Both fields may be set (e.g. TF 'task'→DABs 'tasks' while
    'task.library'→'tasks.libraries' lives in Children).

Both types carry a Len() method so schema_map.go can count leaves without
knowing the nesting depth.

Co-authored-by: Isaac
postgres_branches/catalogs/endpoints/projects all expose TF fields that
are nested under a "spec" object in Terraform but flat in DABs.
The generator now auto-detects such wrappers (Step 3): a TF segment is
marked Unwrap when ALL of its sub-fields are accounted for by unmatched
DABs fields, which rejects output-only wrappers like "status" that carry
extra computed fields.

RenameNode gains an Unwrap bool field; TerraformPathToDABs skips the
segment without emitting it to the DABs path when Unwrap is true.

Co-authored-by: Isaac
After matching against InputConfigType, remaining TF-only fields are
compared against RemoteType (the shape returned by DoRead in the direct
engine). Fields present in both TF schema and RemoteType but absent from
user-facing InputConfigType are classified as computed: server-generated
outputs the direct engine can read, but the user doesn't configure.

This cleans up TerraformOnlyFields significantly for postgres resources —
all are now in TerraformComputedFields (timestamps, status, uid, etc.),
leaving only "purge_on_delete" as a genuinely TF-specific field.
Other resources (pipelines, experiments, schemas, sql_warehouses, etc.)
also gain non-zero computed counts.

Jobs remain at 257 tf-only / 0 computed because those fields are
structural mismatches (legacy TF top-level task syntax vs DABs tasks[])
rather than RemoteType coverage gaps.

Co-authored-by: Isaac
… filter

The TerraformComputedFields variable had no consumer beyond the debug
command. The useful work was suppressing false positives from
TerraformOnlyFields (server-generated outputs that are in RemoteType
but not in user-facing InputConfigType). Keep the RemoteType walk as a
silent filter inside buildGroup; drop the generated variable entirely.

Co-authored-by: Isaac
Generate DABsToTerraformRenameMap (inverse of TerraformToDABsFieldMap renames)
and DABsToTerraformWrappers (TF prefix to prepend for groups with Unwrap) via
a new invertRenames helper in the generator.

Add DABsPathToTerraform which mirrors TerraformPathToDABs: renames segments
using DABsToTerraformRenameMap and prepends the spec wrapper for postgres
resources via DABsToTerraformWrappers.

Extend TestTerraformPathToDABs with a roundtrip bool field — when true the
test also verifies DABsPathToTerraform(output) == input. All mapped paths
roundtrip correctly; TF-computed paths (status.*, timestamps) are marked
roundtrip:false since they pass through TF→DABs unchanged but would get the
spec wrapper incorrectly added on the way back.

Add TestDABsPathToTerraform covering renames, wrapper prepend, unknown
segments, array indices, and unknown groups.

Co-authored-by: Isaac
…de-only fields

- Add FieldSet.Contains to check if a path is covered by the set
- TerraformPathToDABs returns error for paths in TerraformOnlyFields
- DABsPathToTerraform returns error for paths in DABsOnlyFields
- Test table: named fields, one field per line, terrPath/dabsPath names,
  noRoundtrip instead of roundtrip, expectErr for one-side-only cases
- Drop sortedKeys helper; use slices.Sorted(maps.Keys(...)) directly

Co-authored-by: Isaac
Test that map-typed TF-only fields (spark_conf, base_parameters) reject
any runtime key via the '*' wildcard in FieldSet.Contains.

Co-authored-by: Isaac
Terraform represents single nested blocks as lists and requires [0]
indexing to access their fields (e.g. python_wheel_task[0].entry_point).
DABs models these as plain structs.

Make getValue() and validateNodeSlice() ignore [0] on struct values so
TF-style paths work transparently against DABs Go structs.

Co-authored-by: Isaac
The Python script and its TSV output were scaffolding used to discover
rename rules and one-sided fields. All useful data is now in the Go
version (bundle/terraform_dabs_map/generated.go), which is more
complete and serves as the runtime source of truth.

Co-authored-by: Isaac
denik added 9 commits June 1, 2026 10:31
…mode

Add TFOnlyReferences mutator to the initialize phase. It walks the full
config looking for ${resources.*} references and, for each field path,
checks TerraformOnlyFields to see if the field has no DABs equivalent.

In direct mode the reference cannot be resolved at deploy time, so the
mutator returns an error early (during validate / plan / deploy).

In Terraform mode the reference is handled by Terraform itself, but it
will break if the bundle is later migrated to the direct engine, so a
warning is emitted to alert the author.

Tests:
- Four unit tests covering error/warning/no-op/renamed-field cases.
- tf_path_only_error acceptance test updated: validate now emits
  per-engine diagnostics (2>&1 added to capture stderr alongside
  the bundle summary), and plan/deploy also surface the new early
  error/warning before reaching their own resolution logic.

Co-authored-by: Isaac
…pelines into tf_path_renames

All three tests exercise the same pattern: a cross-resource ${resources.*}
reference that uses a TF field name (or TF list-block syntax) that the
direct engine translates via TerraformPathToDABs. Merging them removes
duplication and exercises all three rename types in a single deploy:

- git_source[0].branch  → git_source.git_branch  (rename + [0] on struct)
- task[0].library[0].whl → tasks[0].libraries[0].whl  (two renames)
- cluster[0].label      → clusters[0].label  (pipeline rename)

Co-authored-by: Isaac
The message previously read:
  "resources.jobs.src.always_running": always_running: Terraform-only field; ...

The field name appeared twice: once in the quoted reference, and again as
the p[3:] prefix.  Drop the redundant prefix.

Co-authored-by: Isaac
Groups with a 'spec' structural wrapper showed '0 renames, 0 dabs-only,
0 tf-only' without any indication that spec-unwrapping was applied.
Add the unwrap count so the summary is complete.

Co-authored-by: Isaac
TestTerraformPathToDABs already checks the roundtrip for every
non-error case.  Replace TestDABsPathToTerraform with a smaller
TestDABsPathToTerraformErrors covering only the DABs-only error
cases that the roundtrip never reaches.

Co-authored-by: Isaac
Dead code; nothing in the codebase calls them.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:32 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:32 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:40 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:40 — with GitHub Actions Inactive
@denik denik changed the title WIP Support terraform references in direct engine Support terraform references in direct engine Jun 1, 2026
@denik denik marked this pull request as ready for review June 1, 2026 08:46
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

Approval status: pending

/acceptance/bundle/ - needs approval

35 files changed
Suggested: @pietern
Also eligible: @shreyas-goenka, @andrewnester, @janniklasrose, @anton-107, @lennartkats-db

/bundle/ - needs approval

13 files changed
Suggested: @pietern
Also eligible: @shreyas-goenka, @andrewnester, @janniklasrose, @anton-107, @lennartkats-db

General files (require maintainer)

4 files changed
Based on git history:

  • @pietern -- recent work in bundle/config/validate/, bundle/direct/, bundle/internal/tf/schema/

Any maintainer (@andrewnester, @anton-107, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:55 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:55 — with GitHub Actions Inactive
Comment thread bundle/terraform_dabs_map/generated.go
Comment thread bundle/terraform_dabs_map/generated.go
Comment thread bundle/terraform_dabs_map/generated.go
@denik denik temporarily deployed to test-trigger-is June 1, 2026 09:21 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 09:21 — with GitHub Actions Inactive
@eng-dev-ecosystem-bot
Copy link
Copy Markdown
Collaborator

Commit: 25bf471

Run: 26746319106

@denik denik requested a review from andrewnester June 1, 2026 13:42
@denik denik temporarily deployed to test-trigger-is June 1, 2026 13:49 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 13:49 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants