fix(#1454): refresh ~lineage on every @schema decoration #1467

Open: dimitri-yatsenko wants to merge 2 commits into master from fix/1454-stale-lineage-refresh


@dimitri-yatsenko (Member) commented Jun 10, 2026
Summary

Stale or missing rows in the `~lineage` table caused spurious "different lineages" errors during `populate()` on FK-inherited primary keys. The problem surfaced during the Built-On demo prep on Lakebase/PostgreSQL (May 18–19, 2026). The dominant failure mode was lineage missing entirely (the demo error: `None vs ...`), not stale-but-non-None values.

Closes #1454. Slated for DataJoint 2.3.

Approach: in-memory check, refresh only when symptomatic

Healthy schemas should not pay extra DB queries on every decoration. When a heading is constructed for an already-declared table, its lineage values are loaded from `~lineage` in a single SELECT — already paid for by normal decoration. Scanning those in-memory values for PK attributes with `lineage=None` costs nothing extra.

  • Healthy schema: zero additional queries on `@schema(MyTable)`. The in-memory check confirms all PK lineages are present; no refresh runs.
  • Schema with missing rows (the bug's primary symptom): in-memory check fires, refresh writes correct rows.
  • Schema with stale-but-non-None rows (DJ version skew): NOT auto-detected. Users hit the improved error message on first join ("...Run `schema.rebuild_lineage()`...") and run the explicit repair.
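
The three cases above can be sketched with a small standalone check. This is an illustrative model only: `AttrInfo`, `needs_lineage_refresh`, and the lineage string format are hypothetical stand-ins, not DataJoint's actual classes or storage format.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AttrInfo:
    """Hypothetical stand-in for one heading attribute (not DataJoint's class)."""

    name: str
    in_key: bool            # True if the attribute is part of the primary key
    lineage: Optional[str]  # value loaded from ~lineage, or None if no row exists


def needs_lineage_refresh(attrs: Iterable[AttrInfo]) -> bool:
    """Return True when any primary-key attribute has no lineage loaded."""
    return any(a.in_key and a.lineage is None for a in attrs)


# Lineage strings below are made up for illustration.
healthy = [
    AttrInfo("subject_id", True, "lab.subject.subject_id"),
    AttrInfo("notes", False, None),  # non-key attribute: ignored by the check
]
missing = [AttrInfo("subject_id", True, None)]
stale = [AttrInfo("subject_id", True, "old_format:subject_id")]

print(needs_lineage_refresh(healthy))  # False: no refresh, no extra queries
print(needs_lineage_refresh(missing))  # True: refresh runs
print(needs_lineage_refresh(stale))    # False: stale values are not detected
```

The `stale` case returning `False` is the documented limitation: a non-None value looks healthy to an in-memory check, so it needs the explicit repair.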

What's in the PR

| Change | File |
| --- | --- |
| `Table._refresh_lineage(context)`: parses the current definition via the existing `declare()` machinery (in-memory only; no DDL execution), then calls `_populate_lineage` to delete-then-insert. Errors are logged and swallowed. | `src/datajoint/table.py` |
| `@schema` decoration guards `_refresh_lineage` on the in-memory check: any PK attribute with `lineage=None` triggers it. Healthy schemas skip it entirely. | `src/datajoint/schemas.py` |
| Tailored error message in `assert_join_compatibility` when one side's lineage is `None`: points the user at `rebuild_lineage()` instead of the generic message. The original message stands when both lineages are present but differ (the legitimate semantic-mismatch case). | `src/datajoint/condition.py` |
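
For reviewers, the two error branches can be pictured roughly as below. This is a minimal sketch, not the code in `condition.py`: the function name `describe_lineage_conflict` and the message wording are assumptions made for illustration.

```python
from typing import Optional


def describe_lineage_conflict(attr: str, left: Optional[str], right: Optional[str]) -> str:
    """Build the message for a join attribute whose two lineages differ.

    Hypothetical helper; the wording is illustrative, not the PR's exact text.
    """
    if left is None or right is None:
        # One side has no lineage at all: most likely a missing ~lineage row,
        # so point the user at the explicit repair.
        return (
            f"Attribute '{attr}' has a missing ~lineage entry ({left} vs {right}). "
            "Run schema.rebuild_lineage() to repair it."
        )
    # Both lineages are present but differ: a genuine semantic mismatch.
    return f"Attribute '{attr}' has different lineages: {left} vs {right}."


print(describe_lineage_conflict("subject_id", None, "lab.subject.subject_id"))
print(describe_lineage_conflict("subject_id", "lab.subject.subject_id", "zoo.animal.subject_id"))
```

Keeping the generic message for the both-present case avoids steering users toward a rebuild when the mismatch is real.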

Tests

Five tests in `tests/integration/test_semantic_matching.py::TestLineageRefreshOnDecoration`:

  • `test_redecorate_restores_missing_lineage` — primary auto-heal path
  • `test_redecorate_heals_partial_lineage` — mixed state (some stale, some missing) → triggers on missing rows, fixes both
  • `test_redecorate_skips_when_lineage_healthy` — intercept `~lineage` writes; zero DELETE/INSERT on healthy decoration
  • `test_stale_non_none_lineage_not_auto_refreshed` — documents the limitation; manual `rebuild_lineage` clears it
  • `test_missing_lineage_error_points_to_rebuild` — verifies the new error wording

All 26 tests in `test_semantic_matching.py` pass. Regression set (`test_declare.py`, `test_dependencies.py`, `test_autopopulate.py`) — 40 passed, 2 skipped, no regressions.
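
The write-interception idea behind `test_redecorate_skips_when_lineage_healthy` can be sketched without a database. Everything here (`FakeConnection`, `RecordingConnection`, `count_lineage_writes`, and the SQL strings) is a hypothetical illustration, not the test's actual code.

```python
class FakeConnection:
    """Stand-in for a database connection; executes nothing."""

    def query(self, sql, *args, **kwargs):
        return []


class RecordingConnection:
    """Wraps a connection and records every SQL string passed to query()."""

    def __init__(self, inner):
        self._inner = inner
        self.statements = []

    def query(self, sql, *args, **kwargs):
        self.statements.append(sql)
        return self._inner.query(sql, *args, **kwargs)


def count_lineage_writes(statements):
    """Count DELETE/INSERT statements that mention the ~lineage table."""
    return sum(
        1
        for sql in statements
        if "~lineage" in sql
        and sql.lstrip().upper().startswith(("DELETE", "INSERT"))
    )


conn = RecordingConnection(FakeConnection())

# Healthy decoration: only the usual SELECT that loads lineage values.
conn.query("SELECT attribute_name, lineage FROM `~lineage` WHERE table_name = 'session'")
healthy_writes = count_lineage_writes(conn.statements)

# A refresh would add a delete-then-insert pair.
conn.query("DELETE FROM `~lineage` WHERE table_name = 'session'")
conn.query("INSERT INTO `~lineage` VALUES ('session', 'subject_id', 'x')")
refresh_writes = count_lineage_writes(conn.statements)

print(healthy_writes, refresh_writes)  # 0 2
```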

What's not covered by this PR

  • Stale-but-non-None auto-heal. Out of scope; users with this case run `schema.rebuild_lineage()` or `dj.migrate.rebuild_lineage(schema)` explicitly. The improved error message points them there.
  • Per-row schema-version tagging (the issue's "long term" direction): deferred.
  • Companion docs (concept + reference page updates): datajoint-docs #181 — in flight.

Test plan

  • New regression tests pass
  • Existing semantic-matching suite still passes (26 tests)
  • Existing declare/dependencies/autopopulate suites unaffected (40 passed)
  • CI green on this PR
  • Manual confirmation on a pipeline with the original demo failure shape

Stale rows in the ~lineage table caused spurious "different lineages"
errors during populate() on FK-inherited primary keys. The load-bearing
failure mode was lineage missing entirely (the demo failure: None vs ...),
not stale-but-non-None values.

Approach: detect the failure symptom in memory at @schema decoration time.
When the heading is constructed for an already-declared table, its lineage
values are loaded from ~lineage in a single SELECT. Scanning those in-memory
values for PK attributes with lineage=None costs nothing extra. Healthy
schemas pay zero additional DB queries on re-decoration; the refresh only
fires when the symptom is detectable in memory.

Changes:

1. Table._refresh_lineage(context) — parses current definition via the
   existing declare() machinery (in-memory parse only; no DDL execution),
   then calls _populate_lineage() to delete-then-insert the table's rows.
   Errors are logged and swallowed because a stale row is preferable to a
   failed schema activation.

2. schemas.py:_decorate_table guards the refresh on the in-memory check:
   only when any PK attribute's heading lineage is None. Healthy schemas
   skip the refresh entirely; missing-row schemas auto-heal.

3. Improved error message in condition.assert_join_compatibility: when one
   side's lineage is None, surface a tailored hint pointing at
   schema.rebuild_lineage() instead of the generic "different lineages"
   message. The original message stands when both lineages are present
   but differ.
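
The delete-then-insert refresh from item 1, with errors logged and swallowed, might look roughly like this. It is a sketch under stated assumptions: it uses `sqlite3` in place of the real backend, a plain `lineage` table with assumed columns in place of `~lineage`, a single transaction (an assumption, not confirmed by the PR), and a hypothetical `refresh_lineage` helper in place of `_refresh_lineage`/`_populate_lineage`.

```python
import logging
import sqlite3

logger = logging.getLogger("lineage_sketch")


def refresh_lineage(conn, table_name, rows):
    """Replace all lineage rows for one table; return True on success.

    Hypothetical helper. Errors are logged and swallowed so that a failed
    refresh never aborts the caller (a stale row beats a failed activation).
    """
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("DELETE FROM lineage WHERE table_name = ?", (table_name,))
            conn.executemany(
                "INSERT INTO lineage (table_name, attribute_name, lineage) VALUES (?, ?, ?)",
                [(table_name, attr, lin) for attr, lin in rows.items()],
            )
        return True
    except sqlite3.Error as exc:
        logger.warning("Could not refresh lineage for %s: %s", table_name, exc)
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineage (table_name TEXT, attribute_name TEXT, lineage TEXT, "
    "PRIMARY KEY (table_name, attribute_name))"
)
conn.execute("INSERT INTO lineage VALUES ('session', 'subject_id', 'stale_value')")

ok = refresh_lineage(conn, "session", {"subject_id": "lab.subject.subject_id"})
rows = conn.execute(
    "SELECT attribute_name, lineage FROM lineage WHERE table_name = 'session'"
).fetchall()

# A connection without the table shows the swallow path: False, no exception.
failed = refresh_lineage(sqlite3.connect(":memory:"), "session", {"subject_id": "x"})
```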

Documented limitation: stale-but-non-None entries (e.g. DJ version skew
that wrote lineage in a different string format) are NOT auto-detected.
The tailored error message + dj.migrate.rebuild_lineage(schema) cover
that case as an explicit repair step.

Tests in tests/integration/test_semantic_matching.py::TestLineageRefreshOnDecoration:

- test_redecorate_restores_missing_lineage — primary auto-heal path
- test_redecorate_heals_partial_lineage — mixed state (some stale, some
  missing) triggers on the missing rows and fixes both
- test_redecorate_skips_when_lineage_healthy — intercept ~lineage writes
  and verify zero DELETE/INSERT on healthy decoration
- test_stale_non_none_lineage_not_auto_refreshed — documents the
  limitation; manual rebuild_lineage fixes it
- test_missing_lineage_error_points_to_rebuild — verifies the new error
  wording

Slated for DataJoint 2.3.
@dimitri-yatsenko force-pushed the fix/1454-stale-lineage-refresh branch from 86f23f2 to 30ed965 on June 10, 2026 21:59

@MilagrosMarin (Contributor) left a comment

Elegant scoping. The in-memory check against the heading's already-loaded lineage values is exactly the right symptom test — zero cost on healthy schemas, auto-heal triggers only when the bug is in memory. The five integration tests cover the contract precisely:

  • test_redecorate_restores_missing_lineage: primary auto-heal path
  • test_redecorate_heals_partial_lineage: mixed state triggers on missing rows, fixes both
  • test_redecorate_skips_when_lineage_healthy: the load-bearing zero-DB-cost assertion (intercepts query() to count writes)
  • test_stale_non_none_lineage_not_auto_refreshed: documented limitation, surfaced via the improved error message
  • ✅ Production-mode suppression via the create_tables=True guard in the elif branch

The improved assert_join_compatibility error message is well-targeted — distinguishes missing-lineage (auto-heal candidate) from genuinely-different-lineage (semantic mismatch).

One small observation: the error message wording says "stale ~lineage entry", but the case that fires here is missing, not stale. Both lead to the same fix (rebuild_lineage()), so the looseness is forgivable; tightening it to "missing or stale" would be a two-word improvement.

Approving — clean implementation of #1454.


Development

Successfully merging this pull request may close these issues.

Stale ~lineage entries cause spurious semantic-check failures; redeclare should overwrite
