fix(#1429): cascade through FK chain for part_integrity="cascade"#1468
fix(#1429): cascade through FK chain for part_integrity="cascade"#1468dimitri-yatsenko wants to merge 3 commits into
Conversation
Diagram.cascade(part_integrity="cascade") used to derive each Part's Master restriction by joining `master_ft.proj() & child_ft.proj()` on shared attribute names. This failed when the Part referenced its Master indirectly — through another Part with renamed FK columns (`.proj()` in the definition) or via a Part-of-Part chain that does not directly inherit Master's PK names. The intermediate Parts' restrictions were also skipped. Replace the proj-join shortcut with an upward walk of the actual FK graph from the Part to its Master, applying symmetric (upward) counterparts of the existing propagation rules at each edge. Key changes in src/datajoint/diagram.py: - New Diagram._apply_propagation_rule_upward — mirror of the existing forward propagation method. Same three rules (shared-PK copy, aliased reverse-rename, non-aliased projection) applied in the reverse direction (child → parent). - New Diagram._propagate_part_to_master — walks nx.shortest_path (Master → Part) and applies the upward rules along each real edge, transparently skipping the integer-named alias nodes that the graph inserts for aliased FKs. Restricts intermediate Parts too (the chain case from #1429 Case 2). Materializes the Master's restriction via to_arrays() so the subsequent forward cascade back down to Master's other Parts produces literal `WHERE ... IN (values)` clauses rather than self-referential subqueries (avoiding MySQL error 1093). - New Diagram._find_real_edge_props — looks up edge props for parent → child via the direct edge OR through an alias node. - _propagate_restrictions: seed-is-Part case. When the cascade starts at a Part (e.g. `Master.PartB.delete(part_integrity="cascade")`), the main loop's part_integrity block — nested inside the out_edges iteration — cannot fire because a leaf Part has no out-edges. Trigger the upward propagation explicitly for the seed before the main loop. - Diagram.cascade: expand nodes_to_show to include any node that the part_integrity propagation pulled in (the master and its descendants), so counts() and __iter__ report the full cascade subgraph. Tests in tests/integration/test_cascade_delete.py — three new mysql tests covering both #1429 cases plus an end-to-end delete. Full regression: 8 + 15 + 33 mysql tests pass. Slated for DataJoint 2.3.
MilagrosMarin
left a comment
There was a problem hiding this comment.
Reviewed the implementation in depth. Substantive bug is fixed correctly and the design is sound — observations below are quality nits, not correctness issues.
Verification ✅
Architectural fix is correct. The shift from "master_ft.proj() & child_ft.proj() shared-attribute join" → "walk the actual FK graph with backward propagation rules" correctly addresses both bug cases from #1429:
- Case 1 (renamed FK): Backward Rule 2 (
child_ft.proj(**{pk: fk for fk, pk in attr_map.items()})) reverses the renaming - Case 2 (Part-of-Part chain): Walking
nx.shortest_pathrestricts intermediate Parts along the way
Backward rules mirror forward rules exactly. Compared _apply_propagation_rule_upward (lines 607-661) against _apply_propagation_rule (lines 549-605):
| Forward Rule 1 | Backward Rule 1 | |
|---|---|---|
| Condition | not aliased and parent_attrs <= child_pk |
not aliased and child_attrs <= parent_pk |
| Action | Copy parent restriction to child | Copy child restriction to parent |
| Forward Rule 2 | Backward Rule 2 | |
|---|---|---|
| Condition | aliased |
aliased |
| Action | parent_ft.proj(**{fk: pk for fk, pk in attr_map.items()}) |
child_ft.proj(**{pk: fk for fk, pk in attr_map.items()}) |
| Forward Rule 3 | Backward Rule 3 | |
|---|---|---|
| Condition | not aliased, restriction ⊄ child PK |
not aliased, restriction ⊄ parent PK |
| Action | parent_ft.proj() |
child_ft.proj() |
Symmetry is precise. ✓
Materialization rationale is sound. to_arrays() converts the master's accumulated query restriction into a literal value tuple, avoiding MySQL error 1093 (self-referential subquery) on subsequent forward cascade. The empty-result case (len(master_pk_values) == 0 → restrictions[master_name] = [False]) is the right behavior for "include master with zero matches in counts/iter".
Seed-is-Part special case is justified. A leaf Part has no out-edges, so the main loop's part_integrity block (nested inside for target in out_edges(node)) cannot fire. The pre-loop trigger at lines 467-474 is the only path that catches (Master.PartB & key).delete(part_integrity="cascade") when PartB is a leaf.
visited_masters correctly deduplicates pre-loop and main-loop triggers.
Tests cover the bug cases:
test_cascade_part_of_part_no_master_reference— Case 2test_cascade_part_of_part_renamed_fk— Case 1test_cascade_part_of_part_actual_delete— end-to-end SQL execution (no 1093)
Observations worth flagging
1. Unused parameter propagated_edges in _propagate_part_to_master (line 663). Accepted but never referenced inside the function body. Either remove from the signature or use it to track upward edges (e.g., prevent double-application if the function is invoked twice with overlapping paths). Right now it's misleading dead weight.
2. PR description typo. The test-section line says:
Three new MySQL tests in
tests/integration/test_cascade_delete.py::TestLineageRefreshOnDecoration
But TestLineageRefreshOnDecoration is the class name from #1467 (the ~lineage PR). The tests added here aren't inside any class — they're top-level functions. Looks like a copy-paste from #1467's PR template.
3. PostgreSQL coverage gap. Tests use schema_by_backend, which is backend-parameterized in principle (see connection_by_backend/db_creds_by_backend). PR test plan says "All 3 new tests pass on MySQL" — only MySQL is claimed. The MySQL 1093 issue doesn't apply to PostgreSQL (which permits self-referential subqueries), but the upward propagation logic is supposed to work on both. Worth confirming the parameterization actually runs on PG, since the materialization choice is MySQL-motivated and PG might exercise different code paths.
4. Three-level Part chain not tested. Tests cover PartB → PartA → Master (2 hops). A PartC → PartB → PartA → Master chain would exercise the loop in _propagate_part_to_master for len(real_path) > 3 and confirm intermediate Parts are restricted at every hop, not just the first. Worth one test.
5. Multiple FK paths from Master to a Part. nx.shortest_path returns one path. If a Part has multiple FK chains to its Master (unusual but conceivable — e.g., Part references two different intermediate Parts), restrictions through the non-shortest paths are missed. The docstring should note this limitation (or explicitly check for graph-uniqueness).
6. Materialization memory cost. master_ft.proj().to_arrays() pulls all matching master PKs into Python memory. For a cascade preview / large delete, this could be substantial (bounded by the count of distinct master rows referenced by the matching parts — typically small, but not always). A docstring note would help users predict the cost surface.
7. Companion docs PR. PR description says a cascade-rules spec is forthcoming in datajoint-docs. The spec will be more reviewable once the impl is settled, so this PR doesn't need to wait — just noting the cross-reference for tracking.
Summary
The substantive bug is fixed and the design is sound. Items #1–#6 are quality observations, not correctness issues — most could be addressed in a follow-up or as small docstring/test additions before merge. The unused parameter (#1) and PR description typo (#2) are mechanical fixes worth catching.
- Drop unused `propagated_edges` parameter from `_propagate_part_to_master`
and its call sites. The parameter was vestigial after the design
switched from edge-blocking to materialization at the master.
- Document two limitations in the docstring:
- Single FK path: nx.shortest_path returns one path; non-shortest
paths are not applied.
- Memory cost of materialization: to_arrays() pulls matching master
PKs into Python memory.
- Add test_cascade_three_level_part_chain covering PartC → PartB →
PartA → Master. Confirms intermediate Parts are restricted at every
hop, not just the first.
All 36 mysql tests in cascade_delete + cascading_delete + dependencies
+ semantic_matching pass.
|
Thanks @MilagrosMarin — thorough review. Pushed 9094a64 addressing #1, #4, #5, #6.
Full mysql regression: 36 tests in cascade_delete + cascading_delete + dependencies + semantic_matching all pass on this commit. |
Summary
`Diagram.cascade(part_integrity="cascade")` used to derive each Part's Master restriction by joining `master_ft.proj() & child_ft.proj()` on shared attribute names. This failed when the Part referenced its Master indirectly:
Replace the proj-join shortcut with an upward walk of the actual FK graph from the Part to its Master, applying symmetric (upward) counterparts of the existing propagation rules at each edge. Closes #1429. Slated for DataJoint 2.3.
Approach
Tests
Three new MySQL tests in `tests/integration/test_cascade_delete.py::TestLineageRefreshOnDecoration`:
Regression: all 8 cascade_delete mysql tests pass, plus 15 in cascading_delete + dependencies, plus 33 in semantic_matching + erd. No regressions.
What's not in this PR
Test plan