
expressions: tree-general index deduction (Phase E) — inner-node general products #563

Open
evaleev wants to merge 3 commits into evaleev/feature/general-product-expr from evaleev/feature/general-product-tree-deduction

evaleev (Member) commented Jun 12, 2026

Stacked on #562. Completes the general-product story for arbitrary expression trees: general products (fused + contracted + free indices) no longer need to sit at the root of their own assignment.

What

  • Top-down index-set deduction (MultEngine::init_indices(target)): an index shared by a product's two children is fused iff the node's target carries it and contracted at the node otherwise; an index carried by neither the sibling nor the target is consumed entirely within the child's subtree and is dropped from that child's demand (the demand is a pure intersection). The up-pass is the new available_indices() (the union of leaf annotations per subtree, valid before init); each child dictates the ORDER of its demand via preferred_layout() (canonical (fused, left-free, right-free) for products). Expressions consumed without a target (reductions) retain the bottom-up contraction convention and still do not support inner general products.
  • Streaming result re-permute for general products (replaces the interleaved-target gate): a target that differs from the canonical result layout cannot be folded into the batched tile op (BatchedContractReduce must stay perm-free), so the product is evaluated canonically (perm-free batched Summa over a slab-replicated pmap) and a UnaryEvalImpl wrapper re-permutes the result to the consumer's layout. The wrapper honors TA's implicit-permute contract: when the consumer fuses the permutation into its transposed GEMM (implicit_permute_outer), only tile ordinals and the trange are remapped, and tile contents stay canonical. This also lifts the former root-level interleaved-target limitation.
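The layout bookkeeping in the second bullet can be sketched in plain C++. This is not TiledArray code: annotations are vectors of strings, `canonical_layout`, `permutation_to` and `repermute` are hypothetical helpers, and ordering each group by first appearance in the operand annotations is an assumption. The point it illustrates is why the batched op can stay perm-free: the canonical layout depends only on the operands and on which shared indices the target keeps, and any target order is then a pure re-permutation of it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Annot = std::vector<std::string>;  // index labels, e.g. {"i", "k", "b"}

// True if label `s` appears in annotation `a`.
bool contains(const Annot& a, const std::string& s) {
  return std::find(a.begin(), a.end(), s) != a.end();
}

// Canonical result layout (fused..., left-free..., right-free...).
// Fused: shared by both operands AND kept by the target.
// Free: carried by one operand only. Shared indices absent from the
// target are contracted and do not appear in the result at all.
// (Order within each group = order of appearance: an assumption.)
Annot canonical_layout(const Annot& left, const Annot& right,
                       const Annot& target) {
  Annot fused, lfree, rfree;
  for (const auto& s : left) {
    if (contains(right, s)) {
      if (contains(target, s)) fused.push_back(s);  // else: contracted
    } else {
      lfree.push_back(s);
    }
  }
  for (const auto& s : right)
    if (!contains(left, s)) rfree.push_back(s);
  Annot out = fused;
  out.insert(out.end(), lfree.begin(), lfree.end());
  out.insert(out.end(), rfree.begin(), rfree.end());
  return out;
}

// perm[i] = position in `from` of the i-th label of `to`,
// so that to_coord[i] = from_coord[perm[i]].
std::vector<std::size_t> permutation_to(const Annot& from, const Annot& to) {
  std::vector<std::size_t> perm;
  for (const auto& s : to)
    perm.push_back(static_cast<std::size_t>(
        std::find(from.begin(), from.end(), s) - from.begin()));
  return perm;
}

// Re-permute one element coordinate from the canonical layout to the target.
std::vector<std::size_t> repermute(const std::vector<std::size_t>& canon,
                                   const std::vector<std::size_t>& perm) {
  std::vector<std::size_t> out;
  for (auto p : perm) out.push_back(canon[p]);
  return out;
}
```

For example, with left = (i,k,b), right = (k,j,b) and target (j,b,i), b is fused and k is contracted; the canonical layout is (b,i,j), and the re-permutation to the target is {2,0,1}.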

With this, the THC reconstruction evaluates in ONE expression:

```cpp
g("p,q,r,s") = x("p,r1") * x("q,r1") * z("r1,r2") * x("r,r2") * x("s,r2");
```

The tree is left-deep: r1 is fused where the two x factors meet, contracted where z joins, and dropped from the demands of every node above; the result was verified against the same computation staged through explicit intermediates.
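A minimal sketch of the two passes on this THC tree, assuming a toy binary tree of plain index sets rather than TiledArray's MultEngine (`Node`, `available`, `deduce` and `Report` are hypothetical names). `available` is the up-pass (union of leaf annotations); `deduce` is the down-pass that classifies each shared index against the node's target and forwards to each child only what its sibling or the target needs. Index order is ignored here; in the PR that is the job of preferred_layout().

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <set>
#include <string>
#include <vector>

using IndexSet = std::set<std::string>;

// Hypothetical expression-tree node: a leaf (tensor annotation) when l is
// null, otherwise a binary product of two subtrees.
struct Node {
  IndexSet leaf;
  std::shared_ptr<Node> l, r;
};

std::shared_ptr<Node> make_leaf(IndexSet s) {
  auto n = std::make_shared<Node>();
  n->leaf = std::move(s);
  return n;
}
std::shared_ptr<Node> mul(std::shared_ptr<Node> a, std::shared_ptr<Node> b) {
  auto n = std::make_shared<Node>();
  n->l = std::move(a);
  n->r = std::move(b);
  return n;
}

IndexSet unite(const IndexSet& a, const IndexSet& b) {
  IndexSet out;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                 std::inserter(out, out.end()));
  return out;
}
IndexSet intersect(const IndexSet& a, const IndexSet& b) {
  IndexSet out;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
  return out;
}

// Up-pass: union of leaf annotations in the subtree; needs no target.
// (Recomputed on demand for brevity; a real engine would cache it.)
IndexSet available(const Node& n) {
  if (!n.l) return n.leaf;
  return unite(available(*n.l), available(*n.r));
}

struct Report {  // what one product node decided, in pre-order
  IndexSet target, fused, contracted;
};

// Down-pass: `target` is what the consumer demands of this node.
void deduce(const Node& n, const IndexSet& target, std::vector<Report>& out) {
  if (!n.l) return;  // a leaf simply provides the demanded indices
  const IndexSet al = available(*n.l), ar = available(*n.r);
  Report rep{target, {}, {}};
  for (const auto& s : intersect(al, ar))  // shared by both children
    (target.count(s) ? rep.fused : rep.contracted).insert(s);
  out.push_back(rep);
  // Demand only what the sibling or the target needs ("pure intersection");
  // everything else is consumed inside the child's own subtree.
  deduce(*n.l, intersect(al, unite(ar, target)), out);
  deduce(*n.r, intersect(ar, unite(al, target)), out);
}

// g(p,q,r,s) = x(p,r1) x(q,r1) z(r1,r2) x(r,r2) x(s,r2), built left-deep.
std::shared_ptr<Node> thc_tree() {
  auto t = mul(make_leaf({"p", "r1"}), make_leaf({"q", "r1"}));
  t = mul(t, make_leaf({"r1", "r2"}));
  t = mul(t, make_leaf({"r", "r2"}));
  return mul(t, make_leaf({"s", "r2"}));
}
```

Read in pre-order, the reports match the description: r1 is fused at the innermost node (target {p,q,r1}), contracted where z joins (target {p,q,r2}) and absent from the targets above; r2 is fused where x(r,r2) joins (target {p,q,r,r2}) and contracted at the root.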

Tests

  • New: THC as one expression vs. explicit intermediates; a minimal depth-2 tree (a general product feeding a contraction, which exercises the implicit-permute path); a root-level non-canonical target (exercises the content-permuting wrapper); scratch unit checks of the deduction.
  • The former expression_general_product_inner_node_gated throw test is superseded (the expression now evaluates).
  • Full regression: general_product, einsum_*, expressions{,_sparse} (modulo the two pre-existing assign_subblock_block_base1 failures), tot_contraction/permutations — green.
  • mpqc c6h14/cc-pVDZ PNO-CCSD energy unchanged (3e-11 difference, at the run-to-run noise level).
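The THC comparison in the first test can be mimicked without TiledArray. This sketch uses small dense row-major arrays (the extents, the seeded values and all function names are illustrative assumptions) and evaluates g once as the full five-factor sum and once node by node along the left-deep tree: fuse r1, contract r1 with z, fuse r2, contract r2. The intermediates A[p,q,r1], B[p,q,r2] and C[p,q,r,r2] carry exactly the indices the top-down deduction demands of each inner node.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Dims { std::size_t n, R; };  // n: extent of p,q,r,s; R: extent of r1,r2

// Deterministic, non-trivial test values.
std::vector<double> seeded(std::size_t size, double seed) {
  std::vector<double> v(size);
  for (std::size_t i = 0; i < size; ++i) v[i] = std::sin(seed + 0.37 * i);
  return v;
}

// Reference: g[p,q,r,s] = sum_{r1,r2} x[p,r1] x[q,r1] z[r1,r2] x[r,r2] x[s,r2]
// (a = r1, b = r2; x is n x R, z is R x R, both row-major).
std::vector<double> thc_direct(const std::vector<double>& x,
                               const std::vector<double>& z, Dims d) {
  const std::size_t n = d.n, R = d.R;
  std::vector<double> g(n * n * n * n, 0.0);
  for (std::size_t p = 0; p < n; ++p)
    for (std::size_t q = 0; q < n; ++q)
      for (std::size_t r = 0; r < n; ++r)
        for (std::size_t s = 0; s < n; ++s)
          for (std::size_t a = 0; a < R; ++a)
            for (std::size_t b = 0; b < R; ++b)
              g[((p * n + q) * n + r) * n + s] += x[p * R + a] * x[q * R + a] *
                                                  z[a * R + b] * x[r * R + b] *
                                                  x[s * R + b];
  return g;
}

// Staged: the same left-deep tree evaluated one node at a time.
std::vector<double> thc_staged(const std::vector<double>& x,
                               const std::vector<double>& z, Dims d) {
  const std::size_t n = d.n, R = d.R;
  std::vector<double> A(n * n * R);  // A[p,q,r1] = x[p,r1] x[q,r1]  (r1 fused)
  for (std::size_t p = 0; p < n; ++p)
    for (std::size_t q = 0; q < n; ++q)
      for (std::size_t a = 0; a < R; ++a)
        A[(p * n + q) * R + a] = x[p * R + a] * x[q * R + a];
  std::vector<double> B(n * n * R, 0.0);  // B[p,q,r2] = sum_r1 A z  (r1 contracted)
  for (std::size_t pq = 0; pq < n * n; ++pq)
    for (std::size_t a = 0; a < R; ++a)
      for (std::size_t b = 0; b < R; ++b)
        B[pq * R + b] += A[pq * R + a] * z[a * R + b];
  std::vector<double> C(n * n * n * R);  // C[p,q,r,r2] = B x[r,r2]  (r2 fused)
  for (std::size_t pq = 0; pq < n * n; ++pq)
    for (std::size_t r = 0; r < n; ++r)
      for (std::size_t b = 0; b < R; ++b)
        C[(pq * n + r) * R + b] = B[pq * R + b] * x[r * R + b];
  std::vector<double> g(n * n * n * n, 0.0);  // sum_r2 C x[s,r2]  (r2 contracted)
  for (std::size_t pqr = 0; pqr < n * n * n; ++pqr)
    for (std::size_t s = 0; s < n; ++s)
      for (std::size_t b = 0; b < R; ++b)
        g[pqr * n + s] += C[pqr * R + b] * x[s * R + b];
  return g;
}

double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b) {
  double m = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) m = std::max(m, std::abs(a[i] - b[i]));
  return m;
}
```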

Out of scope (follow-ups)

  • ScalMultEngine (scaled products) keeps depth-1 semantics for inner general products.
  • ToT inner-index result permutations remain gated.
  • Mixed T*ToT in arbitrary trees (Phase F) — to be scoped separately.

evaleev added 3 commits June 11, 2026 21:03
…t target (Phase E)

An index shared by the two children of a product is fused iff the node's
target carries it, contracted otherwise; an index neither the sibling nor
the target carries is consumed within the child subtree and not demanded
of it. available_indices() (per-subtree leaf-annotation union, valid
before init) supplies the up-pass; each child dictates the ORDER of its
demand via preferred_layout() (canonical (fused, left-free, right-free)
for products, pass-through elsewhere). Expressions consumed without a
target (reductions) retain the bottom-up contraction convention.
…reaming re-permute

A target that differs from the canonical (fused..., left-free...,
right-free...) result layout cannot be folded into the batched tile op
(BatchedContractReduce must be perm-free); evaluate canonically (Summa
over a slab-replicated pmap) and re-permute to the target with a
streaming UnaryEvalImpl. Honors the implicit-permute contract: when the
consumer fuses the permutation into its own operation (transposed GEMM),
only the tile ordinals/trange are remapped and contents stay canonical.
Replaces the interleaved-target gate, enabling general products at inner
expression-tree nodes and non-canonical root targets.
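A toy illustration of the implicit-permute contract for a 2-D result whose canonical layout is (i,j) and whose consumer wants (j,i). The types and functions here (`Tile`, `Tiled`, `repermute_explicit`, `repermute_implicit`) are hypothetical stand-ins, not TiledArray's tile, trange or Permutation API. The explicit variant moves tile (I,J) to (J,I) and transposes its contents; the implicit variant only remaps the tile ordinal and leaves contents canonical, relying on the consumer to apply the transpose on access, as a transposed GEMM would.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// A dense row-major tile; a tiled 2-D array maps tile coordinate (I,J) to a tile.
struct Tile {
  std::size_t rows, cols;
  std::vector<double> v;
  double at(std::size_t i, std::size_t j) const { return v[i * cols + j]; }
};
using TileCoord = std::pair<std::size_t, std::size_t>;
using Tiled = std::map<TileCoord, Tile>;

Tile transpose(const Tile& t) {
  Tile out{t.cols, t.rows, std::vector<double>(t.v.size())};
  for (std::size_t i = 0; i < t.rows; ++i)
    for (std::size_t j = 0; j < t.cols; ++j) out.v[j * t.rows + i] = t.at(i, j);
  return out;
}

// Content-permuting re-permute: tile (I,J) moves to (J,I) AND its contents
// are transposed, so the result is a genuine (j,i) array.
Tiled repermute_explicit(const Tiled& a) {
  Tiled out;
  for (const auto& [c, t] : a) out[{c.second, c.first}] = transpose(t);
  return out;
}

// Implicit re-permute: only tile ordinals are remapped; contents stay in
// canonical (i,j) order for a consumer that fuses the transpose itself.
Tiled repermute_implicit(const Tiled& a) {
  Tiled out;
  for (const auto& [c, t] : a) out[{c.second, c.first}] = t;
  return out;
}

// Element (j,i) of the permuted array: tile (J,I), local indices (jl,il).
double get_explicit(const Tiled& a, TileCoord c, std::size_t jl, std::size_t il) {
  return a.at(c).at(jl, il);
}
// Same element from the implicit variant: the transpose is applied on access.
double get_implicit(const Tiled& a, TileCoord c, std::size_t jl, std::size_t il) {
  return a.at(c).at(il, jl);
}

// Canonical 2x2 grid of 2x3 tiles; global element (i,j) = 10*i + j.
Tiled make_canonical() {
  Tiled a;
  for (std::size_t I = 0; I < 2; ++I)
    for (std::size_t J = 0; J < 2; ++J) {
      Tile t{2, 3, std::vector<double>(6)};
      for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 3; ++j)
          t.v[i * 3 + j] = 10.0 * (I * 2 + i) + (J * 3 + j);
      a[{I, J}] = t;
    }
  return a;
}
```

Both variants yield the same logical elements; the implicit one skips the per-tile data shuffle, at the cost of requiring the consumer to know that tile contents are still in canonical order.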